Rau Lab Handbook

Software

Last updated 3 years ago

R

Most computing in the lab is done in R, which requires a few downloads to use effectively.

Why R?

R is a widely used, powerful programming language with a large community of bioinformatics researchers who regularly release new open-source packages for analyses. R is particularly strong when it comes to statistics and graphics.

Setting up R involves three steps:

  1. Download R from its official website. You'll want to download it from the nearest mirror, which is at Duke.

  2. Download RStudio. While R comes with its own IDE (Integrated Development Environment), the version put out by the people at RStudio is better in basically every way.

  3. Learn how to use R. There are a lot of different R tutorials, but my best advice for learning is to start with a problem you want to solve. The examples found in a tutorial need to be reapplied to your own problem before you can really start to absorb them. For example, I learned R when the postdoc I was working with handed me some code and told me to generalize the functions to work on any dataset. Talk to me if you need help brainstorming.

Longleaf (UNC Cluster Computing)

(This section is largely borrowed from Mike Love's instructions in his own lab manual.)

We have access to the Longleaf cluster at UNC for computing. While you may often find yourself able to do your work locally on your laptop or on a lab machine, you will sometimes need either the computing power or the security of the Longleaf cluster to do your work.

The main pieces you need to work on Longleaf are:

  • Use OnDemand (with the VPN) for interactive work with RStudio or when you expect visual outputs.

  • Have some way of editing and running code on the cluster (RStudio for R, Notepad++/Emacs/vim for most everything else).

  • Know how to submit jobs to the queue

  • Think about version control.

On Demand for Interactive Work

As of 2020, Research Computing at UNC has made a very nice solution for interactive work on the cluster, which makes older approaches such as X11 forwarding and ESS unnecessary. For various data science applications, first see if they are supported on OnDemand, as this will be a much easier interface for most students. If you are off campus, you will need to connect via VPN first.

Submitting Jobs

WIP

Version control using git

Whether you are editing R scripts for data analysis or working on a new method, you should save your code in git repositories and typically also sync them with a remote on Bitbucket or GitHub.

The ideal setup is to have a clone of each GitHub repo on your laptop and another clone of the same repo on the cluster, using git pull to keep the code up to date in every location. You should commit and push your code daily to avoid losing work.
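The routine above can be sketched end to end without touching GitHub. This is only an illustration under stated assumptions: a local bare repository stands in for the GitHub/Bitbucket remote, and the directory names (`laptop`, `cluster`), the file `analysis.R`, and the user identity are all hypothetical placeholders.

```shell
set -eu

work=$(mktemp -d)          # scratch area; nothing outside it is touched
cd "$work"

# Stand-in for the GitHub/Bitbucket remote (a local bare repo, for illustration).
git init --quiet --bare remote.git

# "Laptop": start a repo, commit a script, and push it.
git init --quiet laptop
cd laptop
git config user.name "Lab Member"            # placeholder identity
git config user.email "member@example.com"   # placeholder identity
echo 'x <- rnorm(10)' > analysis.R
git add analysis.R
git commit --quiet -m "Add analysis script"
git branch -M main
git remote add origin "$work/remote.git"
git push --quiet -u origin main

# "Cluster": clone the repo once.
cd "$work"
git clone --quiet --branch main remote.git cluster

# Back on the laptop: the daily commit-and-push.
cd "$work/laptop"
echo 'print(mean(x))' >> analysis.R
git commit --quiet -am "Print the mean"
git push --quiet

# On the cluster: pull to bring the clone up to date.
cd "$work/cluster"
git pull --quiet
cat analysis.R
```

In real use the remote would be your GitHub or Bitbucket URL, and the "laptop" and "cluster" steps would of course run on different machines; the commands themselves stay the same.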

Other

You will have to set up SSH keys on the cluster in order to sync git repositories there with GitHub or Bitbucket. You can follow the steps described on the git page.
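As a rough idea of what key setup involves, here is a minimal sketch. It is not a substitute for the official steps: it writes the key into a temporary directory rather than `~/.ssh`, uses an empty passphrase only so that it runs without prompts, and the comment string `your-username@longleaf` is a placeholder.

```shell
set -eu

keydir=$(mktemp -d)            # stand-in for ~/.ssh on the cluster
key="$keydir/id_ed25519"

# Generate an Ed25519 key pair. -N "" sets an empty passphrase so the sketch
# is non-interactive; for a real key, consider setting a passphrase.
ssh-keygen -q -t ed25519 -C "your-username@longleaf" -f "$key" -N ""

# The public half is what you paste into the SSH key settings on GitHub/Bitbucket.
cat "$key.pub"

# After adding the key to your account, a connection check would look like:
#   ssh -T git@github.com
```

Only the `.pub` file is ever shared; the private key stays on the machine where it was generated.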

Inkscape Plugin for resizing figures cleanly
OnDemand: https://ondemand.rc.unc.edu