Introduction to R/RStudio on Seadragon
Published:
Overview
Seadragon is MD Anderson’s Research High-Performance Computing (HPC) cluster. It’s a great option if you want to: (1) speed up your scripts by running them as array jobs (parallel processing), (2) access more memory than what’s available on your local machine, or (3) free up your computer for other tasks (like running two scripts at the same time).
While the HPC team has already provided many detailed and wonderful documents to help users navigate the cluster, I want to share my initial experiences with Seadragon and offer a beginner-friendly guide to help others get started.
Seadragon supports a wide range of tools and programming languages. In this article, I’ll focus on R and RStudio, which are widely used in my research field. I’ll walk you through how to connect to Seadragon, launch RStudio on Seadragon, install R packages (a bit different from the usual method).
How to start?
Seadragon Official Documentation is your go-to resource, where you can find answers to most of your questions. To access the documents mentioned, you may need to use an MD Anderson laptop or connect to a VPN.
1. Request a Seadragon Account
- Request a Account: https://hpcweb.mdanderson.edu/request_account.html
Once you complete the training and gain access, the HPC team will send you a confirmation email.(I received mine in just a few days)
Step 2: Connect with Seadragon using SSH
SSH allows your computer to remotely connect to Seadragon through the command line. You can find the information in this SSH terminal connections section.
Step 2a:
For Mac: Open Terminal.(It is in Application.)
For Windows: Download PuTTY
Step 2b:
For Mac:
At terminal, type: ssh seadragon
enter your password.
For Windows:
Connect with Seadragon using PuTTY. Please follow this document.
You will get a welcoming message when you connect to cluster successfully.
Step 3: Using RStudio on Seadragon (I used R 4.3.1 version as an example)
Reference : https://hpcweb.mdanderson.edu/applications.dir/r.dir/RStudio_Container.pdf
Seadragon has different ways to launch RStudio for different users.
(1) Mac user:
If you use a MD Anderson MacBook, follow the Macintosh user section. You only need to do Step 1 and 2 for the first time, after that you can start from Step 3 (open XQuartz).
Step 1. copy a RStudio lsf script to your folder, this file will tell Seadragon how to launch Rstudio:
cp /risapps/singularity/repo/RStudio/4.3.1/rserver_4.3.1.lsf $Home/
cp: means copy
/risapps/singularity/repo/RStudio/4.3.1/rserver_4.3.1.lsf: the file we want
$Home/: where you want to save the file. You can check the folder path by typing pwd in your terminal. It should be something looks like /rsrchX/home/department/username.
Step 2. check download successfully, type
ls in your Mac terminal, it will show the files and folder you have under this path.
Step 3. download and open XQuartz
Step 4. submit script in XQuartz terminal, type this line:
bsub < rserver_4.3.1.lsf
bsub is used to submit a script. Remember to type < after bsub.
Step 5. type this line to get job information:
bpeek
You will get the following information (it will change everytime you use):
ssh -N -L 8787:cdragon061.cm.cluster:52424 ychang11@seadragon and point your web browser to http://localhost:8787
log in to RStudio Server using the following credentials:
user: youraccountname
password: XXXXXXXXXXX
Step 6. Open a new Mac terminal (not XQuartz), and type the above information:
ssh -N -L 8787:cdragon061.cm.cluster:52424
Step 7. type your password
Step 8. Open a new web browser window and type localhost:8787 in address line. Type your username and password in bpeek section.
Step 9. You should ready to use RStudio in your local web browser.
(2) Window user:
If you use a MD Anderson Windows PC, follow the Windows PC user section. You only need to do Step 3 for the first time (It is Step 1 and Step 2 in Mac user section) , after that you can keep it.
Step 1. open PuTTY,
For how to launch PuTTY (like terminal in Mac), please follow this document.
Step 2. type your username and password to log in.(make sure that X11 forwarding has been enabled, see the document for detail)
Step 3. do the Step 1 and Step 2 in the above Mac user section.
Step 4. Open (or download) X-Win32, please follow this document.
Step 5. submit script in X-Win32 terminal (not PuTTY), type this line:
bsub < rserver_4.3.1.lsf
Step 6. type this line to get job information:
bpeek jobid
You can get your jobid after running step 5.
You will get the similar information in Mac user Step 5.
ssh -N -L 8787:cdragon061.cm.cluster:52424 ychang11@seadragon and point your web browser to http://localhost:8787
log in to RStudio Server using the following credentials:
user: youraccountname
password: XXXXXXXXXXX
Step 7. type this line:
ssh -X cdragonXXX XXX* is the number from step 5. For example, it should be cdragon061 here.
Step 8. launch google chrome by typing this line:
google-chrome &
Step 9. enter this at the address bar:
XXXXXX is from step 5. For example, tt should be 52424 here (it will change every time.)
Step 10. Type your username and password in bpeek section. You should be able to use RStudio on chrome.
(3) VX Remote:
If you use your own computer, follow the VX Remote access section.
Step 1. Open a web browser type https://vxremote.mdanderson.org/, and log in (see the document for detail)
Step 2. click Windows icon and find X-Win32.
Step 3. For connecting to Seadragon on X-Win32, please follow this document.
Step 4. do the Step 1 and Step 2 in the above Mac user section.
After Step 4, do from step 5 to step 10 in the above Window user section.
How to install R packages
Since the RStudio container has the Ubuntu OS and Seadragon sue RedHat8 OS, we can’t install R packages through RStudio directly.
Here is the solution:
Before we start to install, we need to create a directory for your personal R library. Type this:
mkdir -p /home/username/R/ubuntu/4.3.1
Step 1. Log in Seadragon (Mac: termianl/Window : PuTTY/VM: X-win32)
Step 2. type this line to load R module
module load R/4.3.1
Step 3. Type R. You will launch R (not RStudio) in terminal.
Step 4. You can use install.packages() in terminal now.
Step 5. After you install them from terminal, you can library them in RStudio.
Some useful tools to manage files
Transferring and managing files between your local machine and the cluster through the terminal can be a bit tricky. Here are some GUI tools that can make it easier to manage your files on both Seadragon and your computer:
For Mac: Forklift For PC: FileZilla
Some Linux command we used in this article
ls : is used to list files and directories in the current working directory.
pwd : print the current working directory on your terminal.
cp : copy
cd : enter other directories. For example, cd home/username/myproject
mkdir : make a directories
Conclusion
Here’s how I launch RStudio. I hope this helps beginners get started with R/RStudio on Seadragon. If you notice any mistakes or have questions, feel free to let me know! Next time, I’ll cover how to submit an array job.