Working with R
From time to time you may need to work with R scripts, as many bioinformatics/computational biology packages are written in R. On your local computer this is fairly easy: you can install RStudio Desktop and run your R script there. But what if your script needs more memory than your computer has? You can rely on Maxwell! Working with R on HPC is straightforward: create a ready-to-run R script and run it with the Rscript command.
Rscript myscript.R
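As a rough sketch of that workflow (nothing here is specific to Maxwell), the snippet below writes a small throwaway R script, runs it with Rscript if Rscript is on the PATH, and keeps the output in a log file. The temporary directory, the script contents and the log file name are made up for illustration.

```shell
# Work in a throwaway directory so nothing in your project is touched.
workdir=$(mktemp -d)

# A made-up example script: report the host it ran on and do a tiny computation.
cat > "$workdir/myscript.R" <<'EOF'
cat("Running on:", Sys.info()[["nodename"]], "\n")
cat("Mean of 1..10:", mean(1:10), "\n")
EOF

if command -v Rscript >/dev/null 2>&1; then
  # Send stdout and stderr to a log so the run can be inspected afterwards.
  Rscript "$workdir/myscript.R" > "$workdir/myscript.log" 2>&1
  cat "$workdir/myscript.log"
else
  echo "Rscript not found; activate an environment that provides R first" >&2
fi
```

Keeping a log file is handy on HPC, where a long-running script often finishes after you have stopped watching the terminal.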
With VS Code you can do more than just run the script: you can also work interactively, which eases development and analysis (much like using RStudio, but on Maxwell and with the tooling that VS Code extensions offer). Here's how to set it up:
-
Open your terminal (ctrl + `), then create a new conda/mamba environment and activate it
[r04mr23@maxlogin1(maxwell) ~]$ mamba create -n myrenv
[r04mr23@maxlogin1(maxwell) ~]$ mamba activate myrenv
(myrenv) [r04mr23@maxlogin1(maxwell) ~]$
-
Install the R and R Debugger extensions for VS Code
-
For VS Code R to work you have to install the following packages.
Mandatory:
- R: The R programming language itself
- r-httpgd: Required for the VS Code interactive plot viewer to work
- radian: A better R terminal
- r-languageserver: Required for VS Code to provide code completion, diagnostics, formatting and many more features
- r-jsonlite: Relatively fast JSON parser for statistical data and the web
- r-irkernel: R kernel for Jupyter
Not mandatory, but used in this guide:
- r-ggplot2: Data visualisation
- r-dplyr: Data manipulation
(myrenv) [r04mr23@maxlogin1(maxwell) ~]$ mamba install R radian r-httpgd r-lang r-jsonlite r-languageserver r-irkernel r-ggplot2 r-dplyr
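After the install finishes, a quick sanity check can confirm that the executables this guide calls directly are visible in the active environment. This is only a sketch: missing_tools is a hypothetical helper name, and it checks commands on the PATH, not whether R packages such as httpgd or languageserver are installed.

```shell
# Print the name of every given command that is not on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# R, Rscript and radian are the executables this guide calls directly.
missing=$(missing_tools R Rscript radian)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Not found in this environment:" $missing >&2
else
  echo "R, Rscript and radian are all available"
fi
```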
-
Open settings.json by pressing command + , (or Code -> Settings... -> Settings) and clicking the Open Settings (JSON) icon in the top-right corner of the Settings tab
-
Update the settings.json file by adding the following parameters
"r.rterm.linux": "/uoa/home/r04mr23/sharedscratch/.conda/envs/myrenv/bin/radian",
"r.alwaysUseActiveTerminal": true,
"r.sessionWatcher": true,
"r.rpath.linux": "/uoa/scratch/users/r04mr23/.conda/envs/myrenv/bin/R"
Replace the values of r.rterm.linux and r.rpath.linux with the paths where radian and R are installed in your environment. If you're not sure, check from within the activated environment using the following
(myrenv) [r04mr23@maxlogin1(maxwell) ~]$ which radian | xargs readlink -f
/uoa/scratch/users/r04mr23/.conda/envs/myrenv/bin/radian
(myrenv) [r04mr23@maxlogin1(maxwell) ~]$ which R | xargs readlink -f
/uoa/scratch/users/r04mr23/.conda/envs/myrenv/bin/R
Copy the paths, update your settings.json, save it (ctrl/command + s) and close it
-
Turn on the session watcher, which allows VS Code R to communicate with the live R session, by performing the following steps (see the vscode-R documentation for more)
- Edit the .Rprofile in your home directory
(myrenv) [r04mr23@maxlogin1(maxwell) ~]$ vim ~/.Rprofile
- Append the following code to it
if (interactive() && Sys.getenv("RSTUDIO") == "") {
  source(file.path(Sys.getenv(if (.Platform$OS.type == "windows") "USERPROFILE" else "HOME"), ".vscode-R", "init.R"))
}
- Reload your terminal
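If you would rather not edit the file by hand, the append can be scripted so that running it twice does not duplicate the hook. The sketch below assumes that any existing line mentioning .vscode-R means the hook is already installed; add_vscode_r_hook is a hypothetical helper name, not part of vscode-R.

```shell
# Hypothetical helper: append the vscode-R hook to an .Rprofile only once.
add_vscode_r_hook() {
  rprofile="$1"
  # Create the file if it does not exist yet.
  touch "$rprofile" || return 1
  # Assumption: a line mentioning .vscode-R means the hook is already there.
  if grep -qF '.vscode-R' "$rprofile"; then
    echo "hook already present in $rprofile"
    return 0
  fi
  # Quoted heredoc delimiter: the shell must not expand $OS inside the R code.
  cat >> "$rprofile" <<'EOF'
if (interactive() && Sys.getenv("RSTUDIO") == "") {
  source(file.path(Sys.getenv(if (.Platform$OS.type == "windows") "USERPROFILE" else "HOME"), ".vscode-R", "init.R"))
}
EOF
  echo "hook added to $rprofile"
}
```

Calling add_vscode_r_hook "$HOME/.Rprofile" would then update your own profile; a second call reports that the hook is already present and leaves the file unchanged.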
-
In a directory of your choice, create a new R script test.R and copy the code below
print("Test Maxwell R script")
x <- "Tester"
-
Launch the terminal if you haven't already (ctrl + `) and run radian
(myrenv) [r04mr23@maxlogin1(maxwell) ~]$ radian
R version 4.3.3 (2024-02-29) -- "Angel Food Cake"
Platform: x86_64-conda-linux-gnu (64-bit)
r$>
-
Attach vscode-R to the current session (this lets you keep track of the variables you create during the R session, etc.)
r$> .vsc.attach()
If successful, the status bar indicator will change from R: (not attached) to R: followed by the version you're using
-
Now you can see the R: workspace (namespaces, variables, etc.) in the EXPLORER tab. You can customise the layout by dragging the workspace around
-
VSCode R lets you preview a dataset that you have loaded. For example, load a data frame called midwest from this ggplot tutorial
midwest <- read.csv("http://goo.gl/G1K41K")
After you run it, the midwest variable will appear in the Workspace; clicking the magnifier icon opens the data frame in a viewer
-
With VSCode R we can also preview plots. For example, the following generates a scatter plot using ggplot2
library(ggplot2)
ggplot(midwest, aes(x=area, y=poptotal)) + geom_point()
That's how we can use Maxwell and VSCode to work with R interactively. However, we still have a problem: this R session is running on a login node!
r$> Sys.info()["nodename"]
nodename
"maxlogin1.int.maxwell.abdn.ac.uk"
This isn't recommended, as the login node has limited memory and heavy work there may affect its performance. To run R on a compute node instead, create an interactive session and then run the radian/R terminal in it.
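To avoid starting a heavy session on the login node by accident, a small guard can be run before launching radian. This is a sketch under one assumption: login nodes have hostnames starting with maxlogin, as in the Sys.info() output above. is_login_node is a hypothetical helper name, and how you start the interactive session depends on the cluster's scheduler, which is not covered here.

```shell
# Hypothetical helper: succeed if the given hostname looks like a login node.
# With no argument it checks the current host (uname -n prints the node name).
is_login_node() {
  case "${1:-$(uname -n)}" in
    maxlogin*) return 0 ;;  # assumption: login nodes are named maxlogin*
    *) return 1 ;;
  esac
}

if is_login_node; then
  echo "On a login node: start an interactive session before running radian" >&2
else
  echo "Not on a login node: OK to start radian here"
fi
```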
Happy hacking!