Skip to main content

Singularity on Mac, Reproducible Research and Lesson Learned

Remember the time when you want to install multiple software to reproduce someone else's research? What does it remind you of? Pain? Exhaustion from hours of Googling and troubleshooting? Only to face the famous "version compatibility" issues even after you're sure everything right according to the tutorial. Ah, the joys of science and its ever-looming "reproducibility crisis" [[schooler_2014]]!

Luckily we have containerisation technology these days! As written on the IBM website "Containerisation is the packaging of software code with just the operating system (OS) libraries and dependencies required to run the code to create a single lightweight executable—called a container—that runs consistently on any infrastructure.".

Started as a way to make the deployment process easier, containerisation has emerged as a valuable tool for ensuring reproducible research, so instead of letting the user install the software themselves and make them find the scripts (stored don't know where), giving them this so called 'container' streamline this process.

Docker and Singularity are among the most popular containerisation platform (Well technically Docker is more popular, haven't heard of Singularity till my PI mentioned it). However since I'm largely working with HPC environments, Singularity is preferred as it's designed for such purpose and more importantly it's open-source (Long live open-source!).

Singularity is built for Linux, so you have another problem if you're using Windows or Mac, in which you have to use virtual machine (VM) to run it. In this post, I'm going to share my experience dealing with Singularity on my Mac, from installing to building a singularity image file (.sif, end-product, the thing that you share to other people for reproducible research) and the lessons that I learned. So let's dive in.

As I said, since Singularity is primarily created for Linux, Mac users need to use virtual machines. Several options exist to create virtual machines on Mac, including Vagrant, Lima OrbStack (which unfortunately is not very open-source). In my case I'm using Lima as I found it to be straightforward and aligned with the Singularity installation guide.

Download Homebrew if you haven't had it installed

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
$ (echo; echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"') >> $HOME/.profile
$ eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"

Install Lima

$ brew install lima

Running lima is pretty straightforward and it comes with various distros that you can use. In this case we're going to use the default singularity-ce.yml template that is provided in the docs.

# SingularityCE on Alma Linux 9
#
# Usage:
#
# $ limactl start ./singularity-ce.yml
# $ limactl shell singularity-ce singularity run library://alpine

images:
- location: "https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-latest.x86_64.qcow2"
arch: "x86_64"
- location: "https://repo.almalinux.org/almalinux/9/cloud/aarch64/images/AlmaLinux-9-GenericCloud-latest.aarch64.qcow2"
arch: "aarch64"

mounts:
- location: "~"
- location: "/tmp/lima"

containerd:
system: false
user: false

provision:
- mode: system
script: |
#!/bin/bash
set -eux -o pipefail
dnf -y install --enablerepo=epel singularity-ce squashfs-tools-ng

probes:
- script: |
#!/bin/bash
set -eux -o pipefail
if ! timeout 30s bash -c "until command -v singularity >/dev/null 2>&1; do sleep 3; done"; then
echo >&2 "singularity is not installed yet"
exit 1
fi
hint: See "/var/log/cloud-init-output.log" in the guest

message: |
To run `singularity` inside your lima VM:
$ limactl shell {{.Name}} singularity run library://alpine

TLDR; It builds an AlmaLinux and Singularity installed with it

You can update the template above to include more options, for example since the default will only spare 4gb of memory adding memory: "8GiB" will spare 8 instead of 4, adding writable options below the mount points will let you write files in those locations (do it at your own risk, also mind that if writable not being set to true the files in the mounted directory become read-only)

...

memory: "8GiB"

mounts:
- location: "~"
writable: true
- location: "/tmp/lima"
writable: true

...

running the command below will start your lima vm (Select Proceed with the current configuration option when prompted)

$ limactl start ./singularity-ce.yml

When you run the command below, you will see that your singularity-ce instance is already running

$ limactl list

To stop and remove/delete the instance you can use the following command respectively

$ limactl stop singularity-ce
$ limactl remove singularity-ce
$ limactl delete singularity-ce

Now we got our singularity-ce running, next step is to create the singularity container, let's say I want to create container that contains softwares from different programming language. The command below will let you enter the VM interactively

$ limactl shell singularity-ce
[user@lima-singularity-ce]$

You can also directly run singularity, for example the command below will run the Alpine Linux image from the Sylabs cloud library

$ limactl shell singularity-ce singularity run library://alpine

Since we will be creating the Singularity container image ourselves, we won't be downloading it from the Sylabs cloud library but instead we will write something called .def file. .def file or definition file is a recipe to build container image with Singularity, it's just a Singularity headers and bunch of shell commands that will be executed to create the image container. This guides from Singularity docs is very helpful in explaining what we can put in the .def file.

Below is a minimal script to create a container with pandas and seaborn installed

test.def

Bootstrap: docker
From: ubuntu:22.04

%files
/Users/hariesramdhani/Documents/requirements.txt /requirements.txt

%post
export DEBIAN_FRONTEND=noninteractive

apt-get clean all && \
apt-get update && \
apt-get upgrade -y && \
apt-get install -y \
autoconf \
autogen \
build-essential \
curl \
libbz2-dev \
libcurl4-openssl-dev \
libssl-dev \
libxml2-dev \
zlib1g-dev \
python3-dev \
python3-pip

pip install -r /requirements.txt

/Users/hariesramdhani/Documents/requirements.txt

pandas
seaborn

To create image from the above .def file, all you have to do is run the following command (make sure you already activated the interactive mode, notice the [user@lima-singularity-ce] before the $)

[user@lima-singularity-ce]$ singularity build --fakeroot test.sif test.def

The command below build the test.sif from the test.def and you either need to run it as sudo or using the --fakeroot argument. Usually it fails when I'm using sudo, this is why I'm using --fakeroot. The .sif file will appear once you successfully build it.

Well it seems easy and straightforward, doesn't it? It does, but in practice, it can be more complicated when you try to build a more complex system, like for example my .def file will contain different softwares from different programming language different way to install and "different" everything on an ARM64 M2 Macbook. Here I am sharing the lessons I learned when building singularity image for my project.

My ARM != Your AMD Mac transitioned to Apple silicon which means the CPU architecture of your Macbook M series is ARM64. Why is this information important? The thing with Singularity is that it's not architecture agnostic, so when you build your singularity image file on your Mac (ARM64) and try to run it on an AMD64 HPC you will get the following error;

$ singularity run test.sif
FATAL:   could not open image /path/to/dir/test.sif: the image's architecture (arm64) could not run on the host's (amd64)

Can't we just the create the Alpine Linux on our Lima VM to be AMD64 in the first place? Yes, technically by adding the arch parameter and set it to x86_64 we will make our Lima VM architecture to be AMD64.

...

arch: "x86_64"

images:
- location: "https://repo.almalinux.org/almalinux/9/cloud/x86_64/images/AlmaLinux-9-GenericCloud-latest.x86_64.qcow2"
arch: "x86_64"

...

This sounds straightforward and expected to be working but it didn't for me, but luckily after some hacking here and there I found a getaway! What I did for my case was to build an Ubuntu vm then install Singularity inside it.

Can simply create a new instance using the default ubuntu-lts and when prompted with option don't forget to update the yaml file

$ limactl start ubuntu-lts

If the default editor is vim, press i to insert, edit it to add the following and esc followed w+q and enter to save

...

# Change the architecture
arch: "x86_64"

# Upgrade the memory
memory: "8GiB"
mounts:

# Make the mounted point writable
- location: "~"
writable: true
- location: "/tmp/lima"
writable: true

...

It may take more time than the usual to create the instance (most likely because of the different architecture). Once you're done, enter the interactive mode to install singularity.

$ singularity shell ubuntu-lts

And run the following code to install Singularity and its dependencies, change the GO version (1.13.15) and Singularity version (v3.6.3) according to your liking, as seen in the installation guide)

sudo apt-get update && \
sudo apt-get install -y build-essential \
libseccomp-dev pkg-config squashfs-tools cryptsetup

sudo rm -r /usr/local/go

export VERSION=1.13.15 OS=linux ARCH=amd64 # change this as you need

wget -O /tmp/go${VERSION}.${OS}-${ARCH}.tar.gz https://dl.google.com/go/go${VERSION}.${OS}-${ARCH}.tar.gz && \
sudo tar -C /usr/local -xzf /tmp/go${VERSION}.${OS}-${ARCH}.tar.gz

echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
source ~/.bashrc

curl -sfL https://install.goreleaser.com/github.com/golangci/golangci-lint.sh |
sh -s -- -b $(go env GOPATH)/bin v1.21.0

mkdir -p ${GOPATH}/src/github.com/sylabs && \
cd ${GOPATH}/src/github.com/sylabs && \
git clone https://github.com/sylabs/singularity.git && \
cd singularity

git checkout v3.6.3

cd ${GOPATH}/src/github.com/sylabs/singularity && \
./mconfig && \
cd ./builddir && \
make && \
sudo make install

singularity version

I tried to run the singularity image file that I built using the above VM on an AMD64 HPC and it works successfully, the building process however takes longer time than usual!

NIH HPC Singularity def files are your friend Whether you want to create a cool, complex def file or need to know how to install some stuff (yes R packages I'm looking at you!), you can always refer to NIH HPC Singularity def files collection on GitHub (as there aren't many on Google search), they have it all! Just use the Github advanced search repo:NIH-HPC/singularity-def-files {keyword} to find specific type of commands, also technically you can find more by searching all over GitHub with the path:*.def "Bootstrap:" search keyword, but the NIH HPC was suffice for my work.

The power of script Building singularity (singularity build) image file with a complex .def file may take a while (by a while I meant a whole hour) and when you run the command thousands of lines of installation progress appear (Yes R packages I'm looking at you again), troubleshooting can be very hard when you have to scroll through all of those thousands of lines and looking for where the error occurred. Fortunately we have script command on linux, script {filename_to_save_the_logs} will record all of the things that are written on the terminal whether it's an output, an error message or literally anything, it then will save it for you when you run the exit command. You can then open it using your favourite text editor and CTRL+F or command+F.

Captain we need more memory Setting up the right memory is very vital as you don't want your vm to hang when building your Singularity image files. Happened several times for me before I decided to increase the memory size.

Escaping R and its package dependency hell I had a very hard time installing R and some of its packages, what worked for me was to follow the recipes from NIH HPC Singularity def files collection that contain the keyword Rscript, install_github. While the installation worked it didn't solve the dependencies problem especially when installing the package using install_github and that package requires Bioconductor packages, earlier today I had a supervisory meeting with Mike (my PI) and he came up with this nice idea to actually scrape the DESCRIPTION file for each package from the Github repo and install them first before installing the package. I'll show how I do it on a separate blog post!

So that concludes my writing, hopefully this can be helpful for all of the researchers that are going to the same thing.