Running AlphaFold in a Docker/Singularity Container

Containers

Containers are a technology for packaging and isolating applications and their dependencies in a consistent environment. They offer a lightweight and efficient way to deploy software by bundling everything needed to run an application, including its code, runtime, libraries, and settings, into a single package called an image; a container is a running instance of that image. This ensures that the application runs consistently across different environments, from development to production. Containers are lighter than traditional virtual machines because they share the host operating system’s kernel, resulting in faster startup times and reduced resource overhead. As a result, they are widely used to simplify application deployment and improve scalability.

AlphaFold with Docker

Docker is a containerization platform that streamlines the development and deployment of applications by encapsulating an application in a self-contained container. The container holds not only the application’s code but also its libraries, dependencies, runtime, and environment configuration. This ensures consistent execution across diverse settings and makes applications much easier to distribute, run, and reproduce.

The officially recommended way to run AlphaFold locally is through containerisation. The official Docker implementation orchestrates the different parts of the analysis so that each runs efficiently on the appropriate hardware (GPU, CPU, etc.). The problem with the provided Docker version is that it requires elevated privileges on the machine you are working on, which is difficult to accommodate in most shared facilities.

There are many different AlphaFold installations that ultimately use Docker to run. We provide several options for configuring and running AF optimised for Artemis, although you may use this container on any system that supports it.

AlphaFold with Singularity

Singularity is a containerization tool designed with a focus on security and compatibility, particularly in high-performance computing and scientific computing environments. It allows you to encapsulate an application, including its code, libraries, and dependencies, into a single executable image. Unlike Docker, Singularity emphasizes process-level isolation and doesn’t require root privileges to run, making it well-suited for multi-user systems. It’s commonly used for creating self-contained, reproducible environments for complex computations and simulations.

AlphaFold with Singularity on Artemis (or anywhere)

We make use of our AlphaFold Docker images to build and run AF on Artemis. This is a two-step process: first build the Docker image into a Singularity image, then run it as shown below.

Navigate to your project folder, e.g. cd /project/Training/AF

First make a new jobscript for the build step, nano alpha_build.pbs:

#!/bin/bash
#PBS -P Training
#PBS -l select=1:ncpus=2:mem=16gb:ngpus=0
#PBS -l walltime=02:00:00
#PBS -N build01

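# Load Singularity (and the CUDA module used on Artemis)
module load singularity cuda/10.2.89

# Work from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# Keep Singularity's cache and temporary files in the working directory
export SINGULARITY_CACHEDIR="$(pwd)"
export SINGULARITY_TMPDIR="$(pwd)"

# Convert the Docker image into a Singularity image named colab.img
singularity build colab.img docker://sydneyinformaticshub/alphafold:colabfold_v1.3.0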

Run the build step with qsub alpha_build.pbs.

You only have to do this once. Alternatively, you can build the image offline and move the Singularity image to Artemis, or move this file anywhere else you wish to run with Singularity. Note that if you make any changes upstream to the Docker image, you must rebuild the Singularity image for those changes to take effect.
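
Because the build only needs to happen once, it can help to check for an existing image before resubmitting the build job. The following is a minimal sketch of one way to do that. The function name needs_build is hypothetical, and the sketch assumes the image is called colab.img and sits in the current directory, as in the build jobscript above.

```shell
# Hypothetical helper: print "build" if the image is missing or empty,
# otherwise print "skip".
needs_build() {
    if [ -s "$1" ]; then
        echo "skip"
    else
        echo "build"
    fi
}

# Submit the build job only when needed, and only where qsub is available.
if [ "$(needs_build colab.img)" = "build" ] && command -v qsub >/dev/null 2>&1; then
    qsub alpha_build.pbs
fi
```

The check only looks at whether the file exists and is non-empty; it does not detect an image that is out of date relative to the upstream Docker image.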

Next, make a jobscript to actually run the image, nano alpha_sing.pbs:

#!/bin/bash
#PBS -P Training
#PBS -l select=1:ncpus=8:mem=32gb:ngpus=1
#PBS -l walltime=02:00:00
#PBS -N job_colab01_gpu

module load singularity cuda/10.2.89

cd "$PBS_O_WORKDIR"

# --nv exposes the host's NVIDIA GPU and driver libraries to the container
singularity run --nv /project/Training/DATA/colab.img /bin/bash -c "colabfold_batch --templates --num-recycle 1 /project/Training/DATA/input.fasta $PBS_O_WORKDIR/output"
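
The command string handed to /bin/bash -c is expanded by the host shell before the container sees it, so it can be worth printing it first as a dry run. The following is a minimal sketch and not part of the original workflow. colabfold_cmd is a hypothetical helper, and it assumes the paths contain no single quotes.

```shell
# Hypothetical helper: assemble the command string that is passed to
# `/bin/bash -c` inside the container. Paths are wrapped in single quotes
# so that spaces survive (assumes the paths contain no single quotes).
colabfold_cmd() {
    printf "colabfold_batch --templates --num-recycle 1 '%s' '%s'\n" "$1" "$2"
}

# Dry run: print the command instead of executing it.
# Outside a PBS job, PBS_O_WORKDIR is unset, so fall back to the current directory.
colabfold_cmd /project/Training/DATA/input.fasta "${PBS_O_WORKDIR:-$PWD}/output"
```

Inside the jobscript, the printed string could then be passed to the container as the argument of /bin/bash -c.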

Timing

This job takes around 8 minutes with 8 CPUs, 32 GB of RAM, and 1 GPU. That is very fast compared with native AF (albeit with different parameters and model quality). Without the GPU it takes around 40 minutes.
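
As a quick worked check of these figures, the GPU speed-up and a walltime request with some headroom can be computed in the shell. The 3x safety margin is an arbitrary assumption for illustration, not a recommendation from the timings above.

```shell
gpu_minutes=8    # approximate runtime with 1 GPU (from the timing above)
cpu_minutes=40   # approximate runtime without a GPU (from the timing above)

# Integer speed-up of the GPU run over the CPU-only run.
speedup=$((cpu_minutes / gpu_minutes))
echo "GPU speed-up: about ${speedup}x"

# Assumed 3x safety margin, formatted as HH:MM:SS for a #PBS walltime line.
margin=3
total=$((gpu_minutes * margin))
walltime=$(printf '%02d:%02d:00' $((total / 60)) $((total % 60)))
echo "Suggested walltime: $walltime"
```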

ColabFold Singularity image

Note: You can grab a copy of the 8GB Singularity image and skip the build step!

wget -O colab.img https://cloudstor.aarnet.edu.au/plus/s/OlBeNi6xNV9wBMr/download
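
Since the image is large, a truncated download is a realistic failure mode. The sketch below checks that the file exists and is at least a given size. image_at_least is a hypothetical helper, and the 7 GB threshold is an assumption chosen to sit a little below the stated size of about 8 GB.

```shell
# Hypothetical helper: succeed if file $1 exists and is at least $2 bytes.
image_at_least() {
    [ -f "$1" ] || return 1
    size=$(wc -c < "$1")
    [ $((size)) -ge "$2" ]
}

# Assumed threshold of 7 GB, a little below the stated ~8 GB image size.
if image_at_least colab.img 7000000000; then
    echo "colab.img looks complete"
else
    echo "colab.img is missing or smaller than expected"
fi
```

A size check cannot confirm the file contents. Where Singularity is available, running singularity inspect colab.img is a further check that the file is a readable image.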