Resource configuration

WORK IN PROGRESS

Resource configuration in Nextflow workflows can be challenging because of the diversity of computational environments we work in. Each environment has its own resource constraints and management systems, which can complicate the allocation of resources such as CPUs, memory, and storage.

Configuring resources efficiently reduces both the runtime and the cost of running your workflows. Poorly configured workflows can lead to failed jobs, wasted compute time, and over-requested resources, particularly in HPC and cloud environments.

https://nextflow.io/blog/2024/optimizing-nextflow-for-hpc-and-cloud-at-scale.html

Nextflow configuration files

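The core of resource configuration in Nextflow should be contained within nextflow.config and any custom configuration files. When a workflow is executed with nextflow run main.nf, Nextflow looks for a file named nextflow.config in the current directory and in the project directory (the base directory of the workflow script). It also checks $HOME/.nextflow/config. Other custom configuration files are not picked up automatically and must be supplied with the -c option. When more than one of these files exists, they are merged, and where the same setting appears in several files the higher-priority source wins: files passed with -c override nextflow.config in the current directory, which overrides nextflow.config in the project directory, which overrides $HOME/.nextflow/config.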
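As a minimal sketch of this layering, assume a hypothetical file named custom.config that raises the default memory. The file name and value are illustrative and not part of any particular workflow. Because the file is not named nextflow.config, it has to be supplied explicitly:

```groovy
// custom.config (hypothetical file name, illustrative value)
// Supplied explicitly on the command line:
//   nextflow run main.nf -c custom.config
// A setting defined here takes priority over the same setting
// in any nextflow.config file that Nextflow finds automatically.
process {
    memory = 8.GB
}
```

Settings that custom.config does not mention keep the values from the lower-priority files, so a custom file only needs to contain the settings that differ.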

These configuration files allow you to specify settings for resources such as:

  • cpus
  • memory
  • time
  • executor
  • environment variables (set in the env scope)

Here is a basic example of how you can set these resources for all processes in a workflow within the process scope in a configuration file:

process {
  cpus = 2
  memory = '4.GB'
  time = '2.h'
}
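The list above also mentions the executor and environment variables, which the basic example does not set. A minimal sketch follows, assuming a Slurm cluster and a hypothetical queue named 'work'. Both are illustrative assumptions, so substitute the scheduler and queue used on your own system:

```groovy
process {
    // Submit each task as a Slurm job instead of running it locally
    executor = 'slurm'
    // Hypothetical queue (partition) name; replace with one from your cluster
    queue = 'work'
}

env {
    // Variable exported into the environment where each task runs
    // (illustrative value)
    OMP_NUM_THREADS = '1'
}
```

Note that environment variables live in their own env scope rather than in the process scope.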

In a configuration file, resource directives are specified within the process scope. These directives control how many CPUs, how much memory, and how much time each task is allocated. While a single default allocation can be suitable for simple workflows, it will not always work in more complex workflows, where you will need to configure resources per process. Here is an example of how you’d configure the resources for a specific process, overriding the default settings for that process:

process {
  cpus = 2
  memory = '4.GB'
  time = '2.h'
  
  // Provide additional memory for indexing process
  withName: 'STAR_INDEX' {
    cpus = 1
    memory = '32.GB'
  }
}
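For context, the values allocated to a task are available inside the process through the task object, so a tool can be told how many threads it may use. The sketch below is a hypothetical definition of STAR_INDEX: the input, output, and command line are assumptions for illustration, not the workflow's actual process.

```groovy
process STAR_INDEX {
    // Defaults set in the process definition. A matching withName
    // selector in a configuration file takes priority over these values.
    cpus 1
    memory 32.GB

    input:
    path genome_fasta   // hypothetical input name

    output:
    path 'star_index'   // hypothetical output directory

    script:
    """
    mkdir star_index
    STAR --runMode genomeGenerate \\
        --runThreadN ${task.cpus} \\
        --genomeDir star_index \\
        --genomeFastaFiles ${genome_fasta}
    """
}
```

Referring to task.cpus rather than hard-coding a thread count means the command follows whatever value the configuration resolves to for that process.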

Dynamic resource allocations

Nextflow also allows you to dynamically allocate resources based on the input data size or task type. For example, you might need to adjust memory based on the size of an input file: