2.3. Configuring a run for your environment
In the previous exercises, we explored how to customise a run with workflow parameters on the command line or within a parameters file. In this lesson, we will look at configuration settings, which manage how the workflow is executed on your system.
2.3.1. Default nf-core configuration
Recall that when a pipeline script is launched, Nextflow looks for configuration files in multiple locations and applies them in a fixed order of priority. At level 5 of that priority list is the file workflow/nextflow.config. This file also applies workflow/conf/base.config to the workflow execution with the following statement:
includeConfig 'conf/base.config'
Together, these two configuration files define the default execution settings and parameters of an nf-core workflow.
Let’s take a look at these two configuration files to gain an understanding of how defaults are applied.
➤ Using the more command, scroll through workflow/conf/base.config:
more nf-core-rnaseq-3.11.1/workflow/conf/base.config
The generic base.config sets the default compute resource settings to be used by the processes in the nf-core workflow. It uses process labels to enable different sets of resources to be applied to groups of processes that require similar compute. These labels are specified within the main.nf file for a process.
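As an illustrative sketch (not a verbatim copy of any module code), a process declares its label inside its main.nf, and that label is what ties it to the resource defaults in base.config:

// Hypothetical minimal module: the 'label' directive is what base.config keys on
process EXAMPLE_ALIGN {
    label 'process_high'

    script:
    """
    echo "running with ${task.cpus} CPUs and ${task.memory} memory"
    """
}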
We can override these default compute resources using a custom configuration file.
➤ Then take a few moments to scroll through workflow/nextflow.config:
more nf-core-rnaseq-3.11.1/workflow/nextflow.config
The nextflow.config file is more workflow-specific. It sets the defaults for the workflow parameters, and defines profiles that change the default software access method from $PATH to a specified alternative, e.g. Singularity. We can override these parameters on the command line or with a parameters file, and override the default behaviour of searching for tools on $PATH by specifying a -profile.
Default settings for --max_cpus, --max_memory and --max_time are applied within the nf-core workflow/nextflow.config. These are generous values that are expected to be overridden with your custom settings, ensuring that no single process attempts to use more resources than you have available on your platform. Within workflow/conf/base.config, the check_max() function overrides the process resources if the custom ‘max’ setting is lower than the default setting for that process.
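As a rough sketch of how this fits together (illustrative values, not a verbatim copy of base.config), each resource request is wrapped in check_max(), which returns the smaller of the requested value and the corresponding params.max_* setting:

// Illustrative excerpt in the style of conf/base.config: check_max() caps each
// request at params.max_cpus / params.max_memory / params.max_time
process {
    withLabel: process_high {
        cpus   = { check_max( 12    * task.attempt, 'cpus'   ) }
        memory = { check_max( 72.GB * task.attempt, 'memory' ) }
        time   = { check_max( 16.h  * task.attempt, 'time'   ) }
    }
}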
2.3.2. When to use a custom config file
In our runs so far, we have avoided the need for a custom resource configuration file by (a full run command combining these settings is sketched after this list):
- Overriding the default tool access method of $PATH by specifying the singularity profile defined in workflow/nextflow.config
  - Without this, our runs for this workshop would fail because we do not have the workflow tools (e.g. STAR, salmon) installed locally on our VMs
- Overriding the default values for CPUs and memory set in nextflow.config with --max_cpus 2 and --max_memory 6.GB to fit on our small VMs
  - Without these parameters, our runs would fail at the first process that requests more than this, because Nextflow checks that the requested resources are available before attempting to execute a workflow
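For reference, a run command combining those settings looks roughly like this (the --outdir value is just a placeholder):

nextflow run nf-core-rnaseq-3.11.1/workflow/main.nf \
-profile singularity \
-params-file workshop-params.yaml \
--max_cpus 2 \
--max_memory 6.GB \
--outdir my_results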
These are basic configurations. What if:
- We wanted to increase the resources used above what is requested with process labels to take advantage of high CPU or high memory infrastructures?
- We wanted to run on an HPC or cloud infrastructure?
- We wanted to execute specific modules on specific node types on a cluster?
- We wanted to use a non-default software container?
- We wanted to customise outputs beyond what was possible with the nf-core workflow parameters?
💡 This is all possible with custom configuration files!
The rest of lesson 2.3 will explore custom resource configuration files, while lesson 2.4 will focus on customising outputs. We won’t be covering custom workflow execution on HPCs in this workshop, but please check out our tips and tricks page later if you are interested in this!
2.3.3. Institutional config files
We can set these and other configurations within a custom configuration file that is specific to our institution: this is referred to as an institutional config.
Institutional configs help us create efficient workflows that can be shared with others to reproducibly run the workflow in the same computational environment.
In lesson 1 you learnt that there is a repository of institutional configs for nf-core pipelines. These have been contributed by the community.
We have created an nf-core config for Pawsey’s Nimbus cloud: this (and other institutional configs) was downloaded along with the workflow code.
➤ View the available list of institutional configs we pulled down along with the workflow code:
ls nf-core-rnaseq-3.11.1/configs/conf
➤ Then take a look at the Pawsey Nimbus config:
more nf-core-rnaseq-3.11.1/configs/conf/pawsey_nimbus.config
If your institution does not have a publicly available configuration file, or you want to apply your own customisations, you will need to write your own institutional config file.
💡 You can contribute to the nf-core community by sharing your config!
For the sake of the exercise, let’s assume there wasn’t a Pawsey Nimbus config publicly available, and write our own that is specific to our ‘c2r8’ VMs.
➤ Open a new file called custom-nimbus.config and start writing some Nextflow code by adding:
// Nimbus nf-core workshop configuration file
profiles {
    workshop {
    }
}
The profiles scope in a configuration file groups attributes that belong to the same profile: in our case, a profile we have chosen to name workshop.
➤ Inside this workshop profile, let’s remove the need for the -profile singularity flag from our run command by adding another scope called singularity:
// Nimbus nf-core workshop configuration file
profiles {
    workshop {
        singularity {
            enabled    = true
            autoMounts = true
            cacheDir   = '/home/training/singularity_cache'
        }
    }
}
➤ Now let’s address those two resource parameters --max_memory 6.GB and --max_cpus 2. At the same level as the singularity scope, add a params scope and specify each parameter underneath:
// Nimbus nf-core workshop configuration file
profiles {
    workshop {
        singularity {
            enabled    = true
            autoMounts = true
            cacheDir   = '/home/training/singularity_cache'
        }
        params {
            max_cpus   = 2
            max_memory = 6.GB
        }
    }
}
➤ And finally, add a profile description using the config_profile_description parameter:
// Nimbus nf-core workshop configuration file
profiles {
    workshop {
        singularity {
            enabled    = true
            autoMounts = true
            cacheDir   = '/home/training/singularity_cache'
        }
        params {
            config_profile_description = 'Pawsey Nimbus c2r8 profile'
            max_cpus   = 2
            max_memory = 6.GB
        }
    }
}
➤ Save the config then re-run the pipeline, requesting that the workshop profile be applied from our custom-nimbus.config and setting the --outdir parameter to Lesson-2.3.3:
nextflow run nf-core-rnaseq-3.11.1/workflow/main.nf \
-profile workshop \
-c custom-nimbus.config \
-params-file workshop-params.yaml \
--outdir Lesson-2.3.3 \
-resume
👀 We can see that our custom configurations have been read:
- Our Nimbus config is listed under Core Nextflow options
- Our profile description shows under Institutional config options
- Our max_cpus and max_memory show under Max job request options
⌛ Applying the new profile means the processes will execute from scratch rather than from cached files!
While we wait, let’s talk about the 🐘 in the room…
2.3.4. Custom resource configuration using process labels
Capping workflow resources using the max parameters is a bit of a blunt instrument. To achieve optimum computational efficiency on your platform, more granular control may be required. The next two lessons will demonstrate how to achieve this using custom configuration files that fine-tune resources: process labels assign the same resources to groups of processes sharing the same label, while withName targets specific processes.
In order to do this, we need to use the process scope. Nextflow has a number of different scopes that can be included in configuration files, for example the params scope you covered in lesson 1.3 and applied to your config in lesson 2.3.3.
Within the process scope, we can configure resources and additional arguments for processes.
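As a brief aside, nf-core modules read extra tool arguments from the ext.args directive, so a process block can tune both resources and tool options. The module name and argument below are purely illustrative:

process {
    withName: 'FASTQC' {
        cpus     = 2            // resource directive
        ext.args = '--quiet'    // extra command-line arguments passed through to the tool
    }
}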
➤ Edit custom-nimbus.config to set cpus = 2 and memory = 6.GB using process labels within the process scope:
// Nimbus nf-core workshop configuration file
profiles {
    workshop {
        singularity {
            enabled    = true
            autoMounts = true
            cacheDir   = '/home/training/singularity_cache'
        }
        params {
            config_profile_description = 'Pawsey Nimbus c2r8 profile'
        }
        process {
            withLabel: process_low {
                cpus   = 2
                memory = 6.GB
            }
            withLabel: process_medium {
                cpus   = 2
                memory = 6.GB
            }
            withLabel: process_high {
                cpus   = 2
                memory = 6.GB
            }
        }
    }
}
➤ Save the file then re-run the workflow with our custom configuration, setting the --outdir parameter to Lesson-2.3.4:
nextflow run nf-core-rnaseq-3.11.1/workflow/main.nf \
-profile workshop \
-c custom-nimbus.config \
-params-file workshop-params.yaml \
--outdir Lesson-2.3.4 \
-resume
👀 Notice that the Max job request options are no longer listed on the run log printed to screen, because we are now setting them within the process scope rather than the params scope.
2.3.5. Custom resource configuration using process names
This exercise will demonstrate how to adjust the resource configuration for a specific process using the withName process selector, with the STAR_ALIGN module as an example.
withName is a powerful tool:
- Specifically target individual modules
- Multiple module names can be supplied using wildcard (*) or ‘or’ (|) notation (see the sketch after this list)
- No need to edit the module main.nf file to add a process label
- Has a higher priority than withLabel
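For instance (an illustrative pattern, not one we will use in this workshop), a single selector can target several modules at once using the ‘or’ notation:

process {
    // Matches any process whose fully qualified name ends in STAR_ALIGN or SALMON_QUANT
    withName: '.*:STAR_ALIGN|.*:SALMON_QUANT' {
        cpus = 4
    }
}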
To utilise withName, we first need to ensure we have the correct and specific execution path for the module (or modules) we wish to target.
➤ Identify the execution path for the STAR_ALIGN module:
You can view the modules.config file on GitHub or search your local copy:
grep STAR nf-core-rnaseq-3.11.1/workflow/conf/modules.config
withName: 'UNTAR_.*|STAR_GENOMEGENERATE|STAR_GENOMEGENERATE_IGENOMES|HISAT2_BUILD' {
// STAR Salmon alignment options
withName: '.*:ALIGN_STAR:STAR_ALIGN|.*:ALIGN_STAR:STAR_ALIGN_IGENOMES' {
withName: '.*:QUANTIFY_STAR_SALMON:SALMON_QUANT' {
withName: '.*:QUANTIFY_STAR_SALMON:SALMON_TX2GENE' {
withName: '.*:QUANTIFY_STAR_SALMON:SALMON_TXIMPORT' {
withName: '.*:QUANTIFY_STAR_SALMON:SALMON_SE_.*' {
withName: 'DESEQ2_QC_STAR_SALMON' {
// STAR RSEM alignment options
For STAR_ALIGN within the nf-core/rnaseq workflow, any of the following would be correct and specific:
'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN'
'.*:RNASEQ:ALIGN_STAR:STAR_ALIGN'
'.*:ALIGN_STAR:STAR_ALIGN'
➤ Continue editing custom-nimbus.config. Inside the process scope, provide the execution path for the STAR_ALIGN module to the withName selector:
process {
    withName: '.*:RNASEQ:ALIGN_STAR:STAR_ALIGN' {
    }
}
➤ Then set CPUs to 24 and memory to 96.GB:
process {
    withName: '.*:RNASEQ:ALIGN_STAR:STAR_ALIGN' {
        cpus   = 24
        memory = 96.GB
    }
}
➤ Save the config then resume your run, setting --outdir to Lesson-2.3.5 and once again applying your workshop profile from custom-nimbus.config:
nextflow run nf-core-rnaseq-3.11.1/workflow/main.nf \
-profile workshop \
-c custom-nimbus.config \
-params-file workshop-params.yaml \
--outdir Lesson-2.3.5 \
-resume
If your execution path for the STAR_ALIGN module was specified correctly, your run should have failed with the error shown below, because Nextflow checks that the requested resources are available before executing a workflow: