Nextflow tips and tricks
Query specific pipeline executions
The Nextflow log command is useful for querying execution metadata and history. You can filter your queries and output specific fields in the printed log.
nextflow log <run_name> -help
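For example, you can print selected fields for a previous run by passing a comma-separated list to the -f option (the run name below is hypothetical):
nextflow log tiny_leavitt -f name,status,exit,duration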
Execute Nextflow in the background
The -bg option allows you to run your pipeline in the background and continue using your terminal, similar to nohup. You can redirect all standard output to a log file.
nextflow run <workflow_repo/main.nf> -bg > workshop_tip.log
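Because the pipeline is detached from your terminal, you can follow its progress in the redirected log file:
tail -f workshop_tip.log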
Capture a Nextflow pipeline’s configuration
The Nextflow config command prints the resolved pipeline configuration. It is especially useful for printing all resolved parameters and profiles Nextflow will use to run a pipeline.
nextflow config <workflow_repo> -help
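For example, to print the configuration that would be resolved for a particular profile as flat name = value pairs, you can combine the -profile and -flat options (the profile name below is an assumption):
nextflow config <workflow_repo> -profile singularity -flat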
Clean Nextflow cache and work directories
The Nextflow clean command will remove files from previous executions stored in the .nextflow cache and work directories. The -dry-run option allows you to preview which files will be deleted.
nextflow clean <workflow_repo> -help
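For example, you can preview, and then force, the removal of files from all runs before a given run using the -before option (the run name below is hypothetical):
nextflow clean -before tiny_leavitt -n
nextflow clean -before tiny_leavitt -f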
Change default Nextflow cache strategy
Workflow execution is sometimes not resumed as expected. By default, Nextflow builds cache keys from the input files' metadata. Reducing the cache stringency to lenient means the file cache keys are based only on file size and path, which can help avoid unexpectedly re-running certain processes when -resume is in use.
To apply the lenient cache strategy to all of your runs, you could add the following to a custom configuration file:
process {
    cache = 'lenient'
}
You can specify different cache strategies for different processes by using the withName or withLabel selectors, as shown below. You can apply a particular cache strategy to certain profiles within your institutional config, or to all profiles described within that config by placing the above process code block outside the profiles scope.
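As a minimal sketch, applying the lenient strategy to a single process with the withName selector would look like this (the process name processONE is hypothetical):
process {
    withName: processONE {
        cache = 'lenient'
    }
}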
Access private GitHub repositories
To interact with private repositories on GitHub, you can provide Nextflow with access by specifying your GitHub user name and a Personal Access Token in the scm configuration file inside your .nextflow/ directory:
providers {
    github {
        user = 'georgiesamaha'
        password = 'my-personal-access-token'
    }
}
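With these credentials in place, you can pull or run the private pipeline as usual (the repository name below is hypothetical):
nextflow pull georgiesamaha/private-pipeline-nf -hub github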
Run Nextflow on HPC
Nextflow, by default, spawns parallel task executions wherever it is running. You can use Nextflow's executors feature to run these tasks on an HPC job scheduler such as SLURM or PBS Pro. Use a custom configuration file to send all processes to the job scheduler as separate jobs, and define essential resource requests like cpus, time, memory, and queue inside a process {} scope.
Run all workflow tasks as separate jobs on HPC
In this custom configuration file we send all tasks the workflow runs to a PBS Pro job scheduler, specifying that jobs run on the normal queue, each with a maximum walltime of 3 hours, 1 CPU, and 4 GB of memory:
process {
    executor = 'pbspro'
    queue = 'normal'
    cpus = 1
    time = '3h'
    memory = '4.GB'
}
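To use this configuration, save it to a file and supply it to your run with the -c option (the file name custom.config is an assumption):
nextflow run <workflow_repo> -c custom.config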
Run processes with different resource profiles as HPC jobs
Adjusting the custom configuration file above, we can use the withName process selector to specify process-specific resource requirements:
process {
    executor = 'pbspro'
    withName: processONE {
        queue = 'normal'
        cpus = 1
        time = '3h'
        memory = '4.GB'
    }
    withName: processTWO {
        queue = 'hugemem'
        cpus = 48
        time = '10h'
        memory = '400.GB'
    }
}
Specify infrastructure-specific directives for your jobs
Adjusting the custom configuration file above, we can define native configuration options using the clusterOptions directive. We can use this to specify non-standard resources. Below we specify the HPC project code that all process jobs will be billed to:
process {
    executor = 'pbspro'
    clusterOptions = '-P project001'
    withName: processONE {
        queue = 'normal'
        cpus = 1
        time = '3h'
        memory = '4.GB'
    }
    withName: processTWO {
        queue = 'hugemem'
        cpus = 48
        time = '10h'
        memory = '400.GB'
    }
}
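As a sketch, the equivalent for a SLURM cluster swaps the executor and the native accounting flag (the account name is hypothetical and native flags vary between sites):
process {
    executor = 'slurm'
    clusterOptions = '--account=project001'
}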
Additional resources
Here are some useful resources to help you get started with running nf-core pipelines and developing Nextflow pipelines: