Sydney Informatics Hub training

Nextflow tips and tricks

Query specific pipeline executions

The Nextflow log command is useful for querying execution metadata and history. You can filter your queries and output specific fields in the printed log.

nextflow log <run_name> -help
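Below is a sketch of some typical queries. `<run_name>` is a placeholder for a name reported by `nextflow log`, as above. The `-f` option chooses which fields to print and `-F` filters task records with an expression. Field names can vary between Nextflow versions, so check them with `nextflow log <run_name> -l`, which lists the available fields.

```bash
# List previous runs launched from this directory (run names, status, session IDs)
nextflow log

# Print selected fields for every task of one run
nextflow log <run_name> -f process,name,status,exit,workdir

# Show only the tasks that failed, with their work directories
nextflow log <run_name> -F 'status == "FAILED"' -f name,exit,workdir
```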

Execute Nextflow in the background

The -bg option runs your pipeline in the background so you can keep using your terminal, similar to nohup. You can redirect all standard output to a log file.

nextflow run <workflow_repo/main.nf> -bg > workshop_tip.log
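While the run is in the background, you can follow its progress in the file you redirected output to. A minimal sketch using the same placeholder workflow and log file name:

```bash
# Launch in the background, sending console output to a file
nextflow run <workflow_repo/main.nf> -bg > workshop_tip.log

# Follow progress; Ctrl+C stops tail, not the pipeline
tail -f workshop_tip.log
```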

Capture a Nextflow pipeline’s configuration

The Nextflow config command prints the resolved pipeline configuration. It is especially useful for printing all resolved parameters and profiles Nextflow will use to run a pipeline.

nextflow config <workflow_repo> -help
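Below is a sketch of two common uses, with `<workflow_repo>` and `<profile>` as placeholders. `-profile` resolves the configuration as it would apply under that profile. `-flat` prints each setting on its own line as a fully qualified name, which makes the output easy to search.

```bash
# Print the resolved configuration for a given profile
nextflow config <workflow_repo> -profile <profile>

# Flat format: one fully qualified setting per line; keep only the parameters
nextflow config <workflow_repo> -profile <profile> -flat | grep '^params\.'
```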

Clean Nextflow cache and work directories

The Nextflow clean command will remove files from previous executions stored in the .nextflow cache and work directories. The -dry-run option allows you to preview which files will be deleted.

nextflow clean <run_name> -help
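A cautious approach is to preview first and then delete. `nextflow clean` refuses to remove anything unless you pass either `-n` (`-dry-run`) or `-f` (`-force`). Here `<run_name>` is a placeholder for a name reported by `nextflow log`:

```bash
# Preview which work files from a run would be removed
nextflow clean <run_name> -n

# Actually delete them
nextflow clean <run_name> -f

# Remove work files from all runs executed before the given run
nextflow clean -before <run_name> -f
```

Cleaned tasks can no longer be reused with -resume, so keep the runs you may want to resume.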

Change default Nextflow cache strategy

Workflow execution is sometimes not resumed as expected. By default, Nextflow builds the cache key for each input file from its metadata: path, size and last-modified timestamp. Setting the cache strategy to lenient drops the timestamp, so cache keys are based only on file path and size. This can help avoid unexpectedly re-running certain processes when -resume is used, for example on shared file systems where timestamps are inconsistent.

To apply the lenient cache strategy to all processes in your runs, add the following to a custom configuration file:

process {
    cache = 'lenient'
}

You can apply different cache strategies to different processes using the withName or withLabel process selectors. In an institutional config, you can apply a cache strategy to specific profiles by placing it inside those profiles. To apply it to every profile described in that config, place the process code block above outside the profiles scope.
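Below is a minimal sketch of both approaches. The label `my_label`, the process name `MY_PROCESS` and the profile name `my_cluster` are hypothetical. Substitute the labels, process names and profiles your pipeline and institutional config actually use.

```groovy
// Option 1: per-process strategies using process selectors
process {
    withLabel: 'my_label' {        // hypothetical label
        cache = 'lenient'          // path and size only, ignore timestamps
    }
    withName: 'MY_PROCESS' {       // hypothetical process name
        cache = 'deep'             // index input file content, not just metadata
    }
}

// Option 2: only when this profile is selected (-profile my_cluster)
profiles {
    my_cluster {                   // hypothetical profile name
        process.cache = 'lenient'
    }
}
```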

Access private GitHub repositories

To interact with private repositories on GitHub, give Nextflow access by specifying your GitHub user name and a Personal Access Token in the scm configuration file inside your .nextflow/ directory (by default, $HOME/.nextflow/scm):

providers {

  github {
    user = 'georgiesamaha'
    password = 'my-personal-access-token'
  }

}

Run Nextflow on HPC

By default, Nextflow runs tasks in parallel on the machine where Nextflow itself is running. You can use Nextflow's executors feature to run these tasks through an HPC job scheduler such as SLURM or PBS Pro instead. Use a custom configuration file to submit every process to the job scheduler as a separate job, and define essential resource requests such as cpus, time, memory and queue inside a process {} scope.

Run all workflow tasks as separate jobs on HPC

In this custom configuration file, we send every task the workflow runs to a PBS Pro job scheduler. Each job runs on the normal queue with 1 CPU, 4 GB of memory and a maximum walltime of 3 hours:

process {
  executor = 'pbspro'
  queue = 'normal'
  cpus = 1
  time = '3h'
  memory = '4.GB'
}
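The same pattern works for SLURM if you change the executor name. For SLURM, Nextflow's queue directive selects the partition. The sketch below assumes your cluster has a partition called `normal`; replace it with one that exists on your system.

```groovy
process {
  executor = 'slurm'
  queue = 'normal'     // SLURM partition (assumed name; site-specific)
  cpus = 1
  time = '3h'
  memory = '4.GB'
}
```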

Run processes with different resource profiles as HPC jobs

Adjusting the custom configuration file above, we can use the withName: process selector to specify process-specific resource requirements:

process {
  executor = 'pbspro'

  // Process selectors need a colon after withName
  withName: processONE {
    queue = 'normal'
    cpus = 1
    time = '3h'
    memory = '4.GB'
  }

  withName: processTWO {
    queue = 'hugemem'
    cpus = 48
    time = '10h'
    memory = '400.GB'
  }
}
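nf-core pipelines usually group processes with resource labels, such as `process_low` and `process_high` in the nf-core template. With withLabel: you can route a whole group of processes at once. The sketch below assumes the pipeline you run uses these labels and that your scheduler has `normal` and `hugemem` queues.

```groovy
process {
  executor = 'pbspro'

  // Assumes the pipeline tags processes with these nf-core labels
  withLabel: process_low {
    queue = 'normal'
  }

  withLabel: process_high {
    queue = 'hugemem'
  }
}
```

If a process matches both a withName: and a withLabel: selector, the withName: settings take priority.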

Specify infrastructure-specific directives for your jobs

Adjusting the custom configuration file above, we can pass any native scheduler options with the clusterOptions directive, for example to request non-standard resources. Below, we specify which HPC project code to bill for all process jobs:

process {
  executor = 'pbspro'
  clusterOptions = '-P project001'

  withName: processONE {
    queue = 'normal'
    cpus = 1
    time = '3h'
    memory = '4.GB'
  }

  withName: processTWO {
    queue = 'hugemem'
    cpus = 48
    time = '10h'
    memory = '400.GB'
  }
}
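On shared systems, you may also want to cap how many jobs Nextflow keeps in the scheduler queue at any one time. The sketch below uses the executor scope's `queueSize` setting. The value 50 is an arbitrary example, so check your site's job limits.

```groovy
executor {
  queueSize = 50   // at most 50 tasks handed to the scheduler at a time
}
```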

Additional resources

Here are some useful resources to help you get started with running nf-core pipelines and developing Nextflow pipelines:

  • Nextflow tutorials
  • nf-core pipeline tutorials
  • Nextflow patterns
  • HPC tips and tricks
  • Nextflow coding best practice recommendations
  • The Nextflow blog
All materials copyright Sydney Informatics Hub, University of Sydney