1.4 Introduction to nf-core/rnaseq

Objectives

Learn about the nf-core/rnaseq pipeline
Understand the levels of customisation available for nf-core pipelines
Use the nf-core documentation to select appropriate parameters for a run command
Write a run command for nf-core/rnaseq
Explore pipeline deployment and outputs

1.4.1 An introduction to nf-core/rnaseq

For the remainder of this workshop, we will be working with a real-world nf-core pipeline for RNA-seq analysis: nf-core/rnaseq. The nf-core website describes this pipeline as "a bioinformatics pipeline that can be used to analyse RNA sequencing data obtained from organisms with a reference genome and annotation". It implements a variety of tools and has several branching paths that allow users to select the type of alignment and post-processing they desire, as demonstrated in the nf-core metro-map below:

In this lesson, we will download nf-core/rnaseq, explore its functionality, and identify processes that may need to be adjusted or customised. By the end of the lesson, we will have constructed a basic run command to execute the pipeline, and then in the next session we will dive further into the customisation options. While nf-core pipelines are designed to run with 'sensible' default settings, these may not always suit the needs of your experiment or compute environment. Designing a custom run command requires you to identify which workflow parameters you need to specify to suit your circumstances and experimental design, and which configurations you may need to apply to efficiently execute the workflow on your compute platform.

Create a fresh working directory

Before we proceed, let's create a fresh working directory for all our experiments with nf-core/rnaseq.

Create a new working directory

Go ahead and make a new session2 folder in your home directory and open it in VS Code.

mkdir ~/session2

You can open the new directory either via the VS Code GUI (File > Open Folder... or Ctrl + O / Cmd + O) and navigating to it, or by using the code command in the terminal:

code ~/session2

Download the pipeline code

As mentioned above, we will be working with version 3.23.0 of the nf-core/rnaseq pipeline for the rest of the workshop. Go ahead and download that to your working directory now.

Exercise 1.4.1 2 mins

Clone the nf-core/rnaseq pipeline to your working directory. Ensure that you download the version 3.23.0.

Solution

Recall from earlier today that we can clone a workflow with nextflow clone. Recall also that we can pull a specific version of the workflow with the -r parameter:

nextflow clone -r 3.23.0 nf-core/rnaseq

The pipeline files will be downloaded to ~/session2/rnaseq. Let's have a look at the directory structure.

List the downloaded directory contents

ls -lh rnaseq

You should see the following:

Output

total 340K
drwxrwxr-x 2 tdev01 tdev01 4.0K Apr 16 03:31 assets
drwxrwxr-x 2 tdev01 tdev01 4.0K Apr 16 03:31 bin
-rwxrwxr-x 1 tdev01 tdev01 113K Apr 16 03:31 CHANGELOG.md
-rwxrwxr-x 1 tdev01 tdev01  11K Apr 16 03:31 CITATIONS.md
-rwxrwxr-x 1 tdev01 tdev01  14K Apr 16 03:31 CODE_OF_CONDUCT.md
drwxrwxr-x 2 tdev01 tdev01 4.0K Apr 16 03:31 conf
drwxrwxr-x 5 tdev01 tdev01 4.0K Apr 16 03:31 docs
-rwxrwxr-x 1 tdev01 tdev01 1.1K Apr 16 03:31 LICENSE
-rwxrwxr-x 1 tdev01 tdev01 7.0K Apr 16 03:31 main.nf
drwxrwxr-x 4 tdev01 tdev01 4.0K Apr 16 03:31 modules
-rwxrwxr-x 1 tdev01 tdev01  24K Apr 16 03:31 modules.json
-rwxrwxr-x 1 tdev01 tdev01  17K Apr 16 03:31 nextflow.config
-rwxrwxr-x 1 tdev01 tdev01  57K Apr 16 03:31 nextflow_schema.json
-rwxrwxr-x 1 tdev01 tdev01 1.5K Apr 16 03:31 nf-test.config
-rwxrwxr-x 1 tdev01 tdev01  13K Apr 16 03:31 README.md
-rwxrwxr-x 1 tdev01 tdev01  23K Apr 16 03:31 ro-crate-metadata.json
drwxrwxr-x 4 tdev01 tdev01 4.0K Apr 16 03:31 subworkflows
drwxrwxr-x 2 tdev01 tdev01 4.0K Apr 16 03:31 tests
-rwxrwxr-x 1 tdev01 tdev01 3.0K Apr 16 03:31 tower.yml
drwxrwxr-x 3 tdev01 tdev01 4.0K Apr 16 03:31 workflows

The files and directories we will be interacting with in this workshop are:

Feature	Importance
`conf/`	Contains configuration files that define default settings and optional profiles, building on the global settings in `nextflow.config`
`main.nf`	The executable Nextflow script that defines the structure and flow of the workflow. It calls `workflows/rnaseq.nf`
`modules/`	Contains Nextflow processes used by the workflow. Each process is split into a module with its own `main.nf` file
`workflows/rnaseq.nf`	The complete rnaseq pipeline, containing modules and subworkflows that are connected by channels
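One quick way to confirm that the downloaded code is the release you asked for is to read the version recorded in the pipeline's nextflow.config. The sketch below is a hypothetical helper (pipeline_version is not part of Nextflow or nf-core) and assumes the config contains a line of the form version = '3.23.0', as nf-core pipelines usually do in their manifest block.

```shell
# Hypothetical helper: print the version recorded in <pipeline_dir>/nextflow.config.
# Assumes a line of the form:  version = '3.23.0'
pipeline_version() {
    local config="$1/nextflow.config"
    if [ ! -f "$config" ]; then
        echo "no nextflow.config found in $1" >&2
        return 1
    fi
    # Match only lines that start with "version" (so nextflowVersion is ignored)
    # and print the quoted value.
    sed -nE "s/^[[:space:]]*version[[:space:]]*=[[:space:]]*['\"]([^'\"]+)['\"].*/\1/p" "$config" | head -n 1
}

# Only run the check if the pipeline directory is present
if [ -d rnaseq ]; then
    pipeline_version rnaseq
fi
```

Run from ~/session2, this should print the release you cloned if the assumption about the config layout holds.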

1.4.2 Design your run command

As we learnt in Lesson 1.3.3, all nf-core pipelines have a unique set of pipeline-specific parameters that can be used in conjunction with Nextflow parameters to configure the workflow. Generally, nf-core pipelines can be customised at a few different levels:

Level of effect	Customisation feature
The workflow	Where diverging methods are available for a pipeline, you may choose a path to follow
A process	Where more than one tool is available for a single step, you may choose which to use
A tool	Apply specific thresholds or optional flags for a tool on top of the default run command
Compute resources	Specify resource thresholds or software execution methods for the workflow or a process

All nf-core pipelines come with comprehensive documentation that explains what the default workflow structure entails and the options for customising it to your needs. Keep in mind, however, that nf-core pipelines typically do not expose every possible tool parameter, so you may also need to consult the documentation of the underlying tools. Piecing these different sources of information together to determine which parameters you should use can be challenging.

The following sections of the documentation can be used to understand what the pipeline is doing and inform your choices about aspects of pipeline-specific customisations:

Docs	Description	Customisation level
Introduction	Workflow summary	workflow, process
Usage	Inputs and options	workflow, process
Parameters	Available flags	workflow, process, compute resources
Output	Files from all processes	workflow, process, tool

Pipeline structure

Looking at the nf-core/rnaseq pipeline structure provided in the introduction, we can see that the developers have:

Organised the workflow into 5 stages based on the type of work that is being done
Provided a choice of multiple methods and specified defaults
Provided a choice of tool for some steps

Poll 1.4.2 2 min

Observing the diagram above, which statement is true regarding the choice of alignment and quantification methods provided by the nf-core/rnaseq pipeline?

a. The pipeline uses a fixed method for alignment and quantification
b. Users can choose between several different methods for alignment and quantification
c. The pipeline always performs alignment and quantification using STAR or HISAT2
d. The choice of alignment and quantification method is determined automatically based on the input data

Answer

The correct answer is b. The nf-core/rnaseq pipeline allows users to choose between pseudo-alignment and quantification or several different methods for genome-based read alignment and quantification.

a is incorrect because the pipeline is not limited to a single method.
c is incorrect because HISAT2 is an alignment tool, not a quantification tool. Users can choose one of three alignment tools (STAR, HISAT2, or Bowtie2) and one of two quantification tools (RSEM or Salmon), or they can choose the joint pseudo-alignment and quantification method.
d is also incorrect, as the pipeline only accepts fastq files as input and the choice of alignment and quantification method must be specified by the user.

The choices shown in the metro-map above demonstrate the flexibility common to many nf-core workflows, and thus the importance of learning how to customise nf-core runs to best suit your data. Alternative tools and methods are routinely incorporated in workflows, as each dataset is unique and what works well for one may not work well for another. Once you are familiar with customising nf-core workflows, you can easily try out different combinations of tools and parameters to get the best results out of your data 🏆

Default pipeline usage

The number and type of default and optional parameters an nf-core pipeline accepts is at the discretion of its developers. However, at a minimum, nf-core pipelines typically:

Require users to specify a sample sheet (--input) detailing sample data and relevant metadata
Autogenerate or acquire missing reference files from iGenomes (using the --genome parameter) if not provided by the user

Print the nf-core/rnaseq parameters

Recall that we can print out pipeline information, including available parameters, with the --help parameter. Print out the available parameters for the nf-core/rnaseq pipeline by running:

nextflow run rnaseq --help

The typical or recommended run command for this pipeline is provided at the top of the screen:

It outlines a requirement for a few basic things:

An input samplesheet
A location to store outputs
A software management method

Reminder: hyphens matter in Nextflow!

Nextflow-specific parameters use one (-) hyphen, whereas pipeline-specific parameters use two (--). In the typical run command above -profile is a Nextflow parameter, while --input is an nf-core parameter.

Most of us will need to adjust the default run command for our experiments. To get us started with running nf-core/rnaseq, we will be adjusting the typical run command by:

Providing our own reference files
Using the Singularity software management profile
Specifying the computing resource limitations of our VMs (2 CPUs, 8 GB RAM)
Overwriting some default parameters to speed up execution and keep intermediate files

Our dataset

Our input FASTQ files (fastqs/), reference data (mm10_reference/), and full sample sheet (samplesheet.csv) are already available in the data directory within your home directory (~/data). Let's take a look at the input files.

Examine our input data

Start by listing the ~/data directory to see what it contains:

ls -l ~/data

Output

drwxrwxr-x 2 training training 4096 Feb 14 05:36 fastqs
drwxrwxr-x 3 training training 4096 Feb 14 05:46 mm10_reference
-rw-rw-r-- 1 training training  641 Feb 16 05:57 samplesheet.csv

We can also use the tree command to get a deeper look at the folder structure:

tree ~/data

Output

data
|-- fastq
|   |-- SRR3473988_selected.fastq.gz
|   `-- SRR3473989_selected.fastq.gz
|-- mm10_reference
|   |-- STAR
|   |   |-- Genome
|   |   |-- Log.out
|   |   |-- SA
|   |   |-- SAindex
|   |   |-- chrLength.txt
|   |   |-- chrName.txt
|   |   |-- chrNameLength.txt
|   |   |-- chrStart.txt
|   |   |-- exonGeTrInfo.tab
|   |   |-- exonInfo.tab
|   |   |-- geneInfo.tab
|   |   |-- genomeParameters.txt
|   |   |-- sjdbInfo.txt
|   |   |-- sjdbList.fromGTF.out.tab
|   |   |-- sjdbList.out.tab
|   |   `-- transcriptInfo.tab
|   |-- mm10_chr18.fa
|   |-- mm10_chr18.gtf
|   `-- salmon-index
|       |-- complete_ref_lens.bin
|       |-- ctable.bin
|       |-- ctg_offsets.bin
|       |-- duplicate_clusters.tsv
|       |-- info.json
|       |-- mphf.bin
|       |-- pos.bin
|       |-- pre_indexing.log
|       |-- rank.bin
|       |-- refAccumLengths.bin
|       |-- ref_indexing.log
|       |-- reflengths.bin
|       |-- refseq.bin
|       |-- seq.bin
|       `-- versionInfo.json
`-- samplesheet.csv

Finally, take a look at the samplesheet.csv file to see what information the nf-core/rnaseq pipeline requires for each sample:

cat ~/data/samplesheet.csv

Output

sample,fastq_1,fastq_2,strandedness
SRR3473988,/home/training/data/fastq/SRR3473988_selected.fastq.gz,,forward
SRR3473989,/home/training/data/fastq/SRR3473989_selected.fastq.gz,,forward
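Since the pipeline reads FASTQ locations straight from this file, it can save time to confirm that every path in it exists before launching a run. The following is a minimal sketch using a hypothetical helper, check_samplesheet, which assumes the four-column header shown above and simple comma-separated values with no quoted commas.

```shell
# Hypothetical helper: check the samplesheet header and that every listed FASTQ exists.
check_samplesheet() {
    local sheet="$1" status=0 header problems
    header=$(head -n 1 "$sheet" | tr -d '\r')
    if [ "$header" != "sample,fastq_1,fastq_2,strandedness" ]; then
        echo "unexpected header: $header"
        status=1
    fi
    # Skip the header line, then report every non-empty FASTQ path that is not a file
    problems=$(tail -n +2 "$sheet" | tr -d '\r' |
        while IFS=, read -r sample fastq_1 fastq_2 strandedness || [ -n "$sample" ]; do
            for fq in "$fastq_1" "$fastq_2"; do
                if [ -n "$fq" ] && [ ! -f "$fq" ]; then
                    echo "missing file for $sample: $fq"
                fi
            done
        done)
    if [ -n "$problems" ]; then
        echo "$problems"
        status=1
    fi
    return "$status"
}

# Only run the check if the workshop samplesheet is present
if [ -f "$HOME/data/samplesheet.csv" ]; then
    check_samplesheet "$HOME/data/samplesheet.csv"
fi
```

The function prints nothing and returns 0 when the header matches and all files are found. The fastq_2 column is empty for our single-end samples, so empty fields are skipped.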

Sample data source

The sample data was obtained from a public dataset that has been made available on NCBI's BioProject collection and published in the following paper: Corley, S.M., Canales, C.P., Carmona-Mora, P. et al. RNA-Seq analysis of Gtf2ird1 knockout epidermal tissue provides potential insights into molecular mechanisms underpinning Williams-Beuren syndrome. BMC Genomics 17, 450 (2016). https://doi.org/10.1186/s12864-016-2801-4 (BioProject ID: PRJNA320433).

Reference data

Many nf-core pipelines have a minimum requirement for reference data inputs. The input reference data requirements for this pipeline are provided in the usage documentation. To see what reference files we can specify using parameters, rerun the pipeline's help command to view all the available parameters.

Find the available parameters for specifying reference files

nextflow run rnaseq --help

The Reference genome options section lists all of the reference file options we could possibly provide:

Output

Reference genome options
    --genome                      [string]  Name of iGenomes reference. 
    --fasta                       [string]  Path to FASTA genome file. 
    --gtf                         [string]  Path to GTF annotation file. 
    --gff                         [string]  Path to GFF3 annotation file. 
    --gene_bed                    [string]  Path to BED file containing gene intervals. This will be created from the GTF file if not specified. 
    --transcript_fasta            [string]  Path to FASTA transcriptome file. 
    --additional_fasta            [string]  FASTA file to concatenate to genome FASTA file e.g. containing spike-in sequences. 
    --splicesites                 [string]  Splice sites file required for HISAT2. 
    --star_index                  [string]  Path to directory or tar.gz archive for pre-built STAR index. 
    --hisat2_index                [string]  Path to directory or tar.gz archive for pre-built HISAT2 index. 
    --rsem_index                  [string]  Path to directory or tar.gz archive for pre-built RSEM index. 
    --salmon_index                [string]  Path to directory or tar.gz archive for pre-built Salmon index. 
    --kallisto_index              [string]  Path to directory or tar.gz archive for pre-built Kallisto index. 
    --bowtie2_index               [string]  Path to directory or tar.gz archive for pre-built Bowtie2 index. 
    --hisat2_build_memory         [string]  Minimum memory required to use splice sites and exons in the HiSAT2 index build process. [default: 200.GB] 
    --gencode                     [boolean] Specify if your GTF annotation is in GENCODE format. 
    --gffread_transcript_fasta    [boolean] Use gffread to generate transcript FASTA instead of RSEM. 
    --gtf_extra_attributes        [string]  By default, the pipeline uses the `gene_name` field to obtain additional gene identifiers from the input GTF file when running Salmon. 
    [default: gene_name]  
    --gtf_group_features          [string]  Define the attribute type used to group features in the GTF file when running Salmon. [default: gene_id] 
    --featurecounts_group_type    [string]  The attribute type used to group feature types in the GTF file when generating the biotype plot with featureCounts. [default: gene_biotype] 
    
    --featurecounts_feature_type  [string]  By default, the pipeline assigns reads based on the 'exon' attribute within the GTF file. [default: exon]

For this workshop, we will only require four of these parameters:

--fasta: A path to a FASTA file containing our reference genome sequence.
--gtf: A path to a GTF file containing genome annotations such as locations of genes, transcripts, exons, etc.
--star_index: A path to a set of files used by the STAR aligner software to map sequencing reads to their origin in the genome.
--salmon_index: A path to a set of files used by the Salmon pseudoaligner to count transcript reads from RNA sequencing data.

For each of these parameters, we have the following files that we can use:

Reference file	File path
FASTA	`/home/<USERNAME>/data/mm10_reference/mm10_chr18.fa`
GTF	`/home/<USERNAME>/data/mm10_reference/mm10_chr18.gtf`
STAR index	`/home/<USERNAME>/data/mm10_reference/STAR`
Salmon index	`/home/<USERNAME>/data/mm10_reference/salmon-index`

Note that we are just using chr18 as it is a relatively small chromosome, so this should help to keep the run time for our exercises nice and short.
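Before adding these paths to a run command, you may want to confirm that they all exist on your VM. This is a small sketch with a hypothetical helper, check_paths, and it assumes the reference data lives under ~/data/mm10_reference as listed in the table above.

```shell
# Location of the workshop reference data (as listed in the table above)
ref_dir="$HOME/data/mm10_reference"

# Hypothetical helper: report whether each path given as an argument exists
check_paths() {
    local p
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "found:   $p"
        else
            echo "missing: $p"
        fi
    done
}

check_paths \
    "$ref_dir/mm10_chr18.fa" \
    "$ref_dir/mm10_chr18.gtf" \
    "$ref_dir/STAR" \
    "$ref_dir/salmon-index"
```

Any line beginning with "missing:" points to a path that would need correcting before it is passed to the pipeline.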

Writing the run command: required `--input` and `--outdir` parameters

The pipeline requires us to define both an input samplesheet and an output directory to place our results. We supply these with the --input and --outdir parameters, respectively. We've already looked at our input samplesheet: ~/data/samplesheet.csv. Our output directory can be named anything we want, and will be automatically created by Nextflow if it doesn't already exist.

Exercise 1.4.2.1 3 mins

Start writing a run command for the rnaseq pipeline. Start by providing the samplesheet as input. Also define an output directory called lesson-1.4.

Solution

Start by writing out the basic nextflow run command:

nextflow run rnaseq \

Note that we have added a space and a backslash (\) to the end of the line so we may continue writing the full command over multiple lines for legibility. You can choose to follow along like this or you can write the command on one single line and omit the backslashes. If you're following the multi-line convention, you can press Enter immediately after the backslash and you will be provided a new line to continue writing the command. Be aware, however, that the space before the backslash is important. It is also important that you don't have any spaces after the backslash, or the command will run prematurely and fail.
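If you would like to see how the shell treats a trailing backslash without launching anything, the following sketch uses echo as a harmless stand-in for the real command. The file and directory names here are placeholders, not workshop files.

```shell
# One logical command written over three lines; each continued line ends with " \"
message=$(echo nextflow run rnaseq \
    --input samplesheet.csv \
    --outdir results)

# The shell joined the three lines into a single command before running echo
echo "$message"
```

The three lines are joined into one command, so echo receives all of the words as arguments and prints them on a single line.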

Next, add the --input parameter and pass it the path to the samplesheet:

nextflow run rnaseq \
    --input ~/data/samplesheet.csv \

Finally, add the --outdir parameter and give it the name lesson-1.4:

nextflow run rnaseq \
    --input ~/data/samplesheet.csv \
    --outdir lesson-1.4 \

Writing the run command: reference data

With the inputs and outputs defined, we next need to tell the pipeline where to find the necessary reference data. We have already determined the parameters and files we need to pass to the pipeline, so let's add them to the command now.

Exercise 1.4.2.2 5 mins

Continue writing your run command by passing the reference files to their respective parameters.

Solution

Following on from the last line from Exercise 1.4.2.1, add the --fasta, --gtf, --star_index, and --salmon_index parameters, and pass them the files we determined above in Reference data:

nextflow run rnaseq \
    --input ~/data/samplesheet.csv \
    --outdir lesson-1.4 \
    --fasta ~/data/mm10_reference/mm10_chr18.fa \
    --gtf ~/data/mm10_reference/mm10_chr18.gtf \
    --star_index ~/data/mm10_reference/STAR \
    --salmon_index ~/data/mm10_reference/salmon-index \

Optional parameters

Now that we have prepared our input and reference data, we have defined all the required parameters for the pipeline. However, Nextflow still needs to be configured to use Singularity, and we will add an additional workflow parameter to help speed up the pipeline run for the sake of this workshop. The parameters we will use are:

-profile singularity
- Recall that this is a Nextflow parameter. It tells Nextflow to use nf-core's Singularity profile, rather than the default Docker profile, and to run each process in a Singularity container.
--skip_markduplicates true
- This is a pipeline parameter that tells the rnaseq pipeline to skip duplicate read marking. Ordinarily we would want to include this, but for the sake of the workshop and in the interest of time we will skip it.

Exercise 1.4.2.3 3 mins

Add the optional parameters and the singularity profile to the run command.

Solution

Finish writing the run command by adding the -profile and --skip_markduplicates parameters:

nextflow run rnaseq \
    --input ~/data/samplesheet.csv \
    --outdir lesson-1.4 \
    --fasta ~/data/mm10_reference/mm10_chr18.fa \
    --gtf ~/data/mm10_reference/mm10_chr18.gtf \
    --star_index ~/data/mm10_reference/STAR \
    --salmon_index ~/data/mm10_reference/salmon-index \
    -profile singularity \
    --skip_markduplicates true

Remember that -profile is a Nextflow parameter and therefore only uses a single hyphen. The remaining parameters are workflow parameters and use a double hyphen.

Note also that we have left off the trailing space and backslash from the final line (--skip_markduplicates true) since this line concludes our initial run command.

What if the parameter I want to apply isn't available?

Recall from the previous lesson that nf-core modules use ext.args to pass additional arguments to a module. This uses a special Nextflow directive ext. If an nf-core pipeline does not have a pre-defined parameter for a process, you may be able to implement ext.args; we'll see this in action in the next session.

The inclusion of ext.args is currently best practice for all DSL2 nf-core modules where additional parameters may be required to run a process. However, it may not be implemented for all modules in all nf-core pipelines. Depending on the pipeline, a module may not reference ext.args in its script block, in which case it is not available for customisation. If that is the case, consider submitting a feature request or making a pull request on the pipeline's GitHub repository to implement it!

1.4.3 Run the pipeline

You should now have a multi-line command in your terminal waiting to run. Now if you hit Enter, Nextflow should launch and the pipeline will start to run. It will take a few seconds to start up, and then you should start seeing processes spawning and running.

However, very quickly, we run into an error!

What happened?

1.4.4 Setting resource limits

It turns out that there is one thing left to do in order to run the pipeline: set some resource limits. The nf-core/rnaseq pipeline is designed to run on large datasets and therefore expects to require lots of CPU and memory resources to run. However, we're using a small test dataset that doesn't need a lot of computing power, and as such we're also using low-resource VMs. Running the workflow with its default settings causes some of the processes to fail because they request more CPU and memory than our VMs can provide.

We can fix this by telling Nextflow that we want to limit the resource requests from each process to an upper bound of 2 CPUs and 6 GB of memory. We do this within a custom configuration file using the process.resourceLimits directive. This takes a list of upper resource limits like so:

process.resourceLimits = [
    cpus: 2,
    memory: 6.GB,
    time: 1.h
]

Exercise 1.4.4 5 mins

Create a configuration file called nectar_vm.config within your current working directory (~/session2) and add the resourceLimits directive, giving our workflow a limit of 2 CPUs and 6GB of memory.

Solution

First, create the nectar_vm.config file:

touch nectar_vm.config
code nectar_vm.config

Next, add the resourceLimits directive. You can do this in one of two ways. You can use the process.resourceLimits form as shown above:

nectar_vm.config

process.resourceLimits = [
    cpus: 2,
    memory: 6.GB
]

Alternatively, you can use the expanded version by nesting resourceLimits within a process scope:

nectar_vm.config

process {
    resourceLimits = [
        cpus: 2,
        memory: 6.GB
    ]
}

The second form is preferable since we will need the process scope for configuring processes further in the second session.
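If you prefer to stay in the terminal rather than use the editor, the same file can be written with a here-document. This is only an alternative sketch for creating the file. Note that it overwrites any existing nectar_vm.config in the current directory.

```shell
# Write the expanded (process scope) form of the config to nectar_vm.config.
# The quoted 'EOF' stops the shell from expanding anything inside the block.
cat > nectar_vm.config <<'EOF'
process {
    resourceLimits = [
        cpus: 2,
        memory: 6.GB
    ]
}
EOF

# Print the file to confirm its contents
cat nectar_vm.config
```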

Our configuration file is now ready. We just need to update our run command to include it with -c, and to tell Nextflow to resume from where it left off with -resume. There's no sense re-running jobs that already succeeded!

Our final run command and custom config file look like:

nextflow run rnaseq \
    --input ~/data/samplesheet.csv \
    --outdir lesson-1.4 \
    --fasta ~/data/mm10_reference/mm10_chr18.fa \
    --gtf ~/data/mm10_reference/mm10_chr18.gtf \
    --star_index ~/data/mm10_reference/STAR \
    --salmon_index ~/data/mm10_reference/salmon-index \
    -profile singularity \
    --skip_markduplicates true \
    -c nectar_vm.config \
    -resume

nectar_vm.config

process {
    resourceLimits = [
        cpus: 2,
        memory: 6.GB
    ]
}

Go ahead and re-run the workflow. It should now run successfully to completion!

1.4.5 Examine the outputs

Take a look at the stdout printed to the screen. Your workflow configuration and parameter customisations are all documented here. You can use this to confirm if your parameters have been correctly passed to the run command:

As the workflow starts, you will also see a number of processes spawn out underneath this. Recall from earlier today that processes are executed independently and can run in parallel. Nextflow manages the data dependencies between processes, ensuring that each process is executed only when its input data is available and all of its dependencies have been satisfied.

To understand how this is coordinated, consider the STAR_ALIGN process that is being run.

You'll notice a few things:

We can see which inputs are being processed by looking at the text within parentheses at the end of the process name (e.g. ...:GTF2BED (mm10_chr18.gtf))
When a process starts it progressively spawns tasks for all inputs to be processed
A separate TRIMGALORE task is run for each of the samples in our samplesheet.csv before STAR_ALIGN begins
Once a TRIMGALORE task is completed for a sample, the STAR_ALIGN task for that sample begins
When the STAR_ALIGN process starts, it spawns 2 tasks

Once your pipeline has completed, you should see this message printed to your terminal:

Output

-[nf-core/rnaseq] Pipeline completed successfully -
-[nf-core/rnaseq] Please check MultiQC report: 2/2 samples failed strandedness check.-
Completed at: 20-Apr-2026 01:33:59
Duration    : 7m 6s
CPU hours   : 0.2
Succeeded   : 70

The pipeline ran successfully. However, note the warning about all samples having failed the strandedness check. We'll explore that in the next session.

In the meantime, list the contents of your directory. You will see a few new directories (and a hidden directory and log file) have been created.

List the working directory

ls -lha

Output

total 184K
drwxrwxr-x  6 tdev01 tdev01 4.0K Apr 20 01:27 .
drwxr-x--- 16 tdev01 tdev01 4.0K Apr 20 01:26 ..
drwxrwxr-x  8 tdev01 tdev01 4.0K Apr 19 23:51 lesson-1.4
drwxrwxr-x  4 tdev01 tdev01 4.0K Apr 20 01:34 .nextflow
-rw-rw-r--  1 tdev01 tdev01   79 Apr 17 03:43 nectar_vm.config
-rw-rw-r--  1 tdev01 tdev01 150K Apr 20 01:34 .nextflow.log
drwxrwxr-x  4 tdev01 tdev01 4.0K Apr 17 06:55 rnaseq
drwxrwxr-x 66 tdev01 tdev01 4.0K Apr 20 01:32 work

Nextflow has created several new directories and files in the current directory.

The `work` directory

As each task is run, a unique sub-directory is created in the work directory. These directories house the temporary files and command logs created by that task, which is the information we need to troubleshoot a failed process.
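Each task directory contains hidden files written by Nextflow, including .command.sh (the script that was run), .command.err (its standard error), and .exitcode (its exit status). The sketch below is a hypothetical helper, inspect_task, that prints these for a task directory you pass to it.

```shell
# Hypothetical helper: summarise the key hidden files in a Nextflow task directory
inspect_task() {
    local dir="$1" f
    if [ -f "$dir/.exitcode" ]; then
        echo "exit code: $(cat "$dir/.exitcode")"
    else
        echo "exit code: not recorded"
    fi
    # Show the last 20 lines of the task script and of its standard error, if present
    for f in .command.sh .command.err; do
        if [ -f "$dir/$f" ]; then
            echo "--- $f ---"
            tail -n 20 "$dir/$f"
        fi
    done
}
```

To find the directory for a particular task, you can ask for the workdir field, for example nextflow log <run_name> -f name,workdir,exit, and then pass the reported path to inspect_task.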

The `lesson-1.4` directory

All final outputs will be presented in a directory specified by the --outdir flag.

The `.nextflow` directory

This directory contains a cache subdirectory to store cached data such as downloaded files and can be used to speed up subsequent pipeline runs. It also contains a history file which contains a record of pipeline executions including run time, the unique run name, and command line arguments used.

The `.nextflow.log` file

This file is created by Nextflow during the execution of a pipeline and contains information about all processes and any warnings or errors that occurred during execution.

Exercise 1.4.5 5 mins

Was the runtime for the STAR_ALIGN process comparable for samples SRR3473988 and SRR3473989?

Hint: Remember we can use the Nextflow log and trace fields to find this information.

Solution

Run the following:

nextflow log <run_name> -f name,realtime | grep "STAR_ALIGN"

The read alignment runtime was comparable for both samples:

Output

NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (SRR3473988) 1m 16s
NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (SRR3473989) 53.8s
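The two durations are printed in different units, which makes them awkward to compare at a glance. As an optional sketch, the hypothetical helper below converts a duration string into seconds. It assumes durations are written as space-separated values with d, h, m, s, or ms suffixes, like the two shown above.

```shell
# Hypothetical helper: convert a duration such as "1m 16s" into seconds
to_seconds() {
    echo "$1" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            value = $i + 0                      # numeric prefix, e.g. "16s" -> 16
            if ($i ~ /ms$/)      total += value / 1000
            else if ($i ~ /d$/)  total += value * 86400
            else if ($i ~ /h$/)  total += value * 3600
            else if ($i ~ /m$/)  total += value * 60
            else if ($i ~ /s$/)  total += value
        }
        print total
    }'
}

to_seconds "1m 16s"   # prints 76
to_seconds "53.8s"    # prints 53.8
```

In seconds, the two alignments took 76 and 53.8 respectively, a difference of roughly 22 seconds.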

Key points

nf-core pipelines are provided with sensible default settings and have a combination of required and optional inputs and parameters
Each nf-core workflow has a Usage, Output, and Parameters documentation page that should be consulted to help you design a suitable run command for your analysis
Parameters can be used to customise the workflow
