Customising the workflow
Questions
- What are the essential nf-core/rnaseq parameters?
- What parameters can be customised?
The nf-core/rnaseq pipeline can be run with a single command using default settings. The essential parameters are explained in detail here, and include:
- Either an input sample sheet (--input) or a directory housing fastq files (--reads)
- A reference genome, either one available through Illumina’s iGenomes database (--genome), or a user-specified reference assembly (--fasta) and annotation file (--gtf)
- A configuration profile suitable for the computing environment you’re working on.
For example, a user working with human samples, who has Singularity installed and all fastq files stored in a single directory, could run the nf-core/rnaseq pipeline with this command:
# --reads:   the location of fastq files
# --genome:  Illumina iGenomes database
# -profile:  for pre-installed software
nextflow run nf-core/rnaseq \
    --reads '*_R{1,2}.fastq.gz' \
    --genome GRCh38 \
    -profile singularity
Most of us will need to customise the command a little more than this though. For example, a user working with multiple samples, who wants to provide their own pre-indexed reference data, and has computing resource limitations might run a command that looks more like this:
# --input:                   samplesheet file name
# -profile:                  for pre-installed software
# --fasta:                   genome sequence file
# --gtf:                     GTF, gene locations on genome
# --star_index:              formatted index files for the aligner
# --max_memory, --max_cpus:  memory and CPU resources
# --outdir:                  results folder
# -with-report:              execution log file name
# -with-timeline:            timeline log file name
nextflow run $Path_to_nf-core/rnaseq \
    --input $Path_to_samplesheet.csv \
    -profile singularity \
    --fasta $Genome_fasta_file \
    --gtf $Path_to_Genome.gtf \
    --star_index $Path_to_index_file \
    --max_memory '6 GB' --max_cpus 2 \
    --outdir $Path_to_results \
    -with-report execution_report.html \
    -with-timeline timeline_report.html
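The placeholders in the command above stand in for real paths on your system. As a minimal sketch, assuming hypothetical file locations (none of these paths come from the lesson), the same command can be assembled from shell variables and printed for review before anything is launched:

```shell
#!/bin/sh
# Hypothetical paths for illustration only; replace them with your own.
samplesheet="samplesheet.csv"
fasta="ref/genome.fa"
gtf="ref/genes.gtf"
star_index="ref/star_index"
outdir="results"

# Store the arguments in the positional parameters so that each value
# stays a single word, even if it contains spaces (such as '6 GB').
set -- run nf-core/rnaseq \
    --input "$samplesheet" \
    -profile singularity \
    --fasta "$fasta" \
    --gtf "$gtf" \
    --star_index "$star_index" \
    --max_memory '6 GB' --max_cpus 2 \
    --outdir "$outdir" \
    -with-report execution_report.html \
    -with-timeline timeline_report.html

# Dry run: print the command instead of launching it.
# To start the pipeline, replace the two lines below with:  nextflow "$@"
cmd="nextflow $*"
echo "$cmd"
```

The printed line is for reading only: joining the arguments with `$*` drops the quotes around `6 GB`, so run `nextflow "$@"` rather than pasting the echoed text.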
Some useful customisation options
Input and output options
--input Path to comma-separated file containing information about the samples.
--outdir The output directory where the results will be saved.
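To make the --input option more concrete, here is a rough sketch of writing a small samplesheet from the shell. It assumes the `sample,fastq_1,fastq_2,strandedness` column layout used by recent 3.x releases of nf-core/rnaseq; other versions use different columns, so check the documentation for the release you are running. The sample names and fastq paths are hypothetical.

```shell
#!/bin/sh
# Write a two-sample, paired-end samplesheet (all names and paths are hypothetical).
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
control_rep1,fastq/control_rep1_R1.fastq.gz,fastq/control_rep1_R2.fastq.gz,unstranded
treated_rep1,fastq/treated_rep1_R1.fastq.gz,fastq/treated_rep1_R2.fastq.gz,unstranded
EOF

# Sanity check: count the rows whose number of comma-separated fields
# differs from the header row.
bad_rows=$(awk -F',' 'NR == 1 { n = NF; next } NF != n { c++ } END { print c + 0 }' samplesheet.csv)
echo "rows with wrong column count: $bad_rows"
```

A count other than 0 usually points to a missing or extra comma, which is worth fixing before passing the file to --input.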
Reference genome options
--genome Name of iGenomes reference.
--star_index Path to directory or tar.gz archive for pre-built STAR index.
--hisat2_index Path to directory or tar.gz archive for pre-built HISAT2 index.
--save_reference If generated by the pipeline save the STAR index in the results directory.
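The reference options are used in one of two ways: an iGenomes name via --genome, or your own files via --fasta and --gtf. The sketch below shows one possible way to choose between them in a wrapper script. The variable names and paths are hypothetical, and adding --save_reference in the custom branch is only a suggestion so that a newly built index can be reused later.

```shell
#!/bin/sh
# Hypothetical settings: leave fasta and gtf empty to fall back to iGenomes.
fasta="ref/genome.fa"
gtf="ref/genes.gtf"
igenomes_name="GRCh38"

if [ -n "$fasta" ] && [ -n "$gtf" ]; then
    # Custom reference: supply both files, and keep any index the pipeline builds.
    ref_args="--fasta $fasta --gtf $gtf --save_reference"
else
    # No custom files given: use a named iGenomes reference instead.
    ref_args="--genome $igenomes_name"
fi

echo "reference options: $ref_args"
```

Once an index has been saved this way, later runs can point --star_index or --hisat2_index at it instead of rebuilding it.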
Alignment options
--aligner Alignment algorithm to use.
--pseudo_aligner Pseudo aligner to use.
Process skipping/use-alternate options
--deseq2_vst Use vst transformation instead of rlog with DESeq2.
--skip_fastqc Skip FastQC.
--skip_multiqc Skip MultiQC.
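Because these options are simple on/off flags, they are easy to toggle from a wrapper script. The following is a minimal sketch, assuming hypothetical yes/no variables and a hypothetical base command; it prints the resulting command rather than running it.

```shell
#!/bin/sh
# Hypothetical yes/no switches for the optional steps.
skip_fastqc="yes"
skip_multiqc="no"
use_vst="yes"

# Append a flag only when its switch is set to "yes".
extra_args=""
if [ "$skip_fastqc" = "yes" ]; then extra_args="$extra_args --skip_fastqc"; fi
if [ "$skip_multiqc" = "yes" ]; then extra_args="$extra_args --skip_multiqc"; fi
if [ "$use_vst" = "yes" ]; then extra_args="$extra_args --deseq2_vst"; fi

# Dry run: show the full command (the base options here are illustrative).
full_cmd="nextflow run nf-core/rnaseq --input samplesheet.csv --genome GRCh38 --outdir results -profile singularity$extra_args"
echo "$full_cmd"
```

With the settings shown, FastQC is skipped, MultiQC still runs, and the vst transformation is requested.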
For details of all nf-core/rnaseq parameters, take a look here.
Proceed to the next lesson by clicking on What is nf-core/rnaseq doing? > RNA-seq workflow overview on the menu bar.
Key points
- A single nf-core command can be run for the complete pipeline.
- Many different parameters can be used to customise pipeline runs.
- nf-core allows users to skip non-mandatory steps in the pipeline.
All materials copyright Sydney Informatics Hub, University of Sydney