Customising the workflow
Questions
- What are the essential nf-core/rnaseq parameters?
- What parameters can be customised?
The nf-core/rnaseq pipeline can be run using a single command with default parameters. The default parameters are explained in detail here, and the essential ones include:
- Either an input sample sheet (--input) or a directory housing fastq files (--reads)
- A reference genome, either one available through Illumina’s iGenomes database (--genome), or a user-specified reference assembly (--fasta) and annotation file (--gtf)
- A configuration profile suitable for the computing environment you’re working on (-profile)
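As an illustration of the first item, the sketch below writes a small sample sheet that could be passed to --input. The column layout (sample,fastq_1,fastq_2,strandedness) is the one used by recent nf-core/rnaseq releases and is an assumption here, since older releases expect different columns. The sample names and file paths are hypothetical.

```shell
# Write a small example sample sheet (all sample names and paths are hypothetical).
# Assumed column layout: sample,fastq_1,fastq_2,strandedness
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2,strandedness
control_rep1,fastq/control_rep1_R1.fastq.gz,fastq/control_rep1_R2.fastq.gz,unstranded
control_rep2,fastq/control_rep2_R1.fastq.gz,fastq/control_rep2_R2.fastq.gz,unstranded
treated_rep1,fastq/treated_rep1_R1.fastq.gz,fastq/treated_rep1_R2.fastq.gz,unstranded
EOF

# Sanity check: every row should have exactly four comma-separated fields.
awk -F',' 'NF != 4 { printf "line %d has %d fields\n", NR, NF; bad = 1 }
           END { exit bad + 0 }' samplesheet.csv \
    && echo "samplesheet.csv looks well-formed"
```

Check the documentation for the pipeline version you are running before relying on this layout.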
For example, a user working with human samples, who has Singularity installed and all fastq files stored in a directory, could run the nf-core/rnaseq pipeline with this command:

# --reads:   the location of the fastq files
# --genome:  reference from the Illumina iGenomes database
# -profile:  use Singularity for pre-installed software
nextflow run nf-core/rnaseq \
    --reads '*_R{1,2}.fastq.gz' \
    --genome GRCh38 \
    -profile singularity

Most of us will need to customise the command a little more than this, though. For example, a user working with multiple samples, who wants to provide their own pre-indexed reference data and has limited computing resources, might run a command that looks more like this:

# --input:         samplesheet file name
# -profile:        use Singularity for pre-installed software
# --fasta:         genome sequence file
# --gtf:           GTF file of gene locations on the genome
# --star_index:    formatted index file for the aligner
# --max_memory, --max_cpus: memory and CPU resource limits
# --outdir:        results folder
# -with-report:    execution log file name
# -with-timeline:  timeline log file name
nextflow run $Path_to_nf-core/rnaseq \
    --input $Path_to_samplesheet.csv \
    -profile singularity \
    --fasta $Genome_fasta_file \
    --gtf $Path_to_Genome.gtf \
    --star_index $Path_to_index_file \
    --max_memory '6 GB' --max_cpus 2 \
    --outdir $Path_to_results \
    -with-report execution_report.html \
    -with-timeline timeline_report.html

Some useful customisation options
Input and output options
--input           Path to comma-separated file containing information about the samples.
--outdir          The output directory where the results will be saved.
Reference genome options
--genome          Name of iGenomes reference.
--star_index      Path to directory or tar.gz archive for pre-built STAR index.
--hisat2_index    Path to directory or tar.gz archive for pre-built HISAT2 index.
--save_reference  If generated by the pipeline save the STAR index in the results directory.
Alignment options
--aligner         Alignment algorithm to use.
--pseudo_aligner  Pseudo aligner to use.
Process skipping/use-alternate options
--deseq2_vst      Use vst transformation instead of rlog with DESeq2.
--skip_fastqc     Skip FastQC.
--skip_multiqc    Skip MultiQC.
For details of all parameters in nf-core, take a look here.
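Several of the options above can be combined in one run. The sketch below assembles a command from a few optional settings and prints it for review rather than launching it. ALIGNER, SKIP_FASTQC and USE_VST are hypothetical helper variables, samplesheet.csv and results are hypothetical names, and hisat2 is assumed to be an accepted value for --aligner in the pipeline version you are using.

```shell
# Sketch: assemble an nf-core/rnaseq command from a few optional settings.
# ALIGNER, SKIP_FASTQC and USE_VST are hypothetical helper variables,
# not pipeline features; the file and folder names are placeholders.
ALIGNER="hisat2"     # assumed to be an accepted value for --aligner
SKIP_FASTQC="yes"
USE_VST="no"

# Start with the arguments every run needs.
set -- nextflow run nf-core/rnaseq \
    --input samplesheet.csv \
    --outdir results \
    --genome GRCh38 \
    -profile singularity \
    --aligner "$ALIGNER"

# Append optional flags only when they were requested.
if [ "$SKIP_FASTQC" = "yes" ]; then set -- "$@" --skip_fastqc; fi
if [ "$USE_VST" = "yes" ]; then set -- "$@" --deseq2_vst; fi

# Print the assembled command for review instead of running it.
CMD="$*"
echo "$CMD"
# To launch for real, run "$@" in place of the echo.
```

Keeping optional flags in one place like this makes it easier to see which steps a given run skipped.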
Proceed to the next lesson by clicking on What is nf-core/rnaseq doing? > RNA-seq workflow overview on the menu bar.
Key points
- A single nf-core command can be run for the complete pipeline.
- Many different parameters can be used to customise pipeline runs.
- nf-core allows users to skip non-mandatory steps in the pipeline.
All materials copyright Sydney Informatics Hub, University of Sydney