2.4 Combining channels and multiple process outputs
Learning objectives
- Implement a channel that combines the contents of two channels.
- Implement a process with multiple output files.
In this step we will transform the 03_multiqc.sh script into a process called MULTIQC.
This step focuses on the final step of our RNAseq data processing workflow: generating
a report that summarises the quality control and quantification steps. 
To do this, we will run MultiQC, which is a popular tool for summarising the outputs of many different bioinformatics tools. It aggregates the results from all our analyses and renders them into a nice report.
From the MultiQC docs
MultiQC doesn’t do any analysis for you - it just finds results from other tools that you have already run and generates nice reports. See here for a list of supported tools. You can also see an example report here.
 
Open the bash script 03_multiqc.sh.  
This script is a lot simpler than previous scripts we've worked with. It searches for the output files generated by the FASTQC and QUANTIFICATION processes saved to the results/ directory. As specified by --outdir results/, it will write two MultiQC outputs:
- A directory called multiqc_data/
- A report file called multiqc_report.html
2.4.1 Building the process
1. Process directives, script, and input
Here is the process template with the container and publishDir
directives provided. Add this to your main.nf after the QUANTIFICATION process:  
process MULTIQC {
  container "quay.io/biocontainers/multiqc:1.19--pyhdfd78af_0"
  publishDir "results", mode: 'copy'
  input:
  path "*"
  output:
  < process outputs >
  script:
  """
  multiqc .
  """
}
The script and input follow the MultiQC Nextflow
integration recommendations. 
The key thing to note here is that MultiQC needs to be run once for all
upstream outputs. 
From the information above we know that the input for multiqc is the 
results/ directory, specifically, the files and directories within
results/. We will need to bring the outputs of the FASTQC
(fastqc_gut_logs/) and QUANTIFICATION (gut/) processes into a single
channel as input to MULTIQC.  
Why you should NOT use the publishDir folder as a process input
It might make sense to have the results/ folder (set by publishDir) as
the input to the process here, but it may not exist until the workflow
finishes. 
Using the publishDir as a process input can cause downstream processes
to run prematurely, even if the directory is empty or incomplete. In this case,
MultiQC might miss some inputs.
Use channels to pass data between processes. Channels enable Nextflow to track outputs and ensure that downstream processes only run when all required data is ready, maintaining proper workflow control.
More on this in the next section.
2. Define the process output
The MultiQC output consists of the following:
- An HTML report file called multiqc_report.html
- A directory called multiqc_data/ containing the data used to generate the report.
Add the following output definition to the MULTIQC process:  
process MULTIQC {
  container "quay.io/biocontainers/multiqc:1.19--pyhdfd78af_0"
  publishDir "results", mode: 'copy'
  input:
  path "*"  
  output:
  path "multiqc_report.html"
  path "multiqc_data"
  script:
  """
  multiqc .
  """
}
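As an optional variation, Nextflow also lets each output be given a name with emit, so that the workflow can refer to outputs by name rather than by position. The sketch below is illustrative only: the process name MULTIQC_NAMED and the emit names report and data are hypothetical and are not used elsewhere in this workshop.

```nextflow
// Hypothetical variant of the MULTIQC process with named outputs
process MULTIQC_NAMED {
  container "quay.io/biocontainers/multiqc:1.19--pyhdfd78af_0"
  publishDir "results", mode: 'copy'

  input:
  path "*"

  output:
  // 'report' and 'data' are illustrative names, not part of the workshop code
  path "multiqc_report.html", emit: report
  path "multiqc_data", emit: data

  script:
  """
  multiqc .
  """
}
```

With this definition, the report could be accessed as MULTIQC_NAMED.out.report instead of MULTIQC_NAMED.out[0]. The rest of this lesson keeps the unnamed outputs defined above.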
2.4.2 Combining channels with operators
Tip
MultiQC needs to be run once on all the upstream input files, so that a single report is generated with all the results.
In this case, the input files for the MULTIQC process are outputs from
FASTQC and QUANTIFICATION processes. Both FastQC and Salmon are supported
by MultiQC and the required files are detected automatically by the program
(when using it in a Nextflow pipeline, some pre-processing needs to
be done).
The goal of this step is to bring the outputs from the FASTQC and
QUANTIFICATION processes into a single input channel for the MULTIQC
process. This ensures that MultiQC is run once.
The next few additions will involve chaining together Nextflow operators to
correctly format inputs for the MULTIQC process.  
Poll
What Nextflow input type (qualifier) ensures that inputs are grouped and processed together?
Add the following to the workflow block in your main.nf file, under the
QUANTIFICATION process.  
// Define the workflow
workflow {
    // Run the index step with the transcriptome parameter
    INDEX(params.transcriptome_file)
    // Define the fastqc input channel
    reads_in = Channel.fromPath(params.reads)
        .splitCsv(header: true)
        .map { row -> [row.sample, file(row.fastq_1), file(row.fastq_2)] }
    // Run the fastqc step with the reads_in channel
    FASTQC(reads_in)
    // Define the quantification channel for the index files
    transcriptome_index_in = INDEX.out[0]
    // Run the quantification step with the index and reads_in channels
    QUANTIFICATION(transcriptome_index_in, reads_in)
    // Define the multiqc input channel
    FASTQC.out[0]
        .mix(QUANTIFICATION.out[0])
        .view()
}
This chain of operators merges the two outputs into a single channel:
- Takes the output of FASTQC, using .out[0] to refer to the first output of the process.
- Uses mix(QUANTIFICATION.out[0]) to combine the FASTQC.out[0] output with the first output of the QUANTIFICATION process.
- Uses view() to show the values emitted into the channel.
For more information, see the documentation on
mix.
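To see mix in isolation, here is a minimal sketch that can be saved as its own script and run separately from main.nf. The channel names fastqc_ch and quant_ch and the string values are placeholders chosen for illustration. They stand in for the directories emitted by FASTQC and QUANTIFICATION.

```nextflow
workflow {
    // Two toy channels, each emitting one value (placeholder strings)
    fastqc_ch = Channel.of('fastqc_gut_logs')
    quant_ch = Channel.of('gut')

    // mix merges both channels into one; each value is still emitted
    // as a separate item, and the emission order is not guaranteed
    fastqc_ch
        .mix(quant_ch)
        .view()
}
```

Each value is printed on its own line, which shows that mix merges channels without grouping their items together.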
Run the workflow to see what it produces:
nextflow run main.nf -resume
The output should look something like:
Launching `main.nf` [stupefied_minsky] DSL2 - revision: 82245ce02b
[de/fef8c4] INDEX              | 1 of 1, cached: 1 ✔
[bb/32a3aa] FASTQC (1)         | 1 of 1, cached: 1 ✔
[a9/000f36] QUANTIFICATION (1) | 1 of 1, cached: 1 ✔
/home/user1/part2/work/bb/32a3aaa5e5fd68265f0f34df1c87a5/fastqc_gut_logs
/home/user1/part2/work/a9/000f3673536d98c8227b393a641871/gut
The outputs have been emitted one after the other, meaning that they will be processed separately. We need them to be processed together (included in the same MultiQC report), so we need to add one more step.
Note
Note that the outputs point to the files in the work directories, rather than
the publishDir. This is one of the ways that Nextflow ensures all input files
are ready and ensures proper workflow control.
Add the collect operator to ensure all samples are processed together in the same
process and view the output:
// Define the workflow
workflow {
    // Run the index step with the transcriptome parameter
    INDEX(params.transcriptome_file)
    // Define the fastqc input channel
    reads_in = Channel.fromPath(params.reads)
        .splitCsv(header: true)
        .map { row -> [row.sample, file(row.fastq_1), file(row.fastq_2)] }
    // Run the fastqc step with the reads_in channel
    FASTQC(reads_in)
    // Define the quantification channel for the index files
    transcriptome_index_in = INDEX.out[0]
    // Run the quantification step with the index and reads_in channels
    QUANTIFICATION(transcriptome_index_in, reads_in)
    // Define the multiqc input channel
    FASTQC.out[0]
        .mix(QUANTIFICATION.out[0])
        .collect()
        .view()
}
Run the workflow:
nextflow run main.nf -resume
The channel now emits a single item, a list containing the two directories:
Launching `main.nf` [small_austin] DSL2 - revision: 6ab927f137
[de/fef8c4] INDEX              | 1 of 1, cached: 1 ✔
[bb/32a3aa] FASTQC (1)         | 1 of 1, cached: 1 ✔
[a9/000f36] QUANTIFICATION (1) | 1 of 1, cached: 1 ✔
[/home/user1/part2/work/bb/32a3aaa5e5fd68265f0f34df1c87a5/fastqc_gut_logs, /home/user1/part2/work/a9/000f3673536d98c8227b393a641871/gut]
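The effect of collect can also be checked on its own with a minimal sketch, run as a separate script from main.nf. The string values are placeholders that stand in for the two output directories.

```nextflow
workflow {
    // A toy channel emitting two separate values (placeholder strings);
    // collect gathers every item into one list that is emitted once
    Channel.of('fastqc_gut_logs', 'gut')
        .collect()
        .view()
}
```

This should print a single line, [fastqc_gut_logs, gut], rather than one line per value. Because the channel now holds one item, a process that takes it as input runs once.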
Now that we have a channel that emits the correct data, add the finishing touches to the workflow scope.
Exercise: Assign the input channel
- Assign the chain of operations to a channel called multiqc_in
- Remove the .view() operator
Solution
    // Define the quantification channel for the index files
    transcriptome_index_in = INDEX.out[0]
    // Run the quantification step with the index and reads_in channels
    QUANTIFICATION(transcriptome_index_in, reads_in)
    // Define the multiqc input channel
    multiqc_in = FASTQC.out[0]
        .mix(QUANTIFICATION.out[0])
        .collect()
}
Exercise: call the MULTIQC process
- Add the MULTIQC process in the workflow scope
- Pass the multiqc_in channel as input.
Solution
    // Define the quantification channel for the index files
    transcriptome_index_in = INDEX.out[0]
    // Run the quantification step with the index and reads_in channels
    QUANTIFICATION(transcriptome_index_in, reads_in)
    // Define the multiqc input channel
    multiqc_in = FASTQC.out[0]
        .mix(QUANTIFICATION.out[0])
        .collect()
    // Run the multiqc step with the multiqc_in channel
    MULTIQC(multiqc_in)
}
Run the workflow:
nextflow run main.nf -resume
Your output should look something like:
Launching `main.nf` [hopeful_swanson] DSL2 - revision: a4304bbe73
[aa/3b8821] INDEX          [100%] 1 of 1, cached: 1 ✔
[c2/baa069] QUANTIFICATION [100%] 1 of 1, cached: 1 ✔
[ad/e49b20] FASTQC         [100%] 1 of 1, cached: 1 ✔
[a3/1f885c] MULTIQC        [100%] 1 of 1 ✔
2.4.3 Inspecting the MultiQC report
Let's inspect the generated MultiQC report. You will need to download the file to your local machine and open it in a web browser.
Exercise
- In the VSCode Explorer sidebar, locate the report results/multiqc_report.html
- Right click on the file, and select "Download"
- Open the file in a web browser
Poll
Under the "General Statistics" section, how many samples (i.e. rows) have been included in the table?
Tip
If you have to view many files (e.g. .html) on a remote server, we recommend using the Live Server VSCode extension.
The extension allows you to view .html files within a VSCode tab instead
of manually downloading files locally.
You have a working pipeline for a single paired-end sample!
Summary
In this lesson you have learned:
- How to implement a process following the MultiQC Nextflow integration recommendations
- How to define a process with multiple outputs
- How to use the mix and collect operators to combine outputs into a single list
- How to access and view .html files from a remote server