1.7 Adding new processes

Learning objectives

Revise the use of piping and text transformation in bash (tr, cat, |)
Implement a new process block to perform transformations on text files
Implement connecting the output of a process as an input of another process via channels
Evaluate process outputs and how they interact in a multi-step pipeline

Up until now you've been modifying a single step. However, pipelines generally consist of multiple steps where outputs from one step are used as inputs for the next.

Here you're going to step things up again and add another process to the pipeline.

1.7.1 Translating text

The tr command is a UNIX command-line utility for translating or deleting characters. It supports a range of transformations including uppercase to lowercase, squeezing repeating characters, deleting specific characters, and basic find and replace. It can be used with UNIX pipes to support more complex translation. tr stands for translate. The following example will translate all lower case letters (represented by the pattern [a-z]) to upper case (represented by [A-Z]):

tr '[a-z]' '[A-Z]'

Advanced content: regular expressions

For the curious, the patterns we are using here - e.g. [a-z] - are called regular expressions. They are a way of describing patterns in text and can be immensely useful in manipulating text as they provide a way to search and replace text in more complex ways than simple exact matches. Be warned, they can get very complicated and confusing very quickly!

1.7.2 Piping commands

The pipe command in Linux, represented by the vertical bar symbol |, is an essential tool for command-line enthusiasts and professionals alike. The primary purpose of the pipe command is to connect the output of one command directly into the input of another:

echo 'Hello World' | tr '[a-z]' '[A-Z]'

The contents of a file can be piped using the cat command:

cat output.txt | tr '[a-z]' '[A-Z]'

Like before, the output can be redirected to an output file:

cat output.txt | tr '[a-z]' '[A-Z]' > upper.txt

1.7.3 Adding the `CONVERTTOUPPER` process

The output of the SAYHELLO process is a text file called output.txt.

In the next step of the pipeline, you will add a new process named CONVERTTOUPPER that will convert all of the lower case letters in this file to a uppercase letters and save them as a new file.

The CONVERTTOUPPER process will follow the same structure as the SAYHELLO process:

process CONVERTTOUPPER {
    publishDir params.outdir

    input:
    <input qualifier> <input name>

    output:
    <output qualifier> <output name>

    script:
    """
    <script>
    """
}

Using what you have learned in the previous sections you will now write a new process using the tr command from above.

Exercise

Add new process named CONVERTTOUPPER that will take an input text file, convert all of the lowercase letters in the text file to uppercase letters, and save a new text file that contains the translated letters.

The CONVERTOTUPPER process should be defined inbetween your SAYHELLO process and workflow scope.

Hints have been provided to aid you in writing the process. We recommend following the order of the hints, first defining the script block, followed by the output and input.

Hint: script:

The script might look something like this:

cat $input_file | tr '[a-z]' '[A-Z]' > upper.txt

Hint 1: The input name is input_file, however, you may call it something different. Hint 2: The output text file is named upper.txt

Hint: output:

The output

path 'upper.txt'

Hint 1: The output is a file and requires the path qualifier.

Hint 2: The output name is hard coded as 'upper.txt', however, you may call it something different.

Hint: input:

path input_file

Hint 1: The input is a file and requires the path qualifier.

Hint 2: input_file must be the same as what was specified as the input name in the script block.

Solution

hello-world.nf

// Set default greeting
params.greeting = 'Hello World!'

// Set a default output directory
params.outdir = 'results'

// Use echo to print 'Hello World!' and redirect to output.txt
process SAYHELLO {
    publishDir params.outdir

    input:
    val greeting

    output:
    path 'output.txt'

    script:
    """
    echo '$greeting' > output.txt
    """
}

process CONVERTTOUPPER {
    publishDir params.outdir

    input:
    path input_file

    output:
    path 'upper.txt'

    script:
    """
    cat $input_file | tr '[a-z]' '[A-Z]' > upper.txt
    """
}

workflow {

    // Create a channel for inputs
    greeting_ch = Channel.of(params.greeting)

    // Emit a greeting
    SAYHELLO(greeting_ch)

}

1.7.4 Connecting the processes

As we learned in the inputs module, Nextflow uses channels to connect processes. Each output defined in a process' output block defines a new channel that can be used as inputs for another process.

The output channel from a process can be accessed by adding .out to the end of a process name in the workflow definition:

SAYHELLO.out

Outputs can then be used as an input for another process:

CONVERTTOUPPER(SAYHELLO.out)

Alternatively, you can assign the output channel to a new variable name for convenience:

hello_msg = SAYHELLO.out

CONVERTTOUPPER(hello_msg)

The process output behaves like any other channel and can be used as inputs for multiple downstream processes.

Warning

Adding .out to the end of a process name only works for single outputs. If there are multiple outputs, you will need to use an integer index to select the appropriate output (e.g. .out[0] or .out[1] for the first and second inputs, respectively), or (more conveniently) use the emit option when defining the output block of the process, which allows you to select the output by name (e.g. .out.some_output). See the additional options section of the Nextflow documentation for more information.

Exercise

Add the CONVERTTOUPPER process to your workflow definition. Use the output from SAYHELLO as its input.

Solution

hello-world.nf

// Set default greeting
params.greeting = 'Hello World!'

// Set a default output directory
params.outdir = 'results'

// Use echo to print 'Hello World!' and redirect to output.txt
process SAYHELLO {
    publishDir params.outdir

    input:
    val greeting

    output:
    path 'output.txt'

    script:
    """
    echo '$greeting' > output.txt
    """
}

process CONVERTTOUPPER {
    publishDir params.outdir

    input:
        path input_file

    output:
        path 'upper.txt'

    script:
    """
    cat $input_file | tr '[a-z]' '[A-Z]' > upper.txt
    """
}

workflow {

    // Create a channel for inputs
    greeting_ch = Channel.of(params.greeting)

    // Emit a greeting
    SAYHELLO(greeting_ch)

    // Convert the greeting to uppercase
    CONVERTTOUPPER(SAYHELLO.out)

}

Executing hello-world.nf will now show a second step:

N E X T F L O W  ~  version 23.10.1
Launching `hello-world.nf` [mighty_murdock] DSL2 - revision: 80e92a677c
executor >  local (2)
[ef/b99a2f] SAYHELLO (1)       [100%] 1 of 1 ✔
[cd/c8cf1b] CONVERTTOUPPER (1) [100%] 1 of 1 ✔

Summary

In this section we extended our pipeline by introducing a second process, and connecting to the first. We:

Explore the tr command for translating text to uppercase
Defined a new process CONVERTTOUPPER to take a file and generate a modified output
Connected the output of the SAYHELLO process to CONVERTTOUPPER using .out
Updated our workflow scope to call both processes sequentially

1.7 Adding new processes

1.7.1 Translating text

1.7.2 Piping commands

1.7.3 Adding the CONVERTTOUPPER process

1.7.4 Connecting the processes

1.7.3 Adding the `CONVERTTOUPPER` process