Your first pipeline
Learning objectives
- Write your first Nextflow pipeline
- Execute your first Nextflow pipeline and understand the outputs
Workflow languages are often a better fit than Bash scripts for complex jobs because they make it easier to handle errors and run tasks in parallel. They also have a clearer structure, which makes pipelines easier to maintain and to work on with others.
Here, you're going to learn more about the Nextflow language and take your first steps in making your first pipeline with Nextflow.
hello-world.nf
Nextflow pipelines need to be saved as .nf files.
The process definition starts with the keyword process, followed by the process name, and finally the process body delimited by curly braces. The process body must contain a script block, which represents the command or, more generally, the script that the process executes.
A process may contain any of the following definition blocks: directives, inputs, outputs, when clauses, and of course, script.
process < name > {
    [ directives ]

    input:
    < process inputs >

    output:
    < process outputs >

    when:
    < condition >

    script:
    """
    <script to be executed>
    """
}
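To make the template above more concrete, here is a minimal sketch with every block filled in. The process name COUNT_LINES, the input name infile, and the data/*.txt glob are hypothetical and chosen only for illustration; the workflow block at the end is included so that the file runs on its own.

```nextflow
// Hypothetical example: names, paths, and the command are illustrative only.
process COUNT_LINES {
    // directive: label each task with the file it is handling
    tag "${infile}"

    // input: one file per task
    input:
    path infile

    // output: capture whatever the script prints to standard output
    output:
    stdout

    // when: only run the task for files ending in .txt
    when:
    infile.name.endsWith('.txt')

    // script: the command executed inside the task directory
    script:
    """
    wc -l < ${infile}
    """
}

workflow {
    // Assumes a data/ folder containing .txt files exists next to the script
    COUNT_LINES(channel.fromPath('data/*.txt'))
}
```

Only the script block is mandatory; the other blocks can be dropped when a process does not need them.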
A workflow is a composition of processes and dataflow logic.
The workflow definition starts with the keyword workflow, followed by an optional name, and finally the workflow body delimited by curly braces.
Let's review the structure of hello-world.nf, a toy example you will be executing and developing:
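A minimal sketch of hello-world.nf, assuming the layout implied by the description that follows (a SAYHELLO process on lines 1-11 and a workflow on lines 13-15). Comments are left out so that the line numbers match that description; the exact file you are working with may differ in whitespace.

```nextflow
process SAYHELLO {
    debug true

    output:
    stdout

    script:
    """
    echo 'Hello World!'
    """
}

workflow {
    SAYHELLO()
}
```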
The first piece of code (lines 1-11) describes a process called SAYHELLO with three definition blocks:
- debug: a directive that, when set to true, will print the output to the console
- output: directing outputs to be printed to stdout (standard output)
- script: the echo 'Hello World!' command
The second block of code (lines 13-15) describes the workflow itself, which consists of one call to the SAYHELLO process.
Note
Using debug true and stdout in combination will cause 'Hello World!' to be printed to the terminal.
Commenting your code
It is worthwhile to comment your code so that you, and others, can easily understand what the code is doing (you will thank yourself later).
In Nextflow, a single-line comment can be added by prefixing it with two forward slashes (//).
Similarly, multi-line comments can be added by enclosing them between /* and */.
As a developer, you can choose how and where to comment your code.
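The two comment styles can be sketched as follows. The pipeline is the SAYHELLO example described earlier on this page; the wording and placement of the comments are only one possible choice.

```nextflow
// A single-line comment: everything after the two slashes is ignored

/*
 * A multi-line comment.
 * Everything between the opening and closing markers is ignored,
 * which makes this style handy for longer explanations.
 */
process SAYHELLO {
    // Print the captured standard output to the console
    debug true

    output:
    stdout

    script:
    """
    echo 'Hello World!'
    """
}

workflow {
    // Run the SAYHELLO process once
    SAYHELLO()
}
```

Comments have no effect on execution, so the pipeline behaves the same with or without them.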
Exercise
Add a comment to the pipeline to describe what the process block is doing
Executing hello-world.nf
The nextflow run command is used to execute Nextflow pipelines. It takes the pipeline to run as its argument.
When a pipeline is stored locally, you need to supply the path to the script. However, if the pipeline has been pushed to GitHub (and you have an internet connection), you can execute it without a local copy. For example, the hello repository hosted on the nextflow-io GitHub account can be executed using nextflow run nextflow-io/hello.
Exercise
Use the nextflow run command to execute hello-world.nf
Yay! You have just run your first pipeline!
Your console should look something like this:
What does each line mean?
- The version of Nextflow that was executed
- The script and version names
- The executor used (in the above case: local)
- The first process is executed once, which means there is one task. The line starts with a unique hexadecimal value, and ends with the task completion information
- The result string from stdout is printed
Task directories
When a Nextflow pipeline is executed, a work directory is created. Processes are executed in isolated task directories. Each task uses a unique directory based on its hash (e.g., 4e/6ba912) within the work directory.
When a task is created, Nextflow stages the task input files, script, and other helper files into the task directory. The task writes any output files to this directory during its execution, and Nextflow uses these output files for downstream tasks and/or publishing.
These directories do not share a writable state, and any required files or information must be passed through channels (this will be important later).
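As a preview of what passing information between isolated task directories looks like, here is a minimal sketch with two processes. The process names WRITE_GREETING and SHOUT and the file name greeting.txt are hypothetical; the second task never reads the first task's directory directly, and instead receives the file through the first process's output channel.

```nextflow
// Hypothetical two-step pipeline; process and file names are illustrative only.
process WRITE_GREETING {
    // Declare the file that this task leaves in its own task directory
    output:
    path 'greeting.txt'

    script:
    """
    echo 'Hello World!' > greeting.txt
    """
}

process SHOUT {
    debug true

    // Nextflow stages the incoming file into this task's directory
    input:
    path greeting

    output:
    stdout

    script:
    """
    tr '[:lower:]' '[:upper:]' < ${greeting}
    """
}

workflow {
    WRITE_GREETING()
    // The output channel of the first process is the input of the second
    SHOUT(WRITE_GREETING.out)
}
```

Each process still runs in its own directory under work; only the declared output travels between them.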
Note
You can execute tree work to view the work directory structure.
Warning
The work directory might not have the same hash as the one shown above.
Each task creates a series of log files, along with any outputs, in its directory within the work directory:
- .command.begin: Metadata related to the beginning of the execution of the process task
- .command.err: Error messages (stderr) emitted by the process task
- .command.log: Complete log output emitted by the process task
- .command.out: Regular output (stdout) by the process task
- .command.sh: The command that was run by the process task call
- .exitcode: The exit code resulting from the command
These files are created by Nextflow to manage the execution of your pipeline. While these files are not required now, you may need to interrogate them to troubleshoot issues later.
Exercise
Browse the work directory and view the .command.sh file
Summary
In this step you have learned:
- How to create a Nextflow pipeline
- How to interpret hello-world.nf
- How to add comments to your pipelines
- How to run a Nextflow pipeline
- How to view log files created by Nextflow