nf-test: Example assertions
This document details various assertions used in nf-test for testing Nextflow pipelines. It serves as a guide for implementing effective testing strategies in pipeline development. For more information on nf-test, see the nf-test documentation.
Snapshots
Snapshots are used to compare the current output of a process, workflow, or function against a reference snapshot file (*.nf.test.snap).
Using Snapshots
Create snapshots using the snapshot keyword. The match method checks if the snapshot corresponds to the expected data in the snap file.
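A minimal sketch of a test file that snapshots every output of a process. The process name EXAMPLE_TOOL, the script path, and the input file are hypothetical placeholders, not taken from a real module.

```groovy
nextflow_process {

    name "Test Process EXAMPLE_TOOL"   // hypothetical process name
    script "../main.nf"                // hypothetical path to the process script
    process "EXAMPLE_TOOL"

    test("creates a stable snapshot") {

        when {
            process {
                """
                // hypothetical input; replace with the inputs of your process
                input[0] = file("input.txt")
                """
            }
        }

        then {
            // snapshot(...) captures the outputs, match() compares them to the snap file
            assert snapshot(process.out).match()
        }
    }
}
```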
The first test run generates a JSON snapshot file. Subsequent runs compare against this file. Commit snapshot files with code changes and review them in your code review process.
Assigning parameters or configs
nf-test lets you set params or include config files.
Use withName selectors to assign ext.args values to a specific process.
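As a sketch, a config file included by a test could look like the following. The process name EXAMPLE_TOOL and the argument string are hypothetical.

```groovy
// nextflow.config (hypothetical file placed next to main.nf.test)
process {
    // Only the process matching this selector receives the extra arguments
    withName: 'EXAMPLE_TOOL' {
        ext.args = '--foo bar'
    }
}
```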
Both directives apply within the scope in which they are defined: to every test in the main.nf.test file if written in the top-level nextflow_process, nextflow_workflow, or nextflow_pipeline scope, or to a single test if written within the test scope.
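The scoping described above might be sketched as follows. All names (EXAMPLE_TOOL, the config file names, the parameter foo, the input file) are hypothetical. For the single test, the params block is placed inside when, which is where the nf-test documentation puts test-level parameters.

```groovy
nextflow_process {

    name "Test Process EXAMPLE_TOOL"   // hypothetical process name
    script "../main.nf"                // hypothetical path to the process script
    process "EXAMPLE_TOOL"

    // Suite scope: applies to every test in this file
    config "./nextflow.config"
    params {
        foo = 'bar'
    }

    test("uses its own config and params") {

        // Test scope: applies to this test only
        config "./other.config"

        when {
            params {
                foo = 'baz'
            }
            process {
                """
                input[0] = file("input.txt")
                """
            }
        }

        then {
            assert process.success
        }
    }
}
```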
File Paths
nf-test replaces paths in snapshots with a unique fingerprint (md5 sum by default) to ensure file content consistency.
Asserting the Presence of an Item in the Channel using contains
Groovy’s contains and collect methods assert the presence of items in channel output.
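A minimal sketch in plain Groovy, using a list of tuples as a stand-in for a channel output. The greetings and paths are made-up values.

```groovy
// Stand-in for a channel output: a list of [greeting, path] tuples (made-up values)
def exampleChannel = [
    ['Bonjour', '/work/ab/Bonjour.json'],
    ['Hello',   '/work/cd/Hello.json'],
    ['Hola',    '/work/ef/Hola.json']
]

// contains: the whole tuple must match, including the path
assert exampleChannel.contains(['Hello', '/work/cd/Hello.json'])

// collect + contains: keep only the greeting, so the path is ignored
def greetings = exampleChannel.collect { greeting, jsonPath -> greeting }
assert greetings.contains('Hello')
```

The collect variant is useful when part of the tuple, such as a work directory path, differs between runs.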
Indexing
You can access elements in output channels using either the get() method or index notation. For a channel named log, log.get(0).get(1) is equivalent to log[0][1].
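The equivalence can be checked in plain Groovy. The nested list below is a made-up stand-in for process.out.log with a single [meta, file] tuple.

```groovy
// Stand-in for process.out.log: one emission holding a [meta, file] tuple (made-up values)
def log = [[[id: 'test'], '/work/ab/test.log']]

assert log.get(0).get(1) == '/work/ab/test.log'   // method calls
assert log[0][1] == '/work/ab/test.log'           // index notation
assert log.get(0).get(1) == log[0][1]             // both return the same element
```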
Additional Reading
nf-core guidelines for assertions
- Encapsulate Assertions in assertAll(): Group all assertions within assertAll() for comprehensive testing.
- Minimum Requirement - Process Success + versions.yml file: Always check if the process completes successfully and make at least a snapshot of the versions.yml.
- Capture as much as possible: Best case scenario: make a snapshot to verify the complete output of your process. The absolute minimum is to check that the output file exists, but try to also check for substrings, number of lines, or similar.
process.out will capture all output channels, both named and index-based ones.
Additional cases:
- Handling Inconsistent md5sum: Use specific content checks for elements with inconsistent md5sums.
- Module/Process Truth Verification: Ensure snapshots accurately reflect the module/process functionality.
Different Types of Assertions
Simple & Straight-Forward
Snapshot Entire Output Channel
Motivation: Make sure all outputs are stable over changes.
Explanation: Verifies process completion and output against a snapshot.
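A sketch of the then block of a test, combining the success check and the full snapshot inside assertAll():

```groovy
then {
    assertAll(
        // the process must finish without errors
        { assert process.success },
        // all output channels must match the stored snapshot
        { assert snapshot(process.out).match() }
    )
}
```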
Complex - Handling Inconsistent md5sum in Output Elements
Snapshot a Specific Element in Output Channel
Motivation: Create the snapshot for one specific output.
Explanation: Checks a specific element, in this case versions, in the output channel of a process against a predefined snapshot named “versions”.
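As a sketch, the then block for this case could be:

```groovy
then {
    // snapshot only the versions channel and store it under the name "versions"
    assert snapshot(process.out.versions).match("versions")
}
```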
File Exists Check
Motivation: Snapshots of an output are unstable, i.e. they change between test runs, for example because they include a timestamp/file-path in the content.
Explanation: Verifies the existence of a specific file, IndexMetricsOut.bin, in the output of a process.
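A sketch of such a check, assuming a hypothetical output channel named interop that emits a [meta, list of files] tuple:

```groovy
then {
    // process.out.interop[0][1] is assumed to be a list of file paths
    def indexMetrics = process.out.interop[0][1].find { file(it).name == "IndexMetricsOut.bin" }

    // the file must be among the outputs and must exist on disk
    assert indexMetrics
    assert file(indexMetrics).exists()
}
```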
Snapshot Sorted List & Exclude a Specific File
Motivation: I want to create a snapshot of different outputs, including several log files. I can’t snapshot the whole output, because one file is changing between test runs.
Explanation: This creates a snapshot for all output files and of a sorted list from a log directory while excluding a specific file, IndexMetricsOut.bin, in the comparison. The existence of this excluded file is checked in the end.
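A sketch of this pattern. The channel names reports, logs, and interop are hypothetical; logs is assumed to emit a directory and interop a list of files, each as the second element of a tuple.

```groovy
then {
    assertAll(
        { assert process.success },
        { assert snapshot(
            process.out.reports,
            process.out.versions,
            // sorted file names inside the log directory
            file(process.out.logs[0][1]).list().sort(),
            // every interop file except the unstable one
            process.out.interop[0][1].findAll { file(it).name != "IndexMetricsOut.bin" }
        ).match() },
        // the excluded file is only checked for existence
        { assert file(process.out.interop[0][1].find { file(it).name == "IndexMetricsOut.bin" }).exists() }
    )
}
```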
File Contains Check
Explanation: This checks if the last line of a report file contains a specific string and if the file name ends with “hisat2_SE_report.txt”.
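A sketch of this check, assuming a hypothetical output channel alignment_report that emits a [meta, file] tuple. The string "expected text" is a placeholder for whatever the report's last line should contain.

```groovy
then {
    def report = process.out.alignment_report[0][1]

    // the file name must end with the expected suffix
    assert report.endsWith("hisat2_SE_report.txt")
    // the last line must contain the expected (placeholder) string
    assert path(report).readLines().last().contains("expected text")
}
```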
Snapshot Selective Portion of a File
Motivation: We can’t make a snapshot of the whole file because it is not stable, but we know a portion of the content should be stable. For example, the timestamp is added in the 6th line, so we want to snapshot only the first 5 lines.
Explanation: Creates a snapshot of a specific portion (first five lines) of a file for comparison.
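A sketch, assuming a hypothetical output channel aligned that emits a [meta, file] tuple:

```groovy
then {
    // the range 0..4 selects the first five lines of the file
    assert snapshot(file(process.out.aligned[0][1]).readLines()[0..4]).match()
}
```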
Snapshot Selective Portion of a File & number of lines
Motivation: We can’t make a snapshot of the whole file because it is not stable, but we know that a portion of the content and the number of lines should be stable.
Explanation: Verifies the content of the first six lines of a gzipped file, and the total number of lines in the file.
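A sketch, assuming a hypothetical output channel file_out that emits a [meta, gzipped file] tuple. The snapshot names are illustrative.

```groovy
then {
    // linesGzip decompresses the file and returns its lines
    def lines = path(process.out.file_out[0][1]).linesGzip

    assertAll(
        { assert process.success },
        // the range 0..5 selects the first six lines
        { assert snapshot(lines[0..5]).match("first_six_lines") },
        { assert snapshot(lines.size()).match("line_count") }
    )
}
```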
ReadLines & Contains
Motivation: We can’t make a snapshot of the complete file, but we want to make sure that a specific substring is always present.
Explanation: Checks if specific strings, /LIBS/GUID and /libs/cloud/report_instance_identity, exist within the lines of an output file.
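A sketch, assuming a hypothetical output channel ncbi_settings that emits a single file path:

```groovy
then {
    def settingsLines = path(process.out.ncbi_settings[0]).readLines()

    // at least one line must contain each substring
    assert settingsLines.any { it.contains('/LIBS/GUID') }
    assert settingsLines.any { it.contains('/libs/cloud/report_instance_identity') }
}
```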
Snapshot an Element in Tuple Output
Motivation: We can’t snapshot the whole tuple, but one element of the tuple has stable snapshots.
Explanation: Validates an element within a tuple output against a snapshot.
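A sketch, assuming a hypothetical output channel deletions that emits a [meta, file] tuple:

```groovy
then {
    // [0] selects the first emission, [1] the file element of the tuple
    assert snapshot(path(process.out.deletions[0][1])).match("deletions")
}
```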
Snapshot Published File in Outdir
Motivation: I want to check a specific file in the output is saved correctly and is stable between tests.
Explanation: Confirms that a file saved in the specified output directory matches the expected snapshot.
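A sketch of a test that publishes to nf-test's outputDir and snapshots one published file. The parameter name outdir, the relative path results/example.tsv, and the snapshot name are hypothetical.

```groovy
test("published file is stable") {

    when {
        params {
            // write results into the directory nf-test provides for this test
            outdir = "$outputDir"
        }
    }

    then {
        // hypothetical location of the published file inside outdir
        assert snapshot(path("$outputDir/results/example.tsv")).match("example_tsv")
    }
}
```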
Assert File Name and Type
Motivation: I don’t know the exact location of the file, but I know that at least the file type is fixed.
Explanation: Ensures that a file from the output matches a specific pattern, indicating its type and name.
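A sketch, assuming a hypothetical output channel reads that emits a [meta, file] tuple and a hypothetical file name test_1.fastq.gz:

```groovy
then {
    // the directory part may vary, but the file name and extension must match
    assert process.out.reads[0][1] ==~ /.*\/test_1\.fastq\.gz/
}
```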
Snapshot Selective File Names & Content
Motivation: I want to include in the snapshot:
- the names of the files in the npa & npc output channels
- the first line of the file in the npo output channel
- the md5sum of the file in the npl output channel
Explanation: Compares specific filenames and content of multiple files in a process output against predefined snapshots.
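A sketch, assuming each of the four channels emits a [meta, file] tuple:

```groovy
then {
    assert snapshot(
        // file names only
        file(process.out.npa[0][1]).name,
        file(process.out.npc[0][1]).name,
        // first line of the file
        path(process.out.npo[0][1]).readLines()[0],
        // a path object is stored as a checksum of the file content
        path(process.out.npl[0][1])
    ).match()
}
```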
Snapshot the Last 4 Lines of a Gzipped File in the gzip output channel
Explanation: Retrieves and allows the inspection of the last four lines of a gzipped file from the output channel.
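A sketch, assuming the gzip channel emits a [meta, gzipped file] tuple. The snapshot name is illustrative.

```groovy
then {
    // the range -4..-1 selects the last four lines
    def lastLines = path(process.out.gzip[0][1]).linesGzip[-4..-1]
    assert snapshot(lastLines).match("last_four_lines")
}
```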
Assert a contains check in a gzipped file
Motivation: I want to check the presence of a specific string or data pattern within a gzipped file.
Explanation: Checks whether a specific string ("MT192765.1\t10214\t.\tATTTAC\tATTAC\t29.8242") is present in the content of a gzipped file, read with path(process.out.vcf[0][1]).linesGzip.toString().
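As a sketch, the then block for this check could be:

```groovy
then {
    // join the decompressed lines into one string and search it for the expected record
    assert path(process.out.vcf[0][1]).linesGzip.toString().contains("MT192765.1\t10214\t.\tATTTAC\tATTAC\t29.8242")
}
```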
Useful nf-test operators and functions
Regular Expressions
The operator ==~ can be used to check if a string matches a regular expression.
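A minimal sketch in plain Groovy with a made-up path. Note that ==~ requires the whole string to match the pattern, which is why the pattern starts with .* here.

```groovy
def pgenPath = "/my/full/path/to/process/dir/example.vcf.pgen"   // made-up path

// true: the pattern covers the entire string
assert pgenPath ==~ ".*/example.vcf.pgen"

// false: the pattern matches only the end of the string, not all of it
assert !(pgenPath ==~ "example.vcf.pgen")
```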
Using with()
Instead of repeating the same output channel in every assertion, you can reduce redundancy using the with() command.
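A sketch comparing both styles, assuming a hypothetical output channel imputed_plink2 that emits a [name, file] tuple:

```groovy
then {
    // Without with(): the channel is spelled out in every assertion
    assert process.out.imputed_plink2.size() == 1
    assert process.out.imputed_plink2[0][0] == "example.vcf"
    assert process.out.imputed_plink2[0][1] ==~ ".*/example.vcf.pgen"

    // With with(): the channel is named once and the assertions run against it
    with(process.out.imputed_plink2) {
        assert size() == 1
        with(get(0)) {
            assert get(0) == "example.vcf"
            assert get(1) ==~ ".*/example.vcf.pgen"
        }
    }
}
```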
Known Issues
When using nf-test in conjunction with container or environment technologies like Docker, Singularity, or Conda, it’s crucial to be aware of environment-specific issues that can arise, particularly regarding mismatched hashes. Here are some tips to handle such scenarios effectively:
Tips for Handling Mismatched Hashes in Docker/Singularity/Conda
- Check for Consistent Environment Across Containers: Ensure that the environment inside your Docker, Singularity, or Conda containers is consistent. Differences in installed packages, software versions, or underlying operating systems can lead to mismatched hashes.
- Use Identical Base Images: When building Docker or Singularity containers, start from the same base image to minimize environmental differences. This consistency helps ensure that the software behaves the same across different executions.
- Pin Software Versions: In your container definitions (Dockerfile, Singularity recipe, Conda environment file), explicitly pin software versions, including dependencies. This step reduces the chances of discrepancies due to updates or changes in the software.
- Isolate Non-Deterministic Elements: Identify elements in your workflow that are inherently non-deterministic (such as timestamps or random number generation) and isolate them. Consider mocking these elements or designing your tests to accommodate such variability.
- Reproducibility in Conda Environments: For Conda environments, use conda list --explicit to generate a list of all packages with their exact versions and builds. This approach ensures that you can recreate the identical environment later.
- Review Container Caching Mechanisms: Be cautious with container caching mechanisms. Sometimes, cached layers in Docker might lead to using outdated versions of software or dependencies. Ensure that your caching strategy does not inadvertently introduce inconsistencies.
- Consistent Filesystem Paths: Ensure that paths within the container and in the testing environment are consistent. Variations in paths can sometimes lead to unexpected behavior and hash mismatches.
- Regularly Update and Test: Regularly update your containers and environment specifications, and re-run tests to ensure that everything continues to work as expected. This practice helps identify and resolve issues arising from environmental changes over time.
By following these tips, you can mitigate the risks of encountering mismatched hashes due to environment-specific issues in Docker, Singularity, and Conda when using nf-test for your Nextflow pipelines.