nf-core/mag
Assembly and binning of metagenomes
Version history
Added
- #692 - Added Nanoq as optional longread filtering tool (added by @muabnezor)
- #692 - Added chopper as optional longread filtering tool and/or phage lambda removal tool (added by @muabnezor)
- #707 - Make Bin QC a subworkflow (added by @dialvarezs)
- #707 - Added CheckM2 as an alternative bin completeness and QC tool (added by @dialvarezs)
- #708 - Added `--exclude_unbins_from_postbinning` parameter to exclude unbinned contigs from post-binning processes, speeding up Prokka in some cases (added by @dialvarezs)
- #732 - Added support for Prokka’s compliance mode with `--prokka_with_compliance --prokka_compliance_centre <xyz>` (reported by @audy and @Thomieh73, added by @jfy133)
Changed
Fixed
- #707 - Fixed channel passed as GUNC input (added by @dialvarezs)
- #724 - Fix quoting in `utils_nfcore_mag_pipeline/main.nf` (added by @dialvarezs)
- #716 - Make short read processing a subworkflow (added by @muabnezor)
- #708 - Fixed channel passed as GUNC input (added by @dialvarezs)
- #729 - Fixed misspecified multi-FASTQ input for single-end data in MEGAHIT (reported by John Richards, fix by @jfy133)
Dependencies
Tool | Previous version | New version |
---|---|---|
CheckM | 1.2.1 | 1.2.3 |
CheckM2 | | 1.0.2 |
chopper | | 0.9.0 |
GUNC | 1.0.5 | 1.0.6 |
nanoq | | 0.10.0 |
Fixed
Added
- #674 - Added `--longread_adaptertrimming_tool`, where the user can choose between porechop_abi (default) and porechop (added by @muabnezor)
Changed
- #674 - Changed to porechop-abi as default adapter trimming tool for long reads. User can still use porechop if preferred (added by @muabnezor)
- #666 - Update SPAdes to version 4.0.0, replace both METASPADES and MEGAHIT with official nf-core modules (requested by @elsherbini, fix by @jfy133)
- #666 - Update URLs to GTDB database downloads due to server move (reported by @Jokendo-collab, fix by @jfy133)
- #695 - Updated to nf-core 3.0.2 `TEMPLATE` (by @jfy133)
- #695 - Switch to more stable Zenodo link for CheckM data (by @jfy133)
Fixed
- #674 - Make longread preprocessing a subworkflow (added by @muabnezor)
- #674 - Add porechop and filtlong logs to multiqc (added by @muabnezor)
- #674 - Change local filtlong module to the official nf-core/filtlong module (added by @muabnezor)
- #690 - MaxBin2 now uses the abundance information from different samples rather than an average (reported by @uel3 and fixed by @d4straub)
- #698 - Updated prodigal module to not pick up input symlinks for compression causing pigz errors (reported by @zackhenny, fix by @jfy133)
Dependencies
Tool | Previous version | New version |
---|---|---|
Porechop_ABI | | 0.5.0 |
Filtlong | 0.2.0 | 0.2.1 |
SPAdes | 3.15.3 | 4.0.0 |
[!CAUTION] This release contains a potentially ‘breaking change’ for some users. The `--gtdbtk_pplacer_scratch` flag has been replaced with `--gtdbtk_pplacer_useram`. Check the parameter documentation for more details.
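Several releases in this changelog rename, split, or remove parameters. As a rough aid when updating old run commands, the following sketch scans an argument list for the outdated flags mentioned in this changelog and prints the replacements the entries name. The helper `find_outdated_params` and the example path are hypothetical, and the replacements are not necessarily drop-in equivalents, so the parameter documentation remains the authority.

```python
# Replacement map transcribed from entries in this changelog.
# An empty list means the entry only says the parameter was removed.
REPLACED_PARAMS = {
    "--gtdbtk_pplacer_scratch": ["--gtdbtk_pplacer_useram"],
    "--save_busco_reference": ["--save_busco_db"],
    "--gtdb": ["--skip_gtdbtk", "--gtdb_db"],
    "--enable_conda": [],
}


def find_outdated_params(argv):
    """Return {old_flag: [replacement flags]} for each outdated flag in argv."""
    found = {}
    for arg in argv:
        flag = arg.split("=", 1)[0]  # tolerate the --flag=value form
        if flag in REPLACED_PARAMS:  # exact match, so --gtdb_db is not flagged
            found[flag] = REPLACED_PARAMS[flag]
    return found


# Hypothetical old command line; the database path is a placeholder.
command = [
    "nextflow", "run", "nf-core/mag",
    "--gtdbtk_pplacer_scratch",
    "--gtdb_db", "/path/to/gtdb",
]

for old, new in find_outdated_params(command).items():
    hint = ", ".join(new) if new else "no replacement listed"
    print(f"{old}: see {hint}")
```

Matching is exact on the flag name, so current parameters that merely share a prefix with an old one are left alone.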
Added
- #665 - Add support for supplying pre-made bowtie host reference index (requested by @simone-pignotti, added by @jfy133)
- #670 - Added `--gtdbtk_pplacer_useram` to run GTDBTk in memory mode rather than write to disk (requested by @harper357, fixed by @jfy133)
Changed
- #664 - Update GTDBTk to latest version, with updated column names, update GTDB to release 220 (by @dialvarezs)
- #676 - Added exit code 12 to valid SPAdes retry codes, due to OOM errors from spades-hammer (reported by @bawee, fix by @jfy133)
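The retry-on-exit-code behaviour mentioned for SPAdes can be pictured with a small decision function. This is only a sketch: the set below holds just the two codes this changelog names (21 from #355 and 12 from #676), the pipeline may retry on further codes, and `error_strategy` and `max_retries` are hypothetical names chosen for illustration. The return values mirror Nextflow's `retry` and `finish` error strategies.

```python
# Exit codes this changelog names as retry-worthy for SPAdes.
# Any other codes the pipeline may also retry on are deliberately omitted.
SPADES_RETRY_CODES = frozenset({12, 21})


def error_strategy(exit_status, attempt, max_retries=3,
                   retry_codes=SPADES_RETRY_CODES):
    """Return 'retry' for a known recoverable exit code while attempts remain,
    otherwise 'finish'. max_retries=3 is an illustrative default."""
    if exit_status in retry_codes and attempt <= max_retries:
        return "retry"
    return "finish"


print(error_strategy(12, attempt=1))  # known OOM-style code, attempts left
print(error_strategy(1, attempt=1))   # unrelated failure
print(error_strategy(21, attempt=4))  # retries exhausted
```

In a pipeline, a retry like this is typically paired with a larger memory request on each new attempt, which is why OOM-related codes are the ones worth retrying.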
Fixed
- #667 - Fix pipeline crashing if only CONCOCT selected during binning (reported and fixed by @jfy133)
- #670 - Re-add missing GTDBTk parameters into GTDBTk module (reported by @harper357, fixed by @jfy133)
- #672 - Fix GTDB-Tk per-sample TSV files not being published in output directory (reported by @jhayer, fix by @jfy133)
Dependencies
Tool | Previous version | New version |
---|---|---|
GTDBTk | 2.3.2 | 2.4.0 |
Deprecated
Fixed
- #648 - Fix sample ID/assembly ID check failure when no IDs match (reported by @zackhenny, fix by @prototaxites)
- #646 - GTDB-Tk directory input now creates a value channel so it runs for all entries to the process and not just the first (reported by @amizeranschi, fix by @prototaxites).
- #639 - Fix pipeline failure when a sample produces only a single bin (fix by @d-callan)
- #651 - Replace base container for bash only modules to reduce number of containers in pipeline (reported and fixed by @harper357)
- #652 - Fix documentation typo in using user-defined assembly parameters (reported and fixed by @amizeranschi)
- #653 - Fix overwriting of per-bin ‘raw’ GUNC RUN output files (multi-bin summary tables not affected) (reported by @zackhenny and fixed by @jfy133)
Changed
- #633 - Changed BUSCO to use offline mode when the database is specified by the user (reported by @ChristophKnapp and many others, fix by @jfy133)
- #632 - Use default NanoLyse log of just removed reads rather than custom (by @jfy133)
Fixed
- #630 - Fix CONCOCT empty bins killing the pipeline, and allow for true multithreading again (removing OPENBLAS loop) (reported by @maxibor, fix by @maxibor and @jfy133)
Dependencies
Tool | Previous version | New version |
---|---|---|
Porechop | 0.2.3_seqan2.1.1 | 0.2.4 |
NanoPlot | 1.26.3 | 1.41.6 |
NanoLyse | 1.1.0 | 1.2.0 |
[!CAUTION] This release contains a potentially ‘breaking change’ for some users. The pipeline no longer directly accepts FASTQ files via `--input`. You must use a samplesheet and specify the FASTQs there.
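For users moving from direct FASTQ input to a samplesheet, a sheet can be generated programmatically. The sketch below is illustrative only: the column names (`sample`, `group`, `short_reads_1`, `short_reads_2`, `long_reads`) are an assumption not stated in this changelog, the file paths are placeholders, and `build_samplesheet` is a hypothetical helper. Check the pipeline's usage documentation for the columns your version expects.

```python
import csv
import io

# ASSUMPTION: column names chosen for illustration; verify against the
# usage documentation of the pipeline version you are running.
COLUMNS = ["sample", "group", "short_reads_1", "short_reads_2", "long_reads"]


def build_samplesheet(rows):
    """Render a list of row dicts as CSV text; missing columns stay empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder FASTQ paths: one paired-end and one single-end sample.
rows = [
    {"sample": "sample1", "group": "0",
     "short_reads_1": "data/sample1_R1.fastq.gz",
     "short_reads_2": "data/sample1_R2.fastq.gz"},
    {"sample": "sample2", "group": "0",
     "short_reads_1": "data/sample2.fastq.gz"},
]

samplesheet_text = build_samplesheet(rows)
print(samplesheet_text)
```

The resulting text would be saved to a file and that file passed to `--input` in place of the FASTQ paths.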
Added
Changed
- #599 - Update to nf-core v2.13.1 `TEMPLATE` (by @jfy133)
- #614 - Update to nf-core v2.14.1 `TEMPLATE` (by @jfy133)
Fixed
- #606 - Prevent pipeline crash when a premade mashdb is given to, or no alignments are found with, GTDB-TK_CLASSIFYWF (reported by @cedwardson4, fix by @jfy133)
Deprecated
Changed
- #581 - Added explicit licence text to headers of all custom scripts (reported by @FriederikeHanssen and @maxibor, fix by @jfy133)
Fixed
Added
- #562 - Add CAT summary into the global bin_summary (by @maxibor)
- #565 - Add warning of empty GTDB-TK results if no contigs pass completeness filter (by @jfy133 and @maxibor)
Changed
- #563 - Update to nf-core v2.12 `TEMPLATE` (by @CarsonJM)
- #566 - More logical ordering of MultiQC sections (assembly and bin sections go together respectively) (fix by @jfy133)
Fixed
- #548 - Fixes to (reported by @maxibor, @PPpissar, @muniheart, @llborcard, fix by @maxibor):
  - GTDB-TK execution
  - CAT/QUAST/DEPTH bin summary file name collisions
  - BUSCO database parsing
  - Correct CAT name files
- #558 - Fix bug in run merging when dealing with single end data (reported by @roberta-davidson, fix by @jfy133)
Fixed
- #489 - Fix file name collisions for CHECKM, CAT, GTDBTK, and QUAST (reported by @tillenglert and @maxibor, fix by @maxibor)
- #533 - Fix glob pattern for publishing MetaBAT2 bins in results (reported by @patriciatran, fix by @jfy133)
- #535 - Fix input validation pattern to again allow direct FASTQ input (reported by @lennijusten, @emnilsson, fix by @jfy133, @d4straub, @mahesh-panchal, @nvnieuwk)
Dependencies
Tool | Previous version | New version |
---|---|---|
CAT | 4.6 | 5.2.3 |
Deprecated
Added
- #504 - New parameters `--busco_db`, `--kraken2_db`, and `--centrifuge_db` now support directory input of a pre-uncompressed database archive directory (by @gregorysprenger).
Changed
- #511 - Update to nf-core 2.10 `TEMPLATE` (by @jfy133)
- #504 - `--save_busco_reference` is now replaced by `--save_busco_db` (by @gregorysprenger).
Fixed
- #514 - Fix missing CONCOCT files in downstream output (reported by @maxibor, fix by @jfy133)
- #515 - Fix overwriting of GUNC output directories when running with domain classification (reported by @maxibor, fix by @jfy133)
- #516 - Fix edge-case bug where MEGAHIT re-uses previous work directory on resume and fails (reported by @husensofteng, fix by @prototaxites)
- #520 - Fix missing Tiara output files (fix by @jfy133)
- #522 - Fix ‘nulls’ in depth plot PNG files (fix by @jfy133)
Deprecated
- #504 - `--busco_reference`, `--busco_download_path`, `--save_busco_reference` parameters have been deprecated and replaced with new parameters (by @gregorysprenger).
Added
- #497 - Adds support for pointing at a local db for krona, using the parameter `--krona_db` (by @willros).
- #395 - Adds support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
- #422 - Adds support for normalization of read depth with BBNorm (added by @erikrikarddaniel and @fabianegli)
- #439 - Adds ability to enter the pipeline at the binning stage by providing a CSV of pre-computed assemblies (by @prototaxites)
- #459 - Adds ability to skip damage correction step in the ancient DNA workflow and just run pyDamage (by @jfy133)
- #364 - Adds geNomad nf-core modules for identifying viruses in assemblies (by @PhilPalmer and @CarsonJM)
- #481 - Adds MetaEuk for annotation of eukaryotic MAGs, and MMSeqs2 to enable downloading databases for MetaEuk (by @prototaxites)
- #437 - `--gtdb_db` also now supports directory input of a pre-uncompressed GTDB archive directory (reported by @alneberg, fix by @jfy133)
- #494 - Adds support for saving the BAM files from Bowtie2 mapping of input reads back to assembly (fix by @jfy133)
Changed
- #428 #467 - Update to nf-core 2.8, 2.9 `TEMPLATE` (by @jfy133)
- #429 - Replaced hardcoded CheckM database auto-download URL to a parameter (reported by @erikrikarddaniel, fix by @jfy133)
- #441 - Deactivated CONCOCT in AWS ‘full test’ due to very long runtime (fix by @jfy133).
- #442 - Remove warning when BUSCO finds no genes in bins, as this can be expected in some datasets (reported by @Lumimar, fix by @jfy133).
- #444 - Moved BUSCO bash code to script (by @jfy133)
- #477 - `--gtdb` parameter is split into `--skip_gtdbtk` and `--gtdb_db` to allow finer control over GTDB database retrieval (fix by @jfy133)
- #500 - Temporarily disabled downstream processing of both refined and raw bins due to bug (by @jfy133)
Fixed
- #496 - Fix help text for parameters `--bowtie2_mode`, `spades_options` and `megahit_options` (by @willros)
- #400 - Fix duplicated Zenodo badge in README (by @jfy133)
- #406 - Fix CheckM database always downloading, regardless if CheckM is selected (by @jfy133)
- #419 - Fix bug with busco_clean parameter, where it is always activated (by @prototaxites)
- #426 - Fixed typo in help text for parameters `--host_genome` and `--host_fasta` (by @tillenglert)
- #434 - Fix location of samplesheet for AWS full tests (reported by @Lfulcrum, fix by @jfy133)
- #438 - Fixed version inconsistency between conda and containers for GTDBTK_CLASSIFYWF (by @jfy133)
- #439 - Fix bug in assembly input (by @prototaxites)
- #447 - Remove `default: None` from parameter schema (by @drpatelh)
- #449 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133)
- #470 - Fix binning preparation from running even when binning was requested to be skipped (reported by @prototaxites, fix by @jfy133)
- #480 - Improved `-resume` reliability through better meta map preservation (reported by @prototaxites, fix by @jfy133)
- #493 - Update `METABAT2` nf-core module to reduce the number of unnecessary file moves, enabling virtual filesystems (fix by @adamrtalbot)
- #500 - Fix MaxBin2 bins not being saved in results directory properly (reported by @Perugolate, fix by @jfy133)
Dependencies
Tool | Previous version | New version |
---|---|---|
BCFtools | 1.16 | 1.17 |
SAMtools | 1.16.1 | 1.17 |
fastp | 0.23.2 | 0.23.4 |
MultiQC | 1.14 | 1.15 |
Fixed
Fixed
- #458 - Correct the major issue in ancient DNA workflow of binning refinement being performed on uncorrected contigs instead of aDNA consensus recalled contigs (issue #449)
- #451 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133, and integrated by @maxibor in #458)
Added
- #350 - Adds support for CheckM as alternative bin completeness and QC tool (added by @jfy133 and @skrakau)
- #353 - Added the busco_clean parameter to optionally clean each BUSCO directory after a successful run (by @prototaxites)
- #361 - Added the skip_clipping parameter to skip read preprocessing with fastp or adapterremoval. Running the pipeline with skip_clipping, keep_phix and without specifying a host genome or fasta file skips the FASTQC_TRIMMED process (by @prototaxites)
- #365 - Added CONCOCT as an additional (optional) binning tool (by @jfy133)
- #366 - Added CAT_SUMMARISE process and cat_official_taxonomy parameter (by @prototaxites)
- #372 - Allow CAT_DB to take an extracted database as well as a tar.gz file (by @prototaxites).
- #380 - Added support for saving processed reads (clipped, host removed etc.) to results directory (by @jfy133)
- #394 - Added GUNC for additional chimeric bin/contamination QC (added by @jfy133)
Changed
- #340,#368,#373 - Update to nf-core 2.7.2 `TEMPLATE` (by @jfy133, @d4straub, @skrakau)
- #373 - Removed parameter `--enable_conda`. Updated local modules to new conda syntax and updated nf-core modules (by @skrakau)
- #385 - CAT also now runs on unbinned contigs as well as binned contigs (added by @jfy133)
- #399 - Removed undocumented BUSCO_PLOT process (previously generated `*.busco_figure.png` plots unsuitable for metagenomics) (by @skrakau).
Fixed
- #345 - Bowtie2 mode changed to global alignment for ancient DNA mode (`--very-sensitive` mode) to prevent soft clipping at the end of reads when running in local mode. (by @maxibor)
- #349 - Add a warning that the pipeline will reset the minimum contig size to 1500 specifically for the MetaBAT2 process if a user supplies a value below this threshold. (by @jfy133)
- #352 - Escape the case in the BUSCO module that BUSCO can just detect a root lineage but is not able to find any marker genes (by @alexhbnr)
- #355 - Include error code 21 for retrying with higher memory for SPAdes and hybridSPAdes (by @mglubber)
Dependencies
Tool | Previous version | New version |
---|---|---|
BUSCO | 5.1.0 | 5.4.3 |
BCFtools | 1.14 | 1.16 |
Freebayes | 1.3.5 | 1.3.6 |
SAMtools | 1.15 | 1.16.1 |
- Fix too many symbolic links issue in local convert_depths module (reported by @ChristophKnapp and fixed by @apeltzer, @jfy133)
- Each sample now gets its own result directory for PyDamage analysis and filter (reported and fixed by @maxibor)
See full CHANGELOG for more information
- Restructure binning subworkflow in preparation for aDNA workflow and extended binning
- Add ancient DNA subworkflow
- Add MaxBin2 as second contig binning tool
- Add AdapterRemoval2 as an alternative read trimmer
- Add DAS Tool for bin refinement
- Activate pipeline-specific institutional nf-core/configs
- Add extra results folder `GenomeBinning/depths/contigs` for `[assembler]-[sample/group]-depth.txt.gz`, and `GenomeBinning/depths/bins` for `bin_depths_summary.tsv` and `[assembler]-[binner]-[sample/group]-binDepths.heatmap.png`
- Updated some software: fastp 0.20.1 > 0.23.2, MultiQC 1.9 > 1.12
- Fix several bugs
See full CHANGELOG for more information
- Add bin gene annotation with PROKKA
- Add prokaryotic gene finding with prodigal for each metagenome
- Add pipeline preprint information
- Updated some software: MultiQC 1.9 > 1.11, MEGAHIT 1.2.7 > 1.2.9, SPAdes 3.13.1 > 3.15.3
- Fix several bugs
See full CHANGELOG for more information.
- Add bin abundance estimation based on median sequencing depths of corresponding contigs
- Add generation of heat maps with bin abundances across samples
- Output predicted genes for bins
- Fix handling of `BUSCO` output when run in auto lineage selection mode
See full CHANGELOG for more information.
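The bin abundance estimate described in this release, the median sequencing depth of a bin's contigs, can be sketched as follows. This is a minimal illustration of the idea rather than the pipeline's own script: the function `bin_abundance`, the data layout, and the depth values are all made up, and how the pipeline handles multiple samples is not covered here.

```python
from statistics import median


def bin_abundance(contig_depths, bin_contigs):
    """Estimate each bin's abundance as the median depth of its contigs.

    contig_depths: {contig_id: depth} for one sample
    bin_contigs:   {bin_id: [contig_id, ...]}
    Bins without contigs are skipped.
    """
    return {
        bin_id: median(contig_depths[contig] for contig in contigs)
        for bin_id, contigs in bin_contigs.items()
        if contigs
    }


# Made-up depths for a single sample.
depths = {"c1": 10.0, "c2": 14.0, "c3": 30.0, "c4": 2.0, "c5": 4.0}
bins = {"bin1": ["c1", "c2", "c3"], "bin2": ["c4", "c5"]}

print(bin_abundance(depths, bins))  # {'bin1': 14.0, 'bin2': 3.0}
```

Using the median rather than the mean keeps a single unusually deep contig, such as `c3` above, from dominating the estimate for its bin.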
- Switch to Nextflow `DSL2`
- Changed `--input` file format from `TSV` to `CSV` format, requires header now
- Add `BUSCO` automated lineage selection functionality
- Add taxonomic bin classification with `GTDB-Tk`
- Add process for `CAT` database creation as an alternative to using pre-built databases
- Allow different folder structures for `Kraken2` databases
- Requires nextflow version `>= 21.04.0`
See full CHANGELOG for more information.
- Manifest file has to be handed over via `--input` parameter now
- Changed format of manifest input file: requires a ‘.tsv’ suffix and additionally contains group ID
- Add `--coassemble_group` parameter to allow group-wise co-assembly
- Add `--binning_map_mode` parameter allowing different mapping strategies to compute co-abundances used for binning
- TSV `--input` file now also allows entries containing only short reads
See full CHANGELOG for more information.
- Fixed processing of `--input` parameter
See full CHANGELOG for more information.
- Add full-size test
- Add workflow overview figure to `README`
- Pin `seaborn` to `v0.10.1` to avoid `nanoplot` error
See full CHANGELOG for more information.
- Add host read removal with `Bowtie 2`
- Add separate `MultiQC` section for `FastQC` after preprocessing
- Add `MetaBAT2` RNG seed parameter `--metabat_rng_seed` and set the default to 1, which ensures reproducible binning results
- Add parameters `--megahit_fix_cpu_1`, `--spades_fix_cpus` and `--spadeshybrid_fix_cpus` to ensure reproducible results from assembly tools
- Fixed channel joining for multiple samples causing `MetaBAT2` error
- Compress assembly files
- Fix BUSCO errors
See full CHANGELOG for more information.
First release of the MAG pipeline 🎉
This initial version of the pipeline:
- assigns taxonomy to reads using Centrifuge (https://ccb.jhu.edu/software/centrifuge/) and/or Kraken2 (https://ccb.jhu.edu/software/kraken2/)
- performs assembly using MEGAHIT (https://github.com/voutcn/megahit) and SPAdes (http://cab.spbu.ru/software/spades/), and checks their quality using QUAST (http://quast.sourceforge.net/quast)
- performs metagenome binning using MetaBAT (https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using BUSCO (https://busco.ezlab.org/)