Version history
Added
- #497 - Adds support for pointing at a local db for krona, using the parameter `--krona_db` (by @willros).
- #395 - Adds support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
- #422 - Adds support for normalization of read depth with BBNorm (added by @erikrikarddaniel and @fabianegli)
- #439 - Adds ability to enter the pipeline at the binning stage by providing a CSV of pre-computed assemblies (by @prototaxites)
- #459 - Adds ability to skip damage correction step in the ancient DNA workflow and just run pyDamage (by @jfy133)
- #364 - Adds geNomad nf-core modules for identifying viruses in assemblies (by @PhilPalmer and @CarsonJM)
- #481 - Adds MetaEuk for annotation of eukaryotic MAGs, and MMSeqs2 to enable downloading databases for MetaEuk (by @prototaxites)
- #437 - `--gtdb_db` now also supports directory input of a pre-uncompressed GTDB archive directory (reported by @alneberg, fix by @jfy133)
- #494 - Adds support for saving the BAM files from Bowtie2 mapping of input reads back to assembly (fix by @jfy133)
Changed
- #428 #467 - Update to nf-core 2.8, 2.9 `TEMPLATE` (by @jfy133)
- #429 - Replaced hardcoded CheckM database auto-download URL with a parameter (reported by @erikrikarddaniel, fix by @jfy133)
- #441 - Deactivated CONCOCT in AWS ‘full test’ due to very long runtime (fix by @jfy133).
- #442 - Remove warning when BUSCO finds no genes in bins, as this can be expected in some datasets (reported by @Lumimar, fix by @jfy133).
- #444 - Moved BUSCO bash code to script (by @jfy133)
- #477 - `--gtdb` parameter is split into `--skip_gtdbtk` and `--gtdb_db` to allow finer control over GTDB database retrieval (fix by @jfy133)
- #500 - Temporarily disabled downstream processing of both refined and raw bins due to bug (by @jfy133)
Fixed
- #496 - Fix help text for parameters `--bowtie2_mode`, `spades_options` and `megahit_options` (by @willros)
- #400 - Fix duplicated Zenodo badge in README (by @jfy133)
- #406 - Fix CheckM database always downloading, regardless of whether CheckM is selected (by @jfy133)
- #419 - Fix bug with busco_clean parameter, where it is always activated (by @prototaxites)
- #426 - Fixed typo in help text for parameters `--host_genome` and `--host_fasta` (by @tillenglert)
- #434 - Fix location of samplesheet for AWS full tests (reported by @Lfulcrum, fix by @jfy133)
- #438 - Fixed version inconsistency between conda and containers for GTDBTK_CLASSIFYWF (by @jfy133)
- #439 - Fix bug in assembly input (by @prototaxites)
- #447 - Remove `default: None` from parameter schema (by @drpatelh)
- #449 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133)
- #470 - Fix binning preparation running even when binning was requested to be skipped (reported by @prototaxites, fix by @jfy133)
- #480 - Improved `-resume` reliability through better meta map preservation (reported by @prototaxites, fix by @jfy133)
- #493 - Update `METABAT2` nf-core module so that it reduces the number of unnecessary file moves, enabling virtual filesystems (fix by @adamrtalbot)
- #500 - Fix MaxBin2 bins not being saved in results directory properly (reported by @Perugolate, fix by @jfy133)
Dependencies
Tool | Previous version | New version |
---|---|---|
BCFtools | 1.16 | 1.17 |
SAMtools | 1.16.1 | 1.17 |
fastp | 0.23.2 | 0.23.4 |
MultiQC | 1.14 | 1.15 |
Fixed
- #458 - Correct the major issue in ancient DNA workflow of binning refinement being performed on uncorrected contigs instead of aDNA consensus recalled contigs (issue #449)
- #451 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133, and integrated by @maxibor in #458)
Added
- #350 - Adds support for CheckM as alternative bin completeness and QC tool (added by @jfy133 and @skrakau)
- #353 - Added the busco_clean parameter to optionally clean each BUSCO directory after a successful run (by @prototaxites)
- #361 - Added the skip_clipping parameter to skip read preprocessing with fastp or adapterremoval. Running the pipeline with skip_clipping, keep_phix and without specifying a host genome or fasta file skips the FASTQC_TRIMMED process (by @prototaxites)
- #365 - Added CONCOCT as an additional (optional) binning tool (by @jfy133)
- #366 - Added CAT_SUMMARISE process and cat_official_taxonomy parameter (by @prototaxites)
- #372 - Allow CAT_DB to take an extracted database as well as a tar.gz file (by @prototaxites).
- #380 - Added support for saving processed reads (clipped, host removed etc.) to results directory (by @jfy133)
- #394 - Added GUNC for additional chimeric bin/contamination QC (added by @jfy133)
Changed
- #340, #368, #373 - Update to nf-core 2.7.2 `TEMPLATE` (by @jfy133, @d4straub, @skrakau)
- #373 - Removed parameter `--enable_conda`. Updated local modules to new conda syntax and updated nf-core modules (by @skrakau)
- #385 - CAT now runs on unbinned contigs as well as binned contigs (added by @jfy133)
- #399 - Removed undocumented BUSCO_PLOT process (previously generated `*.busco_figure.png` plots unsuitable for metagenomics) (by @skrakau).
Fixed
- #345 - Bowtie2 mode changed to global alignment for ancient DNA mode (`--very-sensitive` mode) to prevent soft clipping at the end of reads when running in local mode. (by @maxibor)
- #349 - Add a warning that the pipeline will reset the minimum contig size to 1500 specifically for the MetaBAT2 process if a user supplies a value below this threshold. (by @jfy133)
- #352 - Handle the case in the BUSCO module where BUSCO detects only a root lineage but cannot find any marker genes (by @alexhbnr)
- #355 - Include error code 21 for retrying with higher memory for SPAdes and hybridSPAdes (by @mglubber)
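The behaviour described in #349 can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the function name `effective_min_contig_size` and the constant name are hypothetical, and only the 1500 threshold comes from the entry above.

```python
import warnings

# Threshold stated in the #349 entry; the constant name is hypothetical.
METABAT2_MIN_CONTIG_SIZE = 1500


def effective_min_contig_size(requested):
    """Return the minimum contig size to use for MetaBAT2 (hypothetical helper)."""
    if requested < METABAT2_MIN_CONTIG_SIZE:
        # Warn and raise the value for MetaBAT2 only; other steps are unaffected.
        warnings.warn(
            f"min contig size {requested} is below {METABAT2_MIN_CONTIG_SIZE}; "
            f"using {METABAT2_MIN_CONTIG_SIZE} for MetaBAT2 only"
        )
        return METABAT2_MIN_CONTIG_SIZE
    return requested
```

Values at or above the threshold pass through unchanged, so the warning only appears when the user's setting is actually overridden.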
Dependencies
Tool | Previous version | New version |
---|---|---|
BUSCO | 5.1.0 | 5.4.3 |
BCFtools | 1.14 | 1.16 |
Freebayes | 1.3.5 | 1.3.6 |
SAMtools | 1.15 | 1.16.1 |
- Fix too many symbolic links issue in local convert_depths module (reported by @ChristophKnapp and fixed by @apeltzer, @jfy133)
- Each sample now gets its own result directory for PyDamage analysis and filter (reported and fixed by @maxibor)
See full CHANGELOG for more information
- Restructure binning subworkflow in preparation for aDNA workflow and extended binning
- Add ancient DNA subworkflow
- Add MaxBin2 as second contig binning tool
- Add AdapterRemoval2 as an alternative read trimmer
- Add DAS Tool for bin refinement
- Activate pipeline-specific institutional nf-core/configs
- Add extra results folder `GenomeBinning/depths/contigs` for `[assembler]-[sample/group]-depth.txt.gz`, and `GenomeBinning/depths/bins` for `bin_depths_summary.tsv` and `[assembler]-[binner]-[sample/group]-binDepths.heatmap.png`
- Updated some software: fastp 0.20.1 > 0.23.2, MultiQC 1.9 > 1.12
- Fix several bugs
See full CHANGELOG for more information
Contributors
@skrakau @d4straub @jfy133 @maxibor @alexhbnr @pcantalupo
- Add bin gene annotation with PROKKA
- Add prokaryotic gene finding with prodigal for each metagenome
- Add pipeline preprint information
- Updated some software: MultiQC 1.9 > 1.11, MEGAHIT 1.2.7 > 1.2.9, SPAdes 3.13.1 > 3.15.3
- Fix several bugs
See full CHANGELOG for more information.
- Add bin abundance estimation based on median sequencing depths of corresponding contigs
- Add generation of heat maps with bin abundances across samples
- Output predicted genes for bins
- Fix handling of `BUSCO` output when run in auto lineage selection mode
See full CHANGELOG for more information.
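As an illustration of the bin abundance estimate mentioned in this release, the following minimal Python sketch takes the median depth of the contigs assigned to each bin for one sample. The function name `bin_abundances` and the two input dictionaries are hypothetical and are not taken from the pipeline's code.

```python
from statistics import median


def bin_abundances(contig_depths, contig_to_bin):
    """Return {bin: median depth of its contigs} for one sample.

    contig_depths: {contig name: sequencing depth} (hypothetical input)
    contig_to_bin: {contig name: bin name} (hypothetical input)
    """
    depths_per_bin = {}
    for contig, depth in contig_depths.items():
        bin_id = contig_to_bin.get(contig)
        if bin_id is None:
            # Contigs that were not assigned to any bin are ignored.
            continue
        depths_per_bin.setdefault(bin_id, []).append(depth)
    return {bin_id: median(depths) for bin_id, depths in depths_per_bin.items()}
```

Using the median instead of the mean keeps a single unusually deep contig from dominating the estimate for its bin.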
- Switch to Nextflow `DSL2`
- Changed `--input` file format from `TSV` to `CSV` format, requires header now
- Add `BUSCO` automated lineage selection functionality
- Add taxonomic bin classification with `GTDB-Tk`
- Add process for `CAT` database creation as an alternative to using pre-built databases
- Allow different folder structures for `Kraken2` databases
- Requires nextflow version `>= 21.04.0`
See full CHANGELOG for more information.
- Manifest file has to be handed over via `--input` parameter now
- Changed format of manifest input file: requires a ‘.tsv’ suffix and additionally contains group ID
- Add `--coassemble_group` parameter to allow group-wise co-assembly
- Add `--binning_map_mode` parameter allowing different mapping strategies to compute co-abundances used for binning
- TSV `--input` file now also allows entries containing only short reads
See full CHANGELOG for more information.
- Fixed processing of `--input` parameter
See full CHANGELOG for more information.
- Add full-size test
- Add workflow overview figure to `README`
- Pin `seaborn` to `v0.10.1` to avoid `nanoplot` error
See full CHANGELOG for more information.
- Add host read removal with `Bowtie 2`
- Add separate `MultiQC` section for `FastQC` after preprocessing
- Add `MetaBAT2` RNG seed parameter `--metabat_rng_seed` and set the default to 1, which ensures reproducible binning results
- Add parameters `--megahit_fix_cpu_1`, `--spades_fix_cpus` and `--spadeshybrid_fix_cpus` to ensure reproducible results from assembly tools
- Fixed channel joining for multiple samples causing `MetaBAT2` error
- Compress assembly files
- Fix BUSCO errors
See full CHANGELOG for more information.
First release of the MAG pipeline 🎉
This initial version of the pipeline:
- assigns taxonomy to reads using Centrifuge (https://ccb.jhu.edu/software/centrifuge/) and/or Kraken2 (https://ccb.jhu.edu/software/kraken2/)
- performs assembly using MEGAHIT (https://github.com/voutcn/megahit) and SPAdes (http://cab.spbu.ru/software/spades/), and checks their quality using QUAST (http://quast.sourceforge.net/quast)
- performs metagenome binning using MetaBAT2 (https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using BUSCO (https://busco.ezlab.org/)