Version history

Added

  • #692 - Added Nanoq as optional longread filtering tool (added by @muabnezor)
  • #692 - Added chopper as optional longread filtering tool and/or phage lambda removal tool (added by @muabnezor)
  • #707 - Make Bin QC a subworkflow (added by @dialvarezs)
  • #707 - Added CheckM2 as an alternative bin completeness and QC tool (added by @dialvarezs)
  • #708 - Added --exclude_unbins_from_postbinning parameter to exclude unbinned contigs from post-binning processes, speeding up Prokka in some cases (added by @dialvarezs)
  • #732 - Added support for Prokka’s compliance mode with --prokka_with_compliance --prokka_compliance_centre <xyz> (reported by @audy and @Thomieh73, added by @jfy133)

Changed

  • #731 - Updated to nf-core 3.1.0 TEMPLATE (by @jfy133)

Fixed

  • #707 - Fixed channel passed as GUNC input (added by @dialvarezs)
  • #724 - Fix quoting in utils_nfcore_mag_pipeline/main.nf (added by @dialvarezs)
  • #716 - Make short read processing a subworkflow (added by @muabnezor)
  • #708 - Fixed channel passed as GUNC input (added by @dialvarezs)
  • #729 - Fixed misspecified multi-FASTQ input for single-end data in MEGAHIT (reported by John Richards, fix by @jfy133)

Dependencies

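| Tool    | Previous version | New version |
| ------- | ---------------- | ----------- |
| CheckM  | 1.2.1            | 1.2.3       |
| CheckM2 |                  | 1.0.2       |
| chopper |                  | 0.9.0       |
| GUNC    | 1.0.5            | 1.0.6       |
| nanoq   |                  | 0.10.0      |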

Fixed

  • #707 - Fix missing space resulting in malformed args for MEGAHIT (reported by @d4straub, fix by @jfy133)

Added

  • #674 - Added --longread_adaptertrimming_tool, where the user can choose between porechop_abi (default) and porechop (added by @muabnezor)

Changed

  • #674 - Changed to porechop-abi as default adapter trimming tool for long reads. User can still use porechop if preferred (added by @muabnezor)
  • #666 - Update SPAdes to version 4.0.0, replace both METASPADES and MEGAHIT with official nf-core modules (requested by @elsherbini, fix by @jfy133)
  • #666 - Update URLs to GTDB database downloads due to server move (reported by @Jokendo-collab, fix by @jfy133)
  • #695 - Updated to nf-core 3.0.2 TEMPLATE (by @jfy133)
  • #695 - Switch to more stable Zenodo link for CheckM data (by @jfy133)

Fixed

  • #674 - Make longread preprocessing a subworkflow (added by @muabnezor)
  • #674 - Add porechop and filtlong logs to multiqc (added by @muabnezor)
  • #674 - Change local filtlong module to the official nf-core/filtlong module (added by @muabnezor)
  • #690 - MaxBin2 now uses the abundance information from different samples rather than an average (reported by @uel3 and fixed by @d4straub)
  • #698 - Updated prodigal module to not pick up input symlinks for compression causing pigz errors (reported by @zackhenny, fix by @jfy133)

Dependencies

| Tool         | Previous version | New version |
| ------------ | ---------------- | ----------- |
| Porechop_ABI |                  | 0.5.0       |
| Filtlong     | 0.2.0            | 0.2.1       |
| SPAdes       | 3.15.3           | 4.0.0       |

[!CAUTION] This release contains a potentially ‘breaking change’ for some users. The --gtdbtk_pplacer_scratch flag has been replaced with --gtdbtk_pplacer_useram. Check the parameter documentation for more details.
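
For anyone with saved launch commands, a quick check for the replaced flag can be sketched as below. This is a minimal, hypothetical helper rather than part of the pipeline: only the two flag names come from the note above, while `RENAMED_FLAGS` and `find_replaced_flags` are names invented for illustration. It only reports the rename and does not rewrite the arguments, since the two flags may not accept the same kind of value.

```python
# Hypothetical helper: only the two flag names are taken from the release note.
RENAMED_FLAGS = {"--gtdbtk_pplacer_scratch": "--gtdbtk_pplacer_useram"}


def find_replaced_flags(args):
    """Return (old, new) pairs for every replaced flag found in a list of CLI arguments."""
    hits = []
    for arg in args:
        name = arg.split("=", 1)[0]  # also accept the --flag=value spelling
        if name in RENAMED_FLAGS:
            hits.append((name, RENAMED_FLAGS[name]))
    return hits


# Example command line (paths are placeholders)
old_command = ["--input", "samplesheet.csv", "--gtdbtk_pplacer_scratch", "/tmp/scratch"]
for old, new in find_replaced_flags(old_command):
    print(f"{old} has been replaced by {new}; check the parameter documentation")
```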

Added

  • #665 - Add support for supplying pre-made bowtie host reference index (requested by @simone-pignotti, added by @jfy133)
  • #670 - Added --gtdbtk_pplacer_useram to run GTDBTk in memory mode rather than write to disk (requested by @harper357, fixed by @jfy133)

Changed

  • #664 - Update GTDBTk to latest version, with updated column names, update GTDB to release 220 (by @dialvarezs)
  • #676 - Added exit code 12 to valid SPAdes retry codes, due to OOM errors from spades-hammer (reported by @bawee, fix by @jfy133)
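
The retry behaviour described in #676 can be pictured with a small sketch. It is not the pipeline's Nextflow configuration: exit code 12 comes from the entry above and 21 from the #355 entry further down this history, while the function name, the attempt limit, and the linear memory scaling are assumptions made for illustration.

```python
# Exit codes named in this changelog (#676 and #355); the real list may be longer.
SPADES_RETRY_CODES = {12, 21}


def next_memory_gb(exit_status, attempt, base_memory_gb, max_attempts=3):
    """Return the memory for the next attempt, or None if the task should not be retried.

    Assumed rule: memory grows linearly with the attempt number.
    """
    if exit_status not in SPADES_RETRY_CODES or attempt >= max_attempts:
        return None
    return base_memory_gb * (attempt + 1)


print(next_memory_gb(exit_status=12, attempt=1, base_memory_gb=64))  # 128
print(next_memory_gb(exit_status=1, attempt=1, base_memory_gb=64))   # None
```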

Fixed

  • #667 - Fix pipeline crashing if only CONCOCT selected during binning (reported and fixed by @jfy133)
  • #670 - Re-add missing GTDBTk parameters into GTDBTk module (reported by @harper357, fixed by @jfy133)
  • #672 - Fix GTDB-Tk per-sample TSV files not being published in output directory (reported by @jhayer, fix by @jfy133)

Dependencies

| Tool   | Previous version | New version |
| ------ | ---------------- | ----------- |
| GTDBTk | 2.3.2            | 2.4.0       |

Deprecated

  • #670 - Deprecated --gtdbtk_pplacer_scratch due to unintuitive usage (reported by @harper357, fixed by @jfy133)

Fixed

  • #648 - Fix sample ID/assembly ID check failure when no IDs match (reported by @zackhenny, fix by @prototaxites)
  • #646 - GTDB-Tk directory input now creates a value channel so it runs for all entries to the process and not just the first (reported by @amizeranschi, fix by @prototaxites).
  • #639 - Fix pipeline failure when a sample produces only a single bin (fix by @d-callan)
  • #651 - Replace base container for bash only modules to reduce number of containers in pipeline (reported and fixed by @harper357)
  • #652 - Fix documentation typo in using user-defined assembly parameters (reported and fixed by @amizeranschi)
  • #653 - Fix overwriting of per-bin ‘raw’ GUNC RUN output files (multi-bin summary tables not affected) (reported by @zackhenny and fixed by @jfy133)
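
The #646 fix is easier to picture with an analogy. The snippet below is plain Python, not Nextflow, and all file names are hypothetical. It mimics what happens when a single-item input is consumed once (queue-like) versus being readable for every incoming entry (value-like).

```python
from itertools import repeat

bins = ["bin_1.fa", "bin_2.fa", "bin_3.fa"]  # hypothetical process inputs
db = "gtdb_db/"                              # hypothetical database directory

# Queue-like: the database is emitted once, so pairing stops after the first entry.
queue_like = list(zip(bins, [db]))

# Value-like: the database can be read again for every entry.
value_like = list(zip(bins, repeat(db)))

print(len(queue_like), len(value_like))  # 1 3
```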

Changed

  • #633 - Changed BUSCO to use offline mode when the database is specified by the user (reported by @ChristophKnapp and many others, fix by @jfy133)
  • #632 - Use default NanoLyse log of just removed reads rather than custom (by @jfy133)

Fixed

  • #630 - Fix CONCOCT empty bins killing the pipeline, and allow for true multithreading again (removing OPENBLAS loop) (reported by @maxibor, fix by @maxibor and @jfy133)

Dependencies

| Tool     | Previous version  | New version |
| -------- | ----------------- | ----------- |
| Porechop | 0.2.3_seqan2.1.1  | 0.2.4       |
| NanoPlot | 1.26.3            | 1.41.6      |
| NanoLyse | 1.1.0             | 1.2.0       |

Changed

Fixed

  • #618 - Fix CENTRIFUGE mkfifo failures by using work directory /tmp (reported by @skrakau, fix by @jfy133)

Dependencies

| Tool       | Previous version | New version |
| ---------- | ---------------- | ----------- |
| Centrifuge | 1.0.4_beta       | 1.0.4.1     |

[!CAUTION] This release contains a potentially ‘breaking change’ for some users. The pipeline no longer directly accepts FASTQ files via --input. You must use a samplesheet and specify the FASTQs there.

Added

Changed

  • #599 - Update to nf-core v2.13.1 TEMPLATE (by @jfy133)
  • #614 - Update to nf-core v2.14.1 TEMPLATE (by @jfy133)

Fixed

  • #606 - Prevent pipeline crash when a premade mashdb is given to GTDB-TK_CLASSIFYWF or when it finds no alignments (reported by @cedwardson4, fix by @jfy133)

Deprecated

  • #599 - Direct reads input (--input 'sample_{R1,R2}.fastq.gz') is no longer supported, all input must come via samplesheets (by @jfy133)

Changed

Fixed

  • #583 - Fix GTDB database input when directory supplied (fix by @jfy133)

Changed

  • #575 - Deactivated MetaSPAdes, Centrifuge, and GTDB in test_full profile due to some container incompatibilities in nf-core megatest AWS configurations (by @jfy133)

Fixed

  • #574 - Fix wrong channel going to BIN_SUMMARY (fix by @maxibor)

Added

  • #562 - Add CAT summary into the global bin_summary (by @maxibor)
  • #565 - Add warning of empty GTDB-TK results if no contigs pass completeness filter (by @jfy133 and @maxibor)

Changed

  • #563 - Update to nf-core v2.12 TEMPLATE (by @CarsonJM)
  • #566 - More logical ordering of MultiQC sections (assembly and bin sections go together respectively) (fix by @jfy133)

Fixed

Dependencies

| Tool | Previous version | New version |
| ---- | ---------------- | ----------- |
| CAT  | 4.6              | 5.2.3       |

Deprecated

  • #536 - Replace custom function with native Nextflow for checking file extension (reported by @d4straub, fix by @jfy133)

Added

  • #504 - New parameters --busco_db, --kraken2_db, and --centrifuge_db now support directory input of a pre-uncompressed database archive directory (by @gregorysprenger).

Changed

Fixed

  • #514 - Fix missing CONCOCT files in downstream output (reported by @maxibor, fix by @jfy133)
  • #515 - Fix overwriting of GUNC output directories when running with domain classification (reported by @maxibor, fix by @jfy133)
  • #516 - Fix edge-case bug where MEGAHIT re-uses previous work directory on resume and fails (reported by @husensofteng, fix by @prototaxites)
  • #520 - Fix missing Tiara output files (fix by @jfy133)
  • #522 - Fix ‘nulls’ in depth plot PNG files (fix by @jfy133)

Deprecated

  • #504 - --busco_reference, --busco_download_path, --save_busco_reference parameters have been deprecated and replaced with new parameters (by @gregorysprenger).

Added

  • #497 - Adds support for pointing at a local db for krona, using the parameter --krona_db (by @willros).
  • #395 - Adds support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
  • #422 - Adds support for normalization of read depth with BBNorm (added by @erikrikarddaniel and @fabianegli)
  • #439 - Adds ability to enter the pipeline at the binning stage by providing a CSV of pre-computed assemblies (by @prototaxites)
  • #459 - Adds ability to skip damage correction step in the ancient DNA workflow and just run pyDamage (by @jfy133)
  • #364 - Adds geNomad nf-core modules for identifying viruses in assemblies (by @PhilPalmer and @CarsonJM)
  • #481 - Adds MetaEuk for annotation of eukaryotic MAGs, and MMSeqs2 to enable downloading databases for MetaEuk (by @prototaxites)
  • #437 - --gtdb_db now also supports directory input of a pre-uncompressed GTDB archive directory (reported by @alneberg, fix by @jfy133)
  • #494 - Adds support for saving the BAM files from Bowtie2 mapping of input reads back to assembly (fix by @jfy133)

Changed

  • #428 #467 - Update to nf-core 2.8, 2.9 TEMPLATE (by @jfy133)
  • #429 - Replaced hardcoded CheckM database auto-download URL to a parameter (reported by @erikrikarddaniel, fix by @jfy133)
  • #441 - Deactivated CONCOCT in AWS ‘full test’ due to very long runtime (fix by @jfy133).
  • #442 - Remove warning when BUSCO finds no genes in bins, as this can be expected in some datasets (reported by @Lumimar, fix by @jfy133).
  • #444 - Moved BUSCO bash code to script (by @jfy133)
  • #477 - --gtdb parameter is split into --skip_gtdbtk and --gtdb_db to allow finer control over GTDB database retrieval (fix by @jfy133)
  • #500 - Temporarily disabled downstream processing of both refined and raw bins due to bug (by @jfy133)

Fixed

  • #496 - Fix help text for parameters --bowtie2_mode, spades_options and megahit_options (by @willros)
  • #400 - Fix duplicated Zenodo badge in README (by @jfy133)
  • #406 - Fix CheckM database always downloading, regardless if CheckM is selected (by @jfy133)
  • #419 - Fix bug with busco_clean parameter, where it is always activated (by @prototaxites)
  • #426 - Fixed typo in help text for parameters --host_genome and --host_fasta (by @tillenglert)
  • #434 - Fix location of samplesheet for AWS full tests (reported by @Lfulcrum, fix by @jfy133)
  • #438 - Fixed version inconsistency between conda and containers for GTDBTK_CLASSIFYWF (by @jfy133)
  • #439 - Fix bug in assembly input (by @prototaxites)
  • #447 - Remove default: None from parameter schema (by @drpatelh)
  • #449 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133)
  • #470 - Fix binning preparation from running even when binning was requested to be skipped (reported by @prototaxites, fix by @jfy133)
  • #480 - Improved -resume reliability through better meta map preservation (reported by @prototaxites, fix by @jfy133)
  • #493 - Update METABAT2 nf-core module so that it reduces the number of unnecessary file moves, enabling virtual filesystems (fix by @adamrtalbot)
  • #500 - Fix MaxBin2 bins not being saved in results directory properly (reported by @Perugolate, fix by @jfy133)

Dependencies

| Tool     | Previous version | New version |
| -------- | ---------------- | ----------- |
| BCFtools | 1.16             | 1.17        |
| SAMtools | 1.16.1           | 1.17        |
| fastp    | 0.23.2           | 0.23.4      |
| MultiQC  | 1.14             | 1.15        |

Fixed

  • #458 - Correct the major issue in ancient DNA workflow of binning refinement being performed on uncorrected contigs instead of aDNA consensus recalled contigs (issue #449)
  • #451 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133, and integrated by @maxibor in #458)

Added

  • #350 - Adds support for CheckM as alternative bin completeness and QC tool (added by @jfy133 and @skrakau)
  • #353 - Added the busco_clean parameter to optionally clean each BUSCO directory after a successful run (by @prototaxites)
  • #361 - Added the skip_clipping parameter to skip read preprocessing with fastp or adapterremoval. Running the pipeline with skip_clipping, keep_phix and without specifying a host genome or fasta file skips the FASTQC_TRIMMED process (by @prototaxites)
  • #365 - Added CONCOCT as an additional (optional) binning tool (by @jfy133)
  • #366 - Added CAT_SUMMARISE process and cat_official_taxonomy parameter (by @prototaxites)
  • #372 - Allow CAT_DB to take an extracted database as well as a tar.gz file (by @prototaxites).
  • #380 - Added support for saving processed reads (clipped, host removed etc.) to results directory (by @jfy133)
  • #394 - Added GUNC for additional chimeric bin/contamination QC (added by @jfy133)

Changed

  • #340,#368,#373 - Update to nf-core 2.7.2 TEMPLATE (by @jfy133, @d4straub, @skrakau)
  • #373 - Removed parameter --enable_conda. Updated local modules to new conda syntax and updated nf-core modules (by @skrakau)
  • #385 - CAT also now runs on unbinned contigs as well as binned contigs (added by @jfy133)
  • #399 - Removed undocumented BUSCO_PLOT process (previously generated *.busco_figure.png plots unsuitable for metagenomics) (by @skrakau).

Fixed

  • #345 - Bowtie2 mode changed to global alignment for ancient DNA mode (--very-sensitive mode) to prevent soft clipping at the end of reads when running in local mode. (by @maxibor)
  • #349 - Add a warning that the pipeline will reset the minimum contig size to 1500 specifically for the MetaBAT2 process if a user supplies a value below this threshold. (by @jfy133)
  • #352 - Handle the case in the BUSCO module where BUSCO detects only a root lineage but is not able to find any marker genes (by @alexhbnr)
  • #355 - Include error code 21 for retrying with higher memory for SPAdes and hybridSPAdes (by @mglubber)
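
The MetaBAT2 behaviour from #349 amounts to clamping a user value to a floor and warning about it. The sketch below illustrates that idea only. The threshold of 1500 comes from the entry above, while the function and constant names are hypothetical and the pipeline's own check is written in Nextflow, not Python.

```python
import warnings

METABAT2_MIN_CONTIG_SIZE = 1500  # threshold stated in the #349 entry


def effective_min_contig_size(requested):
    """Return the contig size used for MetaBAT2, warning if the request was raised."""
    if requested < METABAT2_MIN_CONTIG_SIZE:
        warnings.warn(
            f"Minimum contig size {requested} is below {METABAT2_MIN_CONTIG_SIZE}; "
            f"using {METABAT2_MIN_CONTIG_SIZE} for MetaBAT2 only."
        )
        return METABAT2_MIN_CONTIG_SIZE
    return requested


print(effective_min_contig_size(2500))  # 2500
```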

Dependencies

| Tool      | Previous version | New version |
| --------- | ---------------- | ----------- |
| BUSCO     | 5.1.0            | 5.4.3       |
| BCFtools  | 1.14             | 1.16        |
| Freebayes | 1.3.5            | 1.3.6       |
| SAMtools  | 1.15             | 1.16.1      |

  • Fix too many symbolic links issue in local convert_depths module (reported by @ChristophKnapp and fixed by @apeltzer, @jfy133)
  • Each sample now gets its own result directory for PyDamage analysis and filter (reported and fixed by @maxibor)

See full CHANGELOG for more information

  • Restructure binning subworkflow in preparation for aDNA workflow and extended binning
  • Add ancient DNA subworkflow
  • Add MaxBin2 as second contig binning tool
  • Add AdapterRemoval2 as an alternative read trimmer
  • Add DAS Tool for bin refinement
  • Activate pipeline-specific institutional nf-core/configs
  • Add extra results folder GenomeBinning/depths/contigs for [assembler]-[sample/group]-depth.txt.gz, and GenomeBinning/depths/bins for bin_depths_summary.tsv and [assembler]-[binner]-[sample/group]-binDepths.heatmap.png
  • Updated some software: fastp 0.20.1 > 0.23.2, MultiQC 1.9 > 1.12
  • Fix several bugs

See full CHANGELOG for more information

Contributors

@skrakau @d4straub @jfy133 @maxibor @alexhbnr @pcantalupo

  • Add bin gene annotation with PROKKA
  • Add prokaryotic gene finding with prodigal for each metagenome
  • Add pipeline preprint information
  • Updated some software: MultiQC 1.9 > 1.11, MEGAHIT 1.2.7 > 1.2.9, SPAdes 3.13.1 > 3.15.3
  • Fix several bugs

See full CHANGELOG for more information.

  • Add bin abundance estimation based on median sequencing depths of corresponding contigs
  • Add generation of heat maps with bin abundances across samples
  • Output predicted genes for bins
  • Fix handling of BUSCO output when run in auto lineage selection mode
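
As a rough illustration of the median-based abundance estimate mentioned above, the sketch below takes per-contig depths and reports the median over the contigs of each bin. The contig names, depth values, and function name are invented for the example, and the pipeline's own implementation may differ in detail.

```python
from statistics import median


def bin_abundance(contig_depths, bin_contigs):
    """Median sequencing depth over the contigs assigned to one bin."""
    return median(contig_depths[contig] for contig in bin_contigs)


# Hypothetical depths for one sample; contig_3 is a high-coverage outlier.
depths = {"contig_1": 10.0, "contig_2": 14.0, "contig_3": 250.0}
bins = {"bin_A": ["contig_1", "contig_2", "contig_3"]}

abundances = {name: bin_abundance(depths, contigs) for name, contigs in bins.items()}
print(abundances)  # {'bin_A': 14.0}
```

Using the median rather than the mean keeps a single unusually deep contig from dominating the estimate for the bin.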

See full CHANGELOG for more information.

  • Switch to Nextflow DSL2
  • Changed --input file format from TSV to CSV format, requires header now
  • Add BUSCO automated lineage selection functionality
  • Add taxonomic bin classification with GTDB-Tk
  • Add process for CAT database creation as an alternative to using pre-built databases
  • Allow different folder structures for Kraken2 databases
  • Requires nextflow version >= 21.04.0

See full CHANGELOG for more information.

  • Manifest file has to be handed over via --input parameter now
  • Changed format of manifest input file: requires a ‘.tsv’ suffix and additionally contains group ID
  • Add --coassemble_group parameter to allow group-wise co-assembly
  • Add --binning_map_mode parameter allowing different mapping strategies to compute co-abundances used for binning
  • TSV --input file now also allows entries containing only short reads

See full CHANGELOG for more information.

  • Fixed processing of --input parameter

See full CHANGELOG for more information.

  • Add full-size test
  • Add workflow overview figure to README
  • Fix seaborn to v0.10.1 to avoid nanoplot error

See full CHANGELOG for more information.

  • Add host read removal with Bowtie 2
  • Add separate MultiQC section for FastQC after preprocessing
  • Add MetaBAT2 RNG seed parameter --metabat_rng_seed and set the default to 1, which ensures reproducible binning results
  • Add parameters --megahit_fix_cpu_1, --spades_fix_cpus and --spadeshybrid_fix_cpus to ensure reproducible results from assembly tools
  • Fixed channel joining for multiple samples causing MetaBAT2 error
  • Compress assembly files
  • Fix BUSCO errors

See full CHANGELOG for more information.

First release of the MAG pipeline 🎉

This initial version of the pipeline: