nf-core/mag

Assembly and binning of metagenomes

annotation, assembly, binning, long-read-sequencing, metagenomes, metagenomics, nanopore, nanopore-sequencing

Version 5.3.0: https://github.com/nf-core/mag

Version history

Download .zip Download .tar.gz View on GitHub

`Added`

#905 - Add nf-test snapshot for test_assembly_input profile (by @dialvarezs)
#930 - Add binner SemiBin2 (by @d4straub)
#861 - Added --generate_bigmag_file to execute the bigmag workflow that generates the file to be used as input for BIgMAG (added by @jeffe107)

`Changed`

#932 - Replaced usages of deprecated Channel() with channel() and fixed other LSP warnings (by @dialvarezs)
#937 - Updated to nf-core 3.5.1 TEMPLATE (by @dialvarezs)
#938 - Update nf-core modules (by @dialvarezs)

`Fixed`

#894 - Fix read order in metaSPAdes to allow co-assembly of paired-end data of multiple samples (reported by @maartenciers, fix by @jfy133 with contributions from @prototaxites, @d4straub and @dialvarezs)
#927 - MetaBinner now succeeds when no contigs are too short or all are binned (reported by @MicroSeq, fix by @d4straub)
#929 - Allow the domain_classification.R script to run with any assembler, not just MEGAHIT or SPAdes (reported by @MicroSeq, fix by @prototaxites)
#943 - Fixed concatenation of BUSCO summaries with uneven columns by changing from csvtk to qsv (reported by @jfy133 and @julianu, fix by @dialvarezs)
#943 - Fixed creation of the Tiara report channel used for concatenation (by @dialvarezs)
#945 - Skip mixing of GTDB-Tk MultiQC files when binning is skipped (reported by @amizeranschi, fix by @dialvarezs)
#953 - metaSPAdes now retries on exit code 250 (out of memory) instead of stopping the pipeline.
#954 - Skip GTDB-Tk when no bin QC tool is enabled and add warning messages (fix by @dialvarezs)
#956 - Support long-read assemblers in assembly input (fix by @dialvarezs)
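#943 replaced csvtk with qsv for concatenating BUSCO summaries whose columns differ between inputs. The Python sketch below only illustrates the general technique (take the union of all headers, keep first-seen column order, leave missing cells empty); it is not the pipeline's qsv command, and the function name concat_uneven_tsv and the example column names are hypothetical.

```python
import csv
import io


def concat_uneven_tsv(tables: list[str]) -> str:
    """Concatenate TSV texts whose headers differ, filling missing cells with ''."""
    header: list[str] = []
    rows: list[dict[str, str]] = []
    for text in tables:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        # Build the union of columns, preserving the order they are first seen.
        for name in reader.fieldnames or []:
            if name not in header:
                header.append(name)
        rows.extend(reader)
    out = io.StringIO()
    # restval="" fills columns that a given input table did not have.
    writer = csv.DictWriter(out, fieldnames=header, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

A naive row-wise concatenation would shift values into the wrong columns whenever one summary has an extra field; keying every row by column name avoids that.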

`Dependencies`

Tool	Previous version	New version
bcftools	1.21	1.22
csvtk	0.31.0	(removed)
fastp	0.24.0	1.0.1
geNomad	1.11.1	1.11.2
metamdbg	1.1	1.2
mmseqs	17.b804f	18.8cc5c
nf-core		3.5.1
qsv		5.1.0
samtools	1.21	1.22.1
SemiBin2		2.2.0

`Deprecated`

#943 - Remove csvtk/concat module (by @dialvarezs).

Download .zip Download .tar.gz View on GitHub

`Added`

#842 - Add support for running multiple binQC tools in one run using dedicated --run_busco, --run_checkm, and --run_checkm2 parameters (by @harper357, with contributions from @dialvarezs, @prototaxites and @jfy133)
#881 - Add binner MetaBinner (by @d4straub, inspired by @HeshamAlmessady & @AlphaSquad)

`Changed`

#842 - Change bin_summary.tsv format for clarity and completeness (by @harper357, with contributions from @dialvarezs, @prototaxites and @jfy133)
- Now includes columns from all bin QC tools executed in a given run (i.e., any or all of BUSCO, CheckM and CheckM2)
- Adds a tool suffix (_<toolname>) to all columns to show which tool each column comes from
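With a tool suffix on each column, a combined bin_summary.tsv can be split back into per-tool views. A minimal Python sketch, assuming the suffixes are spelled exactly _BUSCO, _CheckM and _CheckM2 (the real spelling and casing may differ, so check the file header first); the helper group_columns_by_tool is hypothetical.

```python
# Assumed suffix spellings; verify against an actual bin_summary.tsv header.
KNOWN_TOOLS = ("BUSCO", "CheckM", "CheckM2")


def group_columns_by_tool(
    header: list[str], tools: tuple[str, ...] = KNOWN_TOOLS
) -> tuple[dict[str, list[str]], list[str]]:
    """Split column names into per-tool groups plus columns with no tool suffix."""
    groups: dict[str, list[str]] = {tool: [] for tool in tools}
    shared: list[str] = []
    for name in header:
        # "_CheckM" does not match "..._CheckM2", because endswith checks the full tail.
        owner = next((t for t in tools if name.endswith(f"_{t}")), None)
        if owner is None:
            shared.append(name)
        else:
            groups[owner].append(name)
    return groups, shared
```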

`Fixed`

#896 - Remove obsolete execution command from README (by @dialvarezs)
#907 - Include refined bins from all binners in the DASTool/bins output folder (by @AlexHoratio)
#911 - Ensure column order is consistent when generating depth summaries to prevent swapped results on merged depth summary (by @dialvarezs)
#912 - Fix validation of multiple sequencing platforms when using binning_map_mode = "all" (reported by @mjfi2sb3, fix by @dialvarezs)
#921 - Fix publishing of BUSCO files (reported by @joao1980, fix by @dialvarezs)
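#911 concerns merging depth summaries whose sample columns may arrive in different orders. A minimal Python sketch of the safe approach, assuming each table is held in memory as a header (bin column followed by sample names) plus rows. Values are looked up by sample name rather than by position, so a reordered input cannot swap results. The function and data layout are hypothetical, not the pipeline's own script.

```python
def merge_depth_tables(tables: list[tuple[list[str], list[list]]]) -> tuple[list[str], list[list]]:
    """Merge (header, rows) depth tables into one table with a fixed sample order."""
    # One canonical, sorted sample order shared by every output row.
    samples = sorted({s for header, _ in tables for s in header[1:]})
    merged: list[list] = []
    for header, rows in tables:
        index = {name: i for i, name in enumerate(header)}
        for row in rows:
            # Look each value up by sample name; None marks a sample absent from this table.
            merged.append([row[0]] + [row[index[s]] if s in index else None for s in samples])
    return ["bin"] + samples, merged
```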

`Dependencies`

Tool	Previous version	New version
MetaBinner		1.4.4-0

`Deprecated`

#842 - Remove --binqc_tool (by @harper357, with contributions from @dialvarezs, @prototaxites and @jfy133)

Download .zip Download .tar.gz View on GitHub

`Added`

#873 - Document usage of longread_percentidentity and shortread_percentidentity and set the value of longread_percentidentity in the test_full profile to 85 (by @prototaxites)
#875 - Add binner COMEBin (by @d4straub)

`Changed`

#878 - Refine test_full config with optimised resource usage for AWS release megatests (by @jfy133)
#880 - Updated to nf-core 3.4.1 TEMPLATE (by @jfy133)

`Fixed`

#878 - Fix METASPADES process not receiving the correct number of CPUs from the fixed-CPUs parameter (by @jfy133)
#885 - Fix typo in long-read assembly mode selection (reported by @feixiang1209, fix by @jfy133)
#888 - Only raise an error that all bins were removed by size filtering if bins were actually generated (reported by @hkaspersen, fix by @prototaxites)

`Dependencies`

Tool	Previous version	New version
nf-core	3.3.2	3.4.1
COMEBin		1.0.4

Download .zip Download .tar.gz View on GitHub

`Added`

#718 - Add support for independent long-read metagenomic assembly (requested by @ljmesi and many others, added by @muabnezor)
#718 - Added metaMDBG and (meta)Flye as long read assemblers (added by @muabnezor)
#718 - Added host removal for long reads using minimap2 as aligner (added by @muabnezor)
#827 - Added nf-test CI testing for all test profiles (added by @jfy133)
#829 - Add --skip_shortread_qc and --skip_longread_qc params for skipping certain default preprocessing steps (added by @erikrikarddaniel)
#846 - Improve documentation of group samplesheet column (added by @vinisalazar)
#855 - Add basic nf-tests for test_longreadonly, test_longreadonly_alternatives, test_hybrid and test_assembly_input (added by @dialvarezs)
#864 - Add --gtdbtk_skip_aniscreen to disable fast classification of genomes by ANI using skani in GTDB-Tk (by @jfy133 and @prototaxites).

`Changed`

#718 - Refactored all assembly steps into subworkflows (added by @muabnezor)
#799 - Add --cat_classify_unbinned, to enable taxonomic classification of unbinned contigs using CAT (requested by @amizeranschi, added by @dialvarezs)
#799 - Upgraded to latest version of CAT_pack modules (requested by @maxibor, added by @dialvarezs)
#811 - Update util modules, and remove aria2 module to replace with native Nextflow downloading of CheckM database (by @dialvarezs)
#816 - Removed all leftover references to conda ‘defaults’ channel (by @jfy133)
#823 - Updated to nf-core 3.3.1 TEMPLATE (by @jfy133)
#827 - Updated to nf-core 3.3.2 TEMPLATE (by @dialvarezs)
#841 - MultiQC config updated to support CheckM, CheckM2, and GTDB-Tk (by @harper357)
#844 - Change loading mechanism of internal PhiX/Lambda databases to improve Dev UX when schema building (by @jfy133)
#851 - Improve structure of local modules and subworkflows (by @dialvarezs)
#853 - Update nf-core modules and subworkflows (by @dialvarezs)
#856 - Update more nf-core modules (by @dialvarezs)

`Fixed`

#843 - Fixed issue with large format Bowtie2 index files not being emitted from index module (reported by Nick Eckersley, fix by @jfy133)
#847 - Restrict the BBNorm process to 80% of the memory allocated to the task to stop it from oversubscribing memory (reported and fixed by @erikrikarddaniel)
#850 - Fixed some modules of the GTDB-Tk subworkflow not being represented in version lists (fix by @jfy133)
#852 - Fixed version reporting by ensuring all modules are represented in the final version.yml for MultiQC (by @jfy133)
#854 - Update porechop/abi to a patched version to prevent duplicated read names (reported by @palec87, fix by @jfy133)
#858 - Fix a single parameter validation failure reporting errors for all parameters by updating nf-schema to 2.5.1 (reported by @Pranjal-Bioinfo, fix by @nvnieuwk and @jfy133)
#864 - Fix missing multi-threading of MetaEuk easypredict (reported by @OlivierCoen, fix by @prototaxites).
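#847 caps BBNorm at 80% of the task's memory so that the Java process leaves headroom below the task limit. A small Python sketch of the arithmetic, assuming the heap size is passed to BBNorm as a Java -Xmx value in megabytes; the function name bbnorm_xmx is hypothetical, and the pipeline's module may compute this differently.

```python
def bbnorm_xmx(task_memory_mb: int, percent: int = 80) -> str:
    """Return a -Xmx option that uses only `percent` of the task's memory.

    Integer arithmetic avoids floating-point rounding surprises.
    """
    heap_mb = task_memory_mb * percent // 100
    return f"-Xmx{heap_mb}m"
```

For a 16 GB task (16384 MB) this gives a heap of 13107 MB, leaving roughly 3 GB for JVM overhead outside the heap.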

`Dependencies`

Tool	Previous version	New version
bcftools	1.17	1.21
BUSCO	5.8.3	6.0.0
CAT	5.2.3	6.0.1
centrifuge	1.0.4.1	1.0.4.2
dastool	1.1.6	1.1.7
nanolyse	1.41.6	1.44.1
fastp	0.23.4	0.24.0
flye		2.9.5
Freebayes	1.3.6	1.3.10
geNomad	1.5.2	1.11.0
GTDB-Tk	2.4.0	2.5.2
metabat2	2.15	2.17
metamdbg		1.0
minimap2		2.29
mmseqs2	14.7e284	17.b804f
samtools		1.21
nf-core	3.2.0	3.3.2
pydamage	0.7.0	1.0.0
seqtk	1.3	1.4
porechop_abi	0.5.0	0.5.0post1
NanoPlot	1.44.1	1.46.1

`Deprecated`

#799 - Removed --cat_official_taxonomy in favour of --cat_allow_unofficial_lineages to control CAT’s use of unofficial lineages (added by @dialvarezs)
#825 - Removed --centrifuge_db, --kraken2_db, --krona_db and --skip_krona parameters following the removal of taxonomic profiling functionality. See nf-core/taxprofiler for replacement (added by @dialvarezs)
#851 - Remove POOL_READ_* local modules in favor of nf-core cat/fastq (by @dialvarezs)
#855 - Remove test_adapterremoval, test_ancient_dna, test_bbnorm, test_busco_auto, test_host_rm, test_hybrid_host_rm, test_binrefinement, test_concoct and test_longread profiles (added by @dialvarezs)
#864 - Remove --gtdb_mash due to dropping of support by GTDBTk itself (by @prototaxites and @jfy133)

Download .zip Download .tar.gz View on GitHub

`Added`

#730 - Added --busco_db_lineage to allow specifying a lineage for the BUSCO database (added by @jfy133, @dialvarezs)
#730 - Added a new documentation section on database setup (by @jfy133, @dialvarezs)
#784 - Added --bin_min_size and --bin_max_size parameters to filter out bins based on size (requested by @maxibor, @alexhbnr, added by @jfy133, @prototaxites)
#793 - Document use of a SquashFS image with --gtdb_db, useful for limited inode infrastructure (by @muniheart)
#805 - Add support for fastp’s --trim_poly_g option (by @jfy133)
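#784 adds --bin_min_size and --bin_max_size, and #888 (listed above) only raises an error about size filtering when bins were actually generated. A minimal Python sketch of both behaviours, assuming bin size means total length in base pairs and that both bounds are inclusive; the function name and error message are hypothetical.

```python
from typing import Optional


def filter_bins_by_size(
    bin_sizes: dict[str, int],
    min_size: Optional[int] = None,
    max_size: Optional[int] = None,
) -> dict[str, int]:
    """Keep bins whose size lies within [min_size, max_size]; None means no bound."""
    kept = {
        name: size
        for name, size in bin_sizes.items()
        if (min_size is None or size >= min_size)
        and (max_size is None or size <= max_size)
    }
    # Only complain when there were bins to begin with and none survived.
    if bin_sizes and not kept:
        raise ValueError("All bins were removed by the size filter")
    return kept
```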

`Changed`

#730 - Migrate from local BUSCO module to nf-core one, updating version (by @dialvarezs)
#730 - Use BUSCO database from nf-core test datasets (by @dialvarezs)
#788 - Tweak method of loading GTDB database in GTDBTK_CLASSIFYWF for more stability (reported by @alexhbnr, fix by @jfy133)
#800 - Default branch is now set to main (by @jfy133 and @mirpedrol)
#801 - Increase CheckM memory requests to match recommended requirements (by @jfy133)

`Fixed`

#789 - Improve --bowtie2_mode description to clarify default settings (reported by @IceGreb, fix by @jfy133)
#798 - Fix overly strict database validation for --metaeuk_db and improve documentation (reported by @ruqse, fix by @jfy133)
#804 - Fix broken memory specification for FASTQC (reported by @jmichaelegana, fix by @awgymer & @jfy133)

`Dependencies`

Tool	Previous version	New version
BUSCO	5.4.3	5.8.3
csvtk		0.31.0
nextflow	24.04.2	25.04.2

`Deprecated`

#730 - Remove --busco_auto_lineage_prok due to update and simplified usage of BUSCO (added by @jfy133, @dialvarezs)

Download .zip Download .tar.gz View on GitHub

`Added`

#745 - Added pipeline parameter spades_downstreaminput to use contigs instead of scaffolds (by @Pranjal-Bioinfo, @jfy133, @GallVp & @sateeshperi).
#745 - Added trimmomatic as an additional pre-processing tool (by @Pranjal-Bioinfo, @jfy133, @GallVp & @sateeshperi).
#745 - Added parameters for concoct/cut_up_fasta.py including bin_concoct_chunksize, bin_concoct_overlap and bin_concoct_donotconcatlast (by @Pranjal-Bioinfo, @jfy133, @GallVp & @sateeshperi).
#777 - Improved input validation through additional JSON keywords and error messages (by @agusinac)

`Changed`

#774 - Update CheckM2 to v1.1.0 and default database (by @dialvarezs).

`Fixed`

#726 - Fix formatting errors to follow Nextflow best practice (by @dialvarezs).
#769 - Fix MEGAHIT not emitting correct filenames due to suboptimal argument ordering (reported and fixed by @IceGreb)
#771 - Fix misspecified checkm2 database parameter check (reported by @dpelegri and fix by @jfy133)

`Dependencies`

Tool	Previous version	New version
CheckM2	1.0.2	1.1.0
SPAdes	4.0.0	4.1.0

Download .zip Download .tar.gz View on GitHub

`Added`

#758 - Added new diagram in metro-map style (by @jfy133, @prototaxites, @d4straub)

`Changed`

#731 - Updated to nf-core 3.1.2 TEMPLATE (by @jfy133)
#755 - Updated to nf-core 3.2.0 TEMPLATE (by @jfy133)

`Fixed`

#748 - Fix broken phix reference channel when skipping phix removal (reported by @amizeranschi, fix by @muabnezor)
#752 - Fix QUAST results not being displayed when skipping certain steps (reported by @amizeranschi, fix by @jfy133)
#753 - Fix iGenomes reference support for host removal reference genome (reported by @Thomieh73, fix by @jfy133)
#759 - Fixed parameters that accept either files or directories erroring when given a directory, and made general file input validation improvements (reported by @mjfi2sb3, fix by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Added`

#692 - Added Nanoq as optional longread filtering tool (added by @muabnezor)
#692 - Added chopper as optional longread filtering tool and/or phage lambda removal tool (added by @muabnezor)
#707 - Make Bin QC a subworkflow (added by @dialvarezs)
#707 - Added CheckM2 as an alternative bin completeness and QC tool (added by @dialvarezs)
#708 - Added --exclude_unbins_from_postbinning parameter to exclude unbinned contigs from post-binning processes, speeding up Prokka in some cases (added by @dialvarezs)
#732 - Added support for Prokka’s compliance mode with --prokka_with_compliance --prokka_compliance_centre <xyz> (reported by @audy and @Thomieh73, added by @jfy133)

`Changed`

#731 - Updated to nf-core 3.1.0 TEMPLATE (by @jfy133)

`Fixed`

#707 - Fixed channel passed as GUNC input (added by @dialvarezs)
#724 - Fix quoting in utils_nfcore_mag_pipeline/main.nf (added by @dialvarezs)
#716 - Make short read processing a subworkflow (added by @muabnezor)
#708 - Fixed channel passed as GUNC input (added by @dialvarezs)
#729 - Fixed misspecified multi-FASTQ input for single-end data in MEGAHIT (reported by John Richards, fix by @jfy133)

`Dependencies`

Tool	Previous version	New version
CheckM	1.2.1	1.2.3
CheckM2		1.0.2
chopper		0.9.0
GUNC	1.0.5	1.0.6
nanoq		0.10.0

Download .zip Download .tar.gz View on GitHub

`Fixed`

#707 - Fix missing space resulting in malformed args for MEGAHIT (reported by @d4straub, fix by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Added`

#674 - Added --longread_adaptertrimming_tool, allowing the user to choose between porechop_abi (default) and porechop (added by @muabnezor)

`Changed`

#674 - Changed to porechop-abi as default adapter trimming tool for long reads. User can still use porechop if preferred (added by @muabnezor)
#666 - Update SPAdes to version 4.0.0, replace both METASPADES and MEGAHIT with official nf-core modules (requested by @elsherbini, fix by @jfy133)
#666 - Update URLs to GTDB database downloads due to server move (reported by @Jokendo-collab, fix by @jfy133)
#695 - Updated to nf-core 3.0.2 TEMPLATE (by @jfy133)
#695 - Switch to a more stable Zenodo link for CheckM data (by @jfy133)

`Fixed`

#674 - Make longread preprocessing a subworkflow (added by @muabnezor)
#674 - Add porechop and filtlong logs to multiqc (added by @muabnezor)
#674 - Change local filtlong module to the official nf-core/filtlong module (added by @muabnezor)
#690 - MaxBin2 now using the abundance information from different samples rather than an average (reported by @uel3 and fixed by @d4straub)
#698 - Updated prodigal module so that it no longer picks up input symlinks for compression, which caused pigz errors (reported by @zackhenny, fix by @jfy133)

`Dependencies`

Tool	Previous version	New version
Porechop_ABI		0.5.0
Filtlong	0.2.0	0.2.1
SPAdes	3.15.3	4.0.0

Download .zip Download .tar.gz View on GitHub

[!CAUTION] This release contains a potentially ‘breaking change’ for some users. The --gtdbtk_pplacer_scratch flag has been replaced with --gtdbtk_pplacer_useram. Check the parameter documentation for more details.

`Added`

#665 - Add support for supplying pre-made bowtie host reference index (requested by @simone-pignotti, added by @jfy133)
#670 - Added --gtdbtk_pplacer_useram to run GTDBTk in memory mode rather than write to disk (requested by @harper357, fixed by @jfy133)

`Changed`

#664 - Update GTDBTk to latest version, with updated column names, update GTDB to release 220 (by @dialvarezs)
#676 - Added exit code 12 to valid SPAdes retry codes, due to OOM errors from spades-hammer (reported by @bawee, fix by @jfy133)
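Exit codes that make SPAdes retry appear in several entries of this history: 21 (#355), 12 (#676) and 250 (#953). The Python sketch below mimics a Nextflow-style retry decision with memory scaled by attempt number. The retry limit, the "finish" fallback and the linear memory scaling are assumptions for illustration, not the pipeline's actual configuration.

```python
# Exit codes named in #355, #676 and #953 of this changelog.
SPADES_RETRY_CODES = frozenset({12, 21, 250})


def spades_error_action(exit_status: int, attempt: int, max_retries: int = 3) -> str:
    """Return 'retry' for a known out-of-memory style exit code while retries remain.

    attempt starts at 1 for the first try, like Nextflow's task.attempt.
    max_retries and the 'finish' fallback are assumed values.
    """
    if exit_status in SPADES_RETRY_CODES and attempt <= max_retries:
        return "retry"
    return "finish"


def memory_for_attempt(base_gb: int, attempt: int) -> int:
    """Scale the memory request linearly with the attempt number (assumed policy)."""
    return base_gb * attempt
```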

`Fixed`

#667 - Fix pipeline crashing if only CONCOCT selected during binning (reported and fixed by @jfy133)
#670 - Re-add missing GTDBTk parameters into GTDBTk module (reported by @harper357, fixed by @jfy133)
#672 - Fix GTDB-Tk per-sample TSV files not being published in output directory (reported by @jhayer, fix by @jfy133)

`Dependencies`

Tool	Previous version	New version
GTDBTk	2.3.2	2.4.0

`Deprecated`

#670 - Deprecated --gtdbtk_pplacer_scratch due to unintuitive usage (reported by @harper357, fixed by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Fixed`

#648 - Fix sample ID/assembly ID check failure when no IDs match (reported by @zackhenny, fix by @prototaxites)
#646 - GTDB-Tk directory input now creates a value channel so it runs for all entries to the process and not just the first (reported by @amizeranschi, fix by @prototaxites).
#639 - Fix pipeline failure when a sample produces only a single bin (fix by @d-callan)
#651 - Replace base container for bash-only modules to reduce the number of containers in the pipeline (reported and fixed by @harper357)
#652 - Fix documentation typo in using user-defined assembly parameters (reported and fixed by @amizeranschi)
#653 - Fix overwriting of per-bin ‘raw’ GUNC RUN output files (multi-bin summary tables not affected) (reported by @zackhenny and fixed by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Changed`

#633 - Changed BUSCO to use offline mode when the database is specified by the user (reported by @ChristophKnapp and many others, fix by @jfy133)
#632 - Use default NanoLyse log of just removed reads rather than custom (by @jfy133)

`Fixed`

#630 - Fix CONCOCT empty bins killing the pipeline, and allow for true multithreading again (removing OPENBLAS loop) (reported by @maxibor, fix by @maxibor and @jfy133)

`Dependencies`

Tool	Previous version	New version
Porechop	0.2.3_seqan2.1.1	0.2.4
NanoPlot	1.26.3	1.41.6
NanoLyse	1.1.0	1.2.0

Download .zip Download .tar.gz View on GitHub

`Changed`

#625 - Updated link to geNomad database for downloading (reported by @amizeranschi, fix by @jfy133)

`Fixed`

#618 - Fix CENTRIFUGE mkfifo failures by using work directory /tmp (reported by @skrakau, fix by @jfy133)

`Dependencies`

Tool	Previous version	New version
Centrifuge	1.0.4_beta	1.0.4.1

Download .zip Download .tar.gz View on GitHub

[!CAUTION] This release contains a potentially ‘breaking change’ for some users. The pipeline no longer directly accepts FASTQ files via --input. You must use a samplesheet and specify the FASTQs there.

`Added`

#615 - Add new logo (by @jfy133)

`Changed`

#599 - Update to nf-core v2.13.1 TEMPLATE (by @jfy133)
#614 - Update to nf-core v2.14.1 TEMPLATE (by @jfy133)

`Fixed`

#606 - Prevent pipeline crash when a premade mash database is given to, or no alignments are found with, GTDBTK_CLASSIFYWF (reported by @cedwardson4, fix by @jfy133)

`Deprecated`

#599 - Direct reads input (--input 'sample_{R1,R2}.fastq.gz') is no longer supported, all input must come via samplesheets (by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Changed`

#581 - Added explicit licence text to headers of all custom scripts (reported by @FriederikeHanssen and @maxibor, fix by @jfy133)

`Fixed`

#583 - Fix GTDB database input when directory supplied (fix by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Changed`

#575 - Deactivated MetaSPAdes, Centrifuge, and GTDB in test_full profile due to some container incompatibilities in nf-core megatest AWS configurations (by @jfy133)

`Fixed`

#574 - Fix wrong channel going to BIN_SUMMARY (fix by @maxibor)

Download .zip Download .tar.gz View on GitHub

`Added`

#562 - Add CAT summary into the global bin_summary (by @maxibor)
#565 - Add warning of empty GTDB-TK results if no contigs pass completeness filter (by @jfy133 and @maxibor)

`Changed`

#563 - Update to nf-core v2.12 TEMPLATE (by @CarsonJM)
#566 - More logical ordering of MultiQC sections (assembly and bin sections go together respectively) (fix by @jfy133)

`Fixed`

#548 - Fixes to the following (reported by @maxibor, @PPpissar, @muniheart, @llborcard, fix by @maxibor):
- GTDB-Tk execution
- CAT/QUAST/DEPTH bin summary file name collisions
- BUSCO database parsing
- Correct CAT name files
#558 - Fix bug in run merging when dealing with single end data (reported by @roberta-davidson, fix by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Fixed`

#489 - Fix file name collisions for CHECKM, CAT, GTDBTK, and QUAST (reported by @tillenglert and @maxibor, fix by @maxibor)
#533 - Fix glob pattern for publishing MetaBAT2 bins in results (reported by @patriciatran, fix by @jfy133)
#535 - Fix input validation pattern to again allow direct FASTQ input (reported by @lennijusten, @emnilsson, fix by @jfy133, @d4straub, @mahesh-panchal, @nvnieuwk)

`Dependencies`

Tool	Previous version	New version
CAT	4.6	5.2.3

`Deprecated`

#536 - Replace custom function for checking file extensions with native Nextflow functionality (reported by @d4straub, fix by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Added`

#504 - New parameters --busco_db, --kraken2_db, and --centrifuge_db now support directory input of an already uncompressed database archive (by @gregorysprenger).

`Changed`

#511 - Update to nf-core 2.10 TEMPLATE (by @jfy133)
#504 - --save_busco_reference is now replaced by --save_busco_db (by @gregorysprenger).

`Fixed`

#514 - Fix missing CONCOCT files in downstream output (reported by @maxibor, fix by @jfy133)
#515 - Fix overwriting of GUNC output directories when running with domain classification (reported by @maxibor, fix by @jfy133)
#516 - Fix edge-case bug where MEGAHIT re-uses previous work directory on resume and fails (reported by @husensofteng, fix by @prototaxites)
#520 - Fix missing Tiara output files (fix by @jfy133)
#522 - Fix ‘nulls’ in depth plot PNG files (fix by @jfy133)

`Deprecated`

#504 - --busco_reference, --busco_download_path, --save_busco_reference parameters have been deprecated and replaced with new parameters (by @gregorysprenger).

Download .zip Download .tar.gz View on GitHub

`Added`

#497 - Adds support for pointing at a local db for krona, using the parameter --krona_db (by @willros).
#395 - Adds support for fast domain-level classification of bins using Tiara, to allow bins to be separated into eukaryotic and prokaryotic-specific processes.
#422 - Adds support for normalization of read depth with BBNorm (added by @erikrikarddaniel and @fabianegli)
#439 - Adds ability to enter the pipeline at the binning stage by providing a CSV of pre-computed assemblies (by @prototaxites)
#459 - Adds ability to skip damage correction step in the ancient DNA workflow and just run pyDamage (by @jfy133)
#364 - Adds geNomad nf-core modules for identifying viruses in assemblies (by @PhilPalmer and @CarsonJM)
#481 - Adds MetaEuk for annotation of eukaryotic MAGs, and MMSeqs2 to enable downloading databases for MetaEuk (by @prototaxites)
#437 - --gtdb_db now also supports directory input of an already uncompressed GTDB archive (reported by @alneberg, fix by @jfy133)
#494 - Adds support for saving the BAM files from Bowtie2 mapping of input reads back to assembly (fix by @jfy133)

`Changed`

#428 #467 - Update to nf-core 2.8, 2.9 TEMPLATE (by @jfy133)
#429 - Replaced hardcoded CheckM database auto-download URL with a parameter (reported by @erikrikarddaniel, fix by @jfy133)
#441 - Deactivated CONCOCT in AWS ‘full test’ due to very long runtime (fix by @jfy133).
#442 - Remove warning when BUSCO finds no genes in bins, as this can be expected in some datasets (reported by @Lumimar, fix by @jfy133).
#444 - Moved BUSCO bash code to script (by @jfy133)
#477 - --gtdb parameter is split into --skip_gtdbtk and --gtdb_db to allow finer control over GTDB database retrieval (fix by @jfy133)
#500 - Temporarily disabled downstream processing of both refined and raw bins due to bug (by @jfy133)

`Fixed`

#496 - Fix help text for parameters --bowtie2_mode, spades_options and megahit_options (by @willros)
#400 - Fix duplicated Zenodo badge in README (by @jfy133)
#406 - Fix CheckM database always downloading, regardless of whether CheckM is selected (by @jfy133)
#419 - Fix bug with busco_clean parameter, where it is always activated (by @prototaxites)
#426 - Fixed typo in help text for parameters --host_genome and --host_fasta (by @tillenglert)
#434 - Fix location of samplesheet for AWS full tests (reported by @Lfulcrum, fix by @jfy133)
#438 - Fixed version inconsistency between conda and containers for GTDBTK_CLASSIFYWF (by @jfy133)
#439 - Fix bug in assembly input (by @prototaxites)
#447 - Remove default: None from parameter schema (by @drpatelh)
#449 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133)
#470 - Prevent binning preparation from running when binning was requested to be skipped (reported by @prototaxites, fix by @jfy133)
#480 - Improved -resume reliability through better meta map preservation (reported by @prototaxites, fix by @jfy133)
#493 - Update METABAT2 nf-core module to reduce the number of unnecessary file moves, enabling use of virtual filesystems (fix by @adamrtalbot)
#500 - Fix MaxBin2 bins not being saved properly in the results directory (reported by @Perugolate, fix by @jfy133)

`Dependencies`

Tool	Previous version	New version
BCFtools	1.16	1.17
SAMtools	1.16.1	1.17
fastp	0.23.2	0.23.4
MultiQC	1.14	1.15

Download .zip Download .tar.gz View on GitHub

`Fixed`

#461 - Fix full-size AWS test profile paths (by @jfy133)
#461 - Fix pyDamage results being overwritten (reported by @alexhbnr, fix by @jfy133)

Download .zip Download .tar.gz View on GitHub

`Fixed`

#458 - Correct the major issue in ancient DNA workflow of binning refinement being performed on uncorrected contigs instead of aDNA consensus recalled contigs (issue #449)
#451 - Fix results file overwriting in Ancient DNA workflow (reported by @alexhbnr, fix by @jfy133, and integrated by @maxibor in #458)

Download .zip Download .tar.gz View on GitHub

`Added`

#350 - Adds support for CheckM as alternative bin completeness and QC tool (added by @jfy133 and @skrakau)
#353 - Added the busco_clean parameter to optionally clean each BUSCO directory after a successful run (by @prototaxites)
#361 - Added the skip_clipping parameter to skip read preprocessing with fastp or adapterremoval. Running the pipeline with skip_clipping, keep_phix and without specifying a host genome or fasta file skips the FASTQC_TRIMMED process (by @prototaxites)
#365 - Added CONCOCT as an additional (optional) binning tool (by @jfy133)
#366 - Added CAT_SUMMARISE process and cat_official_taxonomy parameter (by @prototaxites)
#372 - Allow CAT_DB to take an extracted database as well as a tar.gz file (by @prototaxites).
#380 - Added support for saving processed reads (clipped, host removed etc.) to results directory (by @jfy133)
#394 - Added GUNC for additional chimeric bin/contamination QC (added by @jfy133)

`Changed`

#340, #368, #373 - Update to nf-core 2.7.2 TEMPLATE (by @jfy133, @d4straub, @skrakau)
#373 - Removed parameter --enable_conda. Updated local modules to new conda syntax and updated nf-core modules (by @skrakau)
#385 - CAT also now runs on unbinned contigs as well as binned contigs (added by @jfy133)
#399 - Removed undocumented BUSCO_PLOT process (previously generated *.busco_figure.png plots unsuitable for metagenomics) (by @skrakau).

`Fixed`

#345 - Bowtie2 mode changed to global alignment for ancient DNA mode (--very-sensitive mode) to prevent soft clipping at the end of reads when running in local mode. (by @maxibor)
#349 - Add a warning that the pipeline will reset the minimum contig size to 1500 for the MetaBAT2 process specifically, if a user supplies a value below this threshold. (by @jfy133)
#352 - Handle the case in the BUSCO module where BUSCO detects only a root lineage but cannot find any marker genes (by @alexhbnr)
#355 - Include error code 21 for retrying with higher memory for SPAdes and hybridSPAdes (by @mglubber)
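#349 describes raising a too-small minimum contig size to 1500 for MetaBAT2 and warning the user. A minimal Python sketch of that check; the function name and warning text are hypothetical, and only the 1500 threshold comes from the entry itself.

```python
import warnings

METABAT2_MIN_CONTIG = 1500  # lower bound stated in #349


def metabat2_min_contig(requested: int) -> int:
    """Return the contig length cutoff to use for MetaBAT2, warning when it is raised."""
    if requested < METABAT2_MIN_CONTIG:
        warnings.warn(
            f"Minimum contig size {requested} is below {METABAT2_MIN_CONTIG}; "
            f"using {METABAT2_MIN_CONTIG} for MetaBAT2 only."
        )
        return METABAT2_MIN_CONTIG
    return requested
```

Other steps keep the user's value; only the MetaBAT2 cutoff is clamped.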

`Dependencies`

Tool	Previous version	New version
BUSCO	5.1.0	5.4.3
BCFtools	1.14	1.16
Freebayes	1.3.5	1.3.6
SAMtools	1.15	1.16.1

Download .zip Download .tar.gz View on GitHub

Fix too many symbolic links issue in local convert_depths module (reported by @ChristophKnapp and fixed by @apeltzer, @jfy133)
Each sample now gets its own result directory for PyDamage analysis and filter (reported and fixed by @maxibor)

See full CHANGELOG for more information

Download .zip Download .tar.gz View on GitHub

Restructure binning subworkflow in preparation for aDNA workflow and extended binning
Add ancient DNA subworkflow
Add MaxBin2 as second contig binning tool
Add AdapterRemoval2 as an alternative read trimmer
Add DAS Tool for bin refinement
Activate pipeline-specific institutional nf-core/configs
Add extra results folder GenomeBinning/depths/contigs for [assembler]-[sample/group]-depth.txt.gz, and GenomeBinning/depths/bins for bin_depths_summary.tsv and [assembler]-[binner]-[sample/group]-binDepths.heatmap.png
Updated some software: fastp 0.20.1 > 0.23.2, MultiQC 1.9 > 1.12
Fix several bugs

See full CHANGELOG for more information

Contributors

@skrakau @d4straub @jfy133 @maxibor @alexhbnr @pcantalupo

Download .zip Download .tar.gz View on GitHub

Add bin gene annotation with PROKKA
Add prokaryotic gene finding with prodigal for each metagenome
Add pipeline preprint information
Updated some software: MultiQC 1.9 > 1.11, MEGAHIT 1.2.7 > 1.2.9, SPAdes 3.13.1 > 3.15.3
Fix several bugs

See full CHANGELOG for more information.

Download .zip Download .tar.gz View on GitHub

Add bin abundance estimation based on median sequencing depths of corresponding contigs
Add generation of heat maps with bin abundances across samples
Output predicted genes for bins
Fix handling of BUSCO output when run in auto lineage selection mode
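Bin abundance is estimated as the median sequencing depth of a bin's contigs. A minimal Python sketch for one sample, assuming per-contig depths and a contig-to-bin mapping are already available; calling it once per sample gives the values behind an abundance heat map. The function name and input layout are hypothetical.

```python
from statistics import median


def bin_abundance(contig_depths: dict[str, float], contig_to_bin: dict[str, str]) -> dict[str, float]:
    """Return the median contig depth for each bin in one sample."""
    depths_per_bin: dict[str, list[float]] = {}
    for contig, bin_name in contig_to_bin.items():
        # Skip contigs that have no depth value rather than failing.
        if contig in contig_depths:
            depths_per_bin.setdefault(bin_name, []).append(contig_depths[contig])
    return {name: median(values) for name, values in depths_per_bin.items()}
```

The median is robust to a few contigs with extreme depth (for example repeats or misassemblies), which would pull a mean upwards.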

See full CHANGELOG for more information.

Download .zip Download .tar.gz View on GitHub

Switch to Nextflow DSL2
Changed --input file format from TSV to CSV; a header row is now required
Add BUSCO automated lineage selection functionality
Add taxonomic bin classification with GTDB-Tk
Add process for CAT database creation as an alternative to using pre-built databases
Allow different folder structures for Kraken2 databases
Requires nextflow version >= 21.04.0

See full CHANGELOG for more information.

Download .zip Download .tar.gz View on GitHub

The manifest file now has to be provided via the --input parameter
Changed format of manifest input file: requires a ‘.tsv’ suffix and additionally contains group ID
Add --coassemble_group parameter to allow group-wise co-assembly
Add --binning_map_mode parameter allowing different mapping strategies to compute co-abundances used for binning
The TSV --input file now also allows entries containing only short reads
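--binning_map_mode controls which samples' reads are mapped back to each assembly to compute the co-abundances used for binning. The sketch assumes three modes, all, group and own (only all is named in this history, under #912), interpreted as: map every sample, map the samples in the assembly's group, or map only the samples used to build the assembly. Check the parameter documentation before relying on these semantics; the function samples_to_map is hypothetical.

```python
def samples_to_map(
    mode: str,
    assembly_group: str,
    assembly_samples: set[str],
    sample_groups: dict[str, str],
) -> list[str]:
    """Return the samples whose reads would be mapped to one assembly (assumed semantics)."""
    if mode == "all":
        # Every sample contributes coverage information.
        chosen = list(sample_groups)
    elif mode == "group":
        # Only samples sharing the assembly's group, e.g. a --coassemble_group co-assembly.
        chosen = [s for s, g in sample_groups.items() if g == assembly_group]
    elif mode == "own":
        # Only the sample(s) the assembly was built from.
        chosen = [s for s in sample_groups if s in assembly_samples]
    else:
        raise ValueError(f"unknown binning_map_mode: {mode!r}")
    return sorted(chosen)
```

More mapped samples give the binner richer co-abundance profiles at the cost of more alignment work per assembly.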