nf-core/bactmap
A mapping-based pipeline for creating a phylogeny from bacterial whole genome sequences
This page documents version 0.9.1. The latest stable release is 1.0.0.
Introduction
This document describes the output produced by the pipeline.
The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
Pipeline overview
The pipeline is built using Nextflow and processes data using the following steps:
- Reference indexing
- Read trimming
- Read subsampling
- Read mapping
- Sort BAM files
- Call and filter variants
- Convert filtered VCF to pseudogenome
- Create alignment from pseudogenomes
- Remove recombination (Optional)
- Remove non-informative positions
- Construct phylogenetic tree (Optional)
- Pipeline information - Report metrics generated during the workflow execution
Reference Indexing
Output files
- bwa/index/
  - *.amb
  - *.ann
  - *.bwt
  - *.pac
  - *.sa

These BWA index files are generally not needed except by the mapping step.
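To confirm that an index is complete before reusing it, a check along these lines could be used. This is a minimal sketch, not part of the pipeline: it assumes the five extensions listed above are appended to a single index prefix, and the function name `missing_bwa_index_files` is hypothetical.

```python
from pathlib import Path

# File extensions written by `bwa index`, as listed above.
BWA_INDEX_EXTENSIONS = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(prefix: str) -> list[str]:
    """Return the index files that are missing for a given index prefix.

    `prefix` is the path that was given to `bwa index` (typically the
    reference FASTA path); bwa appends each extension to it.
    """
    return [
        prefix + ext
        for ext in BWA_INDEX_EXTENSIONS
        if not Path(prefix + ext).is_file()
    ]
```

For example, `missing_bwa_index_files("results/bwa/index/reference.fasta")` returns an empty list when all five files exist; the prefix here is a placeholder path.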
Read Trimming
Output files
- fastp/
  - *.html: HTML reports of the trimming process that can be opened in any modern web browser
  - *.json: trimming report metrics in machine-readable JSON format
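The JSON reports lend themselves to scripted checks. The sketch below computes the fraction of reads retained after trimming. It assumes the `summary.before_filtering.total_reads` and `summary.after_filtering.total_reads` fields of the fastp JSON report; the helper names are hypothetical.

```python
import json


def fastp_read_retention(report: dict) -> float:
    """Fraction of reads kept by fastp, given a parsed fastp JSON report."""
    summary = report["summary"]
    before = summary["before_filtering"]["total_reads"]
    after = summary["after_filtering"]["total_reads"]
    # Guard against an empty input file.
    return after / before if before else 0.0


def load_fastp_report(path: str) -> dict:
    """Read one fastp *.json report from disk."""
    with open(path) as handle:
        return json.load(handle)
```

Usage would look like `fastp_read_retention(load_fastp_report("results/fastp/sample1.fastp.json"))`, where the file name is a placeholder.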
Read Subsampling
Output files
- rasusa/
  - *.fastq.gz: subsampled FASTQ files
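rasusa subsamples reads down to a target depth of coverage for a given genome size, so the number of bases it aims to keep is roughly coverage multiplied by genome size. The hypothetical helpers below only illustrate that arithmetic; the coverage, genome size and read length used in the example are made-up values, not the pipeline's defaults.

```python
import math


def target_bases(coverage: float, genome_size: int) -> int:
    """Bases needed to reach `coverage` over a genome of `genome_size` bp."""
    return math.ceil(coverage * genome_size)


def approx_reads_needed(coverage: float, genome_size: int, mean_read_length: float) -> int:
    """Rough number of reads to keep, assuming reads of similar length."""
    return math.ceil(target_bases(coverage, genome_size) / mean_read_length)
```

For a 5 Mb genome at 100x with 150 bp reads, `approx_reads_needed(100, 5_000_000, 150)` gives 3,333,334 reads, counting each mate of a pair separately.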
Read Mapping
By default there are no output files for this step, since the sorted BAM files are produced in the next step.
Sort BAM Files
Output files
- samtools/
  - *.bam: sorted BAM files
  - *.bam.bai: BAM file index
  - *.bam.flagstat: BAM file metrics (samtools flagstat)
  - *.bam.idxstats: BAM file metrics (samtools idxstats)
  - *.bam.stats: BAM file metrics (samtools stats)
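The *.bam.idxstats files are tab-separated, with one line per reference sequence giving its name, length, mapped read-segments and unmapped read-segments; a final `*` line holds unmapped reads not placed on any reference. A minimal sketch, with a hypothetical function name, that turns one of these files into an overall mapped fraction:

```python
def mapped_fraction_from_idxstats(lines) -> float:
    """Fraction of read segments mapped, from `samtools idxstats` output lines."""
    mapped = unmapped = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mapped += int(fields[2])    # column 3: mapped read-segments
        unmapped += int(fields[3])  # column 4: unmapped read-segments
    total = mapped + unmapped
    return mapped / total if total else 0.0
```

It accepts any iterable of lines, so an open file handle such as `open("results/samtools/sample1.bam.idxstats")` (placeholder name) can be passed directly.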
Call and Filter Variants
Output files
- variants/
  - *.vcf.gz: filtered VCF files containing variants
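To count the variants in a filtered VCF, the records can be read straight from the compressed file. Whether the filtered files keep failing records with a FILTER label or drop them entirely is not described here, so this sketch (hypothetical helper names) can count either all records or only those whose FILTER column is `PASS`.

```python
import gzip


def count_variants(lines, pass_only: bool = True) -> int:
    """Count VCF records, skipping '#' header lines.

    FILTER is the 7th tab-separated column; with pass_only=True only
    records whose FILTER value is 'PASS' are counted.
    """
    count = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if not pass_only or fields[6] == "PASS":
            count += 1
    return count


def count_variants_in_file(path: str, pass_only: bool = True) -> int:
    # gzip.open in text mode also reads bgzip-compressed (multi-member) files.
    with gzip.open(path, "rt") as handle:
        return count_variants(handle, pass_only)
```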
Convert Filtered VCF to Pseudogenome
Output files
- pseudogenomes/
  - *.fas: pseudogenome with a base at each position of the reference sequence
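Conceptually, a pseudogenome is the reference sequence with each sample's confident SNPs substituted in and uncertain positions masked, so it keeps exactly one character per reference position. The toy sketch below illustrates that idea only; the exact rules bactmap uses (which positions become `N`, how indels or multiple contigs are handled) are not described in this document, and the function name is hypothetical.

```python
def build_pseudogenome(reference: str, snps: dict[int, str], masked: set[int]) -> str:
    """Build a toy pseudogenome from a single-contig reference.

    snps:   1-based position -> alternative base for confident SNP calls
    masked: 1-based positions that could not be called confidently
    """
    bases = list(reference)
    for pos, alt in snps.items():
        bases[pos - 1] = alt
    # Masking is applied last, so it overrides any SNP at the same position.
    for pos in masked:
        bases[pos - 1] = "N"
    return "".join(bases)
```

For example, `build_pseudogenome("ACGTACGT", {2: "T"}, {5, 6})` returns `"ATGTNNGT"`, the same length as the reference.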
Create Alignment from Pseudogenomes
Only pseudogenome FASTA files whose fraction of non-ACGT bases is below the specified threshold are included in the aligned_pseudogenomes.fas file. Samples that fail this check are reported in the low_quality_pseudogenomes.tsv file.
Output files
- pseudogenomes/
  - aligned_pseudogenomes.fas: alignment of all sample pseudogenomes and the reference sequence
  - low_quality_pseudogenomes.tsv: a tab-separated file of the samples that failed the non-ACGT base threshold
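The quality check above can be sketched as follows. The threshold is a pipeline setting not named in this document, so the value used in any call is arbitrary; the helper names are hypothetical, and in this sketch gap characters (`-`) and `N` both count as non-ACGT.

```python
def non_acgt_fraction(sequence: str) -> float:
    """Fraction of positions that are not A, C, G or T (case-insensitive)."""
    if not sequence:
        return 1.0
    acgt = sum(1 for base in sequence.upper() if base in "ACGT")
    return 1 - acgt / len(sequence)


def split_by_quality(pseudogenomes: dict[str, str], threshold: float):
    """Split samples into (kept sequences, failed fractions).

    Samples whose non-ACGT fraction is below `threshold` are kept for the
    alignment; the others would be reported as low quality.
    """
    kept, failed = {}, {}
    for name, seq in pseudogenomes.items():
        frac = non_acgt_fraction(seq)
        if frac < threshold:
            kept[name] = seq
        else:
            failed[name] = frac
    return kept, failed
```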
Remove Recombination
The file used for downstream tree building is aligned_pseudogenomes.filtered_polymorphic_sites.fasta. The other files are described in the Gubbins documentation.
Output files
- gubbins/
  - aligned_pseudogenomes.branch_base_reconstruction.embl
  - aligned_pseudogenomes.filtered_polymorphic_sites.fasta
  - aligned_pseudogenomes.filtered_polymorphic_sites.phylip
  - aligned_pseudogenomes.final_tree.tre
  - aligned_pseudogenomes.node_labelled.final_tree.tre
  - aligned_pseudogenomes.per_branch_statistics.csv
  - aligned_pseudogenomes.recombination_predictions.embl
  - aligned_pseudogenomes.recombination_predictions.gff
  - aligned_pseudogenomes.summary_of_snp_distribution.vcf
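The recombination predictions in GFF format can be summarised using only the standard GFF columns 4 and 5 (1-based, inclusive start and end). A minimal sketch with a hypothetical function name; because blocks predicted on different branches may overlap, the total length can count some positions more than once.

```python
def summarise_recombination_gff(lines):
    """Return (number of blocks, summed block length) from GFF lines."""
    blocks = 0
    total_length = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '##gff-version'-style headers
        fields = line.rstrip("\n").split("\t")
        start, end = int(fields[3]), int(fields[4])
        blocks += 1
        # Inclusive coordinates; overlapping blocks are not merged.
        total_length += end - start + 1
    return blocks, total_length
```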
Remove Non-informative Positions
Output files
- snpsites/
  - constant.sites.txt: a file with the number of constant sites for each base
  - filtered_alignment.fas: alignment with only informative positions (those positions that have at least one alternative variant base)
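The idea behind this step can be sketched as keeping only alignment columns that contain more than one distinct base, while tallying the constant columns per base (A, C, G, T). This is a simplified illustration, not snp-sites itself: here gaps and ambiguous characters are simply ignored when deciding whether a column varies, which may differ from the tool's rules, and the function name is hypothetical.

```python
def polymorphic_columns(alignment: dict[str, str]):
    """Return (variable-column alignment, constant-site counts per base)."""
    constant = {"A": 0, "C": 0, "G": 0, "T": 0}
    names = list(alignment)
    seqs = [alignment[n].upper() for n in names]
    if not seqs:
        return {}, constant
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    keep = []
    for i in range(length):
        # Distinct unambiguous bases in this column; N and '-' are ignored.
        column_bases = {s[i] for s in seqs if s[i] in "ACGT"}
        if len(column_bases) > 1:
            keep.append(i)
        elif len(column_bases) == 1:
            constant[column_bases.pop()] += 1
    filtered = {n: "".join(s[i] for i in keep) for n, s in zip(names, seqs)}
    return filtered, constant
```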
RapidNJ
Output files
- rapidnj/
  - rapidnj_phylogeny.tre: a Newick tree built with RapidNJ
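All of the tree files in this and the following sections are Newick strings. A quick sanity check is to list the tip labels and compare them with the expected samples. This simplified sketch (hypothetical names) takes a leaf label to be the token written directly after `(` or `,`; internal-node labels such as support values follow `)` and are therefore not matched. Quoted labels and bracketed comments are not handled.

```python
import re

# Leaf label: the token right after "(" or ",", up to ":" , "," or ")".
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def newick_leaf_names(newick: str) -> list[str]:
    """Return the tip labels of a Newick tree string, in file order."""
    return _LEAF.findall(newick)
```

For example, `set(newick_leaf_names(tree_text)) - set(sample_names)` would reveal unexpected tips, where `tree_text` and `sample_names` are placeholders for the contents of a .tre file and the list of sample names.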
FastTree
Output files
- fasttree/
  - fasttree_phylogeny.tre: a Newick tree built with FastTree
IQ-TREE
Output files
- iqtree/
  - *.treefile: a maximum-likelihood (ML) tree built with IQ-TREE, with branch support values based on bootstrapping
RAxML-NG
Output files
- raxmlng/
  - output.raxml.bestTree: the ML tree with the best likelihood score found by RAxML-NG
  - output.raxml.support: an ML tree built with RAxML-NG, with branch support values based on bootstrapping
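In the IQ-TREE and RAxML-NG trees with support values, the values are stored as labels on internal nodes, written directly after a closing parenthesis. The sketch below (hypothetical names) collects the numeric ones, for instance to count weakly supported branches. Combined labels such as `80/95` are not numbers and are skipped by this simplified version.

```python
import re

# Internal-node label: the token right after ")", before any ":branch_length".
_INTERNAL_LABEL = re.compile(r"\)\s*([^(),:;\s\[\]]+)")


def support_values(newick: str) -> list[float]:
    """Return numeric internal-node labels (e.g. bootstrap support)."""
    values = []
    for label in _INTERNAL_LABEL.findall(newick):
        try:
            values.append(float(label))
        except ValueError:
            pass  # non-numeric label, e.g. a node name or "80/95"
    return values
```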
MultiQC
Various quality statistics are compiled from the previous outputs using MultiQC:
- Overall Statistics: a compilation of statistics about read content, mapping and variants
- FastP Statistics: statistics gathered when trimming reads
- Mapping Statistics: statistics gathered when mapping reads
- Variant Statistics: statistics gathered when calling variants, after filtering
Pipeline information
Output files
- pipeline_info/
  - Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
  - Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.csv.
  - Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
Nextflow can generate several reports on the running and execution of the pipeline. These help you troubleshoot errors and also provide other information such as launch commands, run times and resource usage.
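When troubleshooting, execution_trace.txt is the easiest report to query from a script: it is a tab-separated file with a header row. A minimal sketch, assuming the default `name` and `status` columns are present (the function name is hypothetical), that lists tasks which neither completed nor were reused from the cache:

```python
import csv


def unsuccessful_tasks(lines) -> list[tuple[str, str]]:
    """List (task name, status) for tasks not COMPLETED or CACHED."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        (row["name"], row["status"])
        for row in reader
        if row["status"] not in ("COMPLETED", "CACHED")
    ]
```

An open handle on pipeline_info/execution_trace.txt can be passed in directly; statuses such as `FAILED` or `ABORTED` will be reported.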