nf-core/genephylomodeler

A bioinformatics pipeline that fits evolutionary models and detects natural selection from multiple sequence alignments

Tags: evolutionary-models, msa, natural-selection, phylogenetics, positive-selection

This is the development version of the pipeline.

Repository: https://github.com/nf-core/genephylomodeler

Introduction

This document describes the output produced by the pipeline. The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

Pipeline overview

The pipeline is built using Nextflow and runs selection analyses with HyPhy and PAML. Its outputs are grouped under the following steps:

- aBSREL - Adaptive Branch-Site Random Effects Likelihood
- BGM - Bayesian Graphical Model
- BUSTED - Branch-Site Unrestricted Statistical Test for Episodic Diversification
- FADE - FUBAR Approach to Directional Evolution
- FEL - Fixed Effects Likelihood
- FUBAR - Fast Unconstrained Bayesian AppRoximation
- GARD - Genetic Algorithm for Recombination Detection
- MEME - Mixed Effects Model of Evolution
- RELAX - Test for relaxation or intensification of selection
- SLAC - Single-Likelihood Ancestor Counting
- CODEML - PAML codon-substitution models for selection analysis
- Pipeline information - Report metrics generated during the workflow execution
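As a quick sanity check after a run, the sketch below lists which of the documented sub-directories are absent from a results directory. The directory names are the ones used in the sections of this page; the function name `missing_output_dirs` and the `results` path are hypothetical. A missing directory may simply mean that the corresponding method was not run, so treat the result as a hint rather than an error.

```python
from pathlib import Path

# Sub-directory names as documented on this page.
EXPECTED_DIRS = [
    "absrel", "bgm", "busted", "fade", "fel", "fubar",
    "gard", "meme", "relax", "slac", "codeml", "pipeline_info",
]


def missing_output_dirs(results_dir):
    """Return the documented sub-directories that are absent from results_dir."""
    root = Path(results_dir)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


# "results" is a placeholder for whatever output directory the run used.
for name in missing_output_dirs("results"):
    print(f"no output directory found for: {name}")
```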

aBSREL

Output files

absrel/
- *_ABSREL.json: JSON results file containing per-branch test statistics for episodic diversifying selection.
- *_ABSREL_output.txt: Standard output log from the aBSREL analysis.

aBSREL (adaptive Branch-Site Random Effects Likelihood) tests each branch on a phylogeny for episodic diversifying selection. It infers the optimal number of omega rate classes for each branch, where omega (ω) is the ratio of non-synonymous to synonymous substitution rates, and tests whether a proportion of sites have evolved under positive selection along that branch.
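The per-branch results can be filtered with a few lines of Python. This is a minimal sketch that assumes the JSON stores branch results under `"branch attributes"` -> `"0"` and that each tested branch has a `"Corrected P-value"` entry; these key names are an assumption about the HyPhy output layout and should be checked against a real `*_ABSREL.json` file. The example values are made up for illustration.

```python
import json


def selected_branches(absrel, threshold=0.05):
    """Return (branch, corrected p-value) pairs at or below threshold, smallest first.

    ASSUMPTION: per-branch results sit under "branch attributes" -> "0" and
    each tested branch carries a "Corrected P-value" entry.
    """
    branches = absrel.get("branch attributes", {}).get("0", {})
    hits = []
    for name, attrs in branches.items():
        p = attrs.get("Corrected P-value")
        # Branches that were not tested may have no p-value; skip them.
        if p is not None and p <= threshold:
            hits.append((name, p))
    return sorted(hits, key=lambda item: item[1])


# Made-up values shaped like the assumed layout; not real pipeline output.
example = json.loads("""
{"branch attributes": {"0": {
    "Node1":   {"Corrected P-value": 0.003},
    "human":   {"Corrected P-value": 0.41},
    "chicken": {"Corrected P-value": null}
}}}
""")
print(selected_branches(example))
```

For a real run, load the file with `json.load` and pass the resulting dictionary to `selected_branches`.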

BGM

Output files

bgm/
- *_BGM.json: JSON results file containing posterior probabilities for coevolving site pairs.
- *_BGM_output.txt: Standard output log from the BGM analysis.

BGM (Bayesian Graphical Model) identifies pairs of codon sites that co-evolve using a Bayesian graphical model approach. It detects epistatic interactions between sites by looking for correlated substitution patterns across the phylogeny.

BUSTED

Output files

busted/
- *_BUSTED.json: JSON results file containing gene-wide test statistics for episodic diversifying selection.
- *_BUSTED_output.txt: Standard output log from the BUSTED analysis.

BUSTED (Branch-Site Unrestricted Statistical Test for Episodic Diversification) provides a gene-wide test for positive selection: it asks whether at least one site on at least one branch of the gene has experienced positive selection.
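Because BUSTED yields one result per alignment, it is often convenient to collect the gene-wide p-values from all files in `busted/` into one table. The sketch below does this using the `*_BUSTED.json` naming pattern documented above. The location of the p-value inside the JSON (`"test results"` -> `"p-value"`) is an assumption about the HyPhy output layout, and `busted_p_values` is a hypothetical helper name.

```python
import json
from pathlib import Path

SUFFIX = "_BUSTED.json"


def busted_p_values(busted_dir):
    """Map each file-name prefix to the gene-wide p-value stored in its JSON.

    ASSUMPTION: the p-value is stored under "test results" -> "p-value".
    Files without that entry are reported as None rather than guessed.
    """
    results = {}
    for path in sorted(Path(busted_dir).glob("*" + SUFFIX)):
        with path.open() as handle:
            data = json.load(handle)
        prefix = path.name[: -len(SUFFIX)]
        results[prefix] = data.get("test results", {}).get("p-value")
    return results
```

Calling `busted_p_values("results/busted")` would return a dictionary keyed by the part of each file name that precedes `_BUSTED.json`; the `results` path is a placeholder.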

FADE

Output files

fade/
- *_FADE.json: JSON results file containing per-site Bayes factors for directional selection.
- *_FADE_output.txt: Standard output log from the FADE analysis.

FADE (FUBAR Approach to Directional Evolution) detects directional selection at individual sites, identifying amino acid residues where evolution is biased toward specific substitutions in a designated set of branches. It extends the FUBAR framework to test whether a particular amino acid is being preferentially selected.

FEL

Output files

fel/
- *_FEL.json: JSON results file containing per-site test statistics for pervasive positive or negative selection.
- *_FEL_output.txt: Standard output log from the FEL analysis.

FEL (Fixed Effects Likelihood) estimates site-specific synonymous and non-synonymous substitution rates using a maximum likelihood approach. It tests each site individually for evidence of pervasive positive or purifying selection, assuming the selection pressure is constant across the entire phylogeny.
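The per-site decision described above can be sketched as follows: a site is called positive when the test is significant and the non-synonymous rate exceeds the synonymous rate, and negative when the test is significant and the reverse holds. The sketch assumes the JSON holds a table under `"MLE"` with `"headers"` (a list of name and description pairs) and `"content"` -> `"0"` (one row per site), and that the columns are named `alpha` (synonymous rate), `beta` (non-synonymous rate) and `p-value`. These names are assumptions to verify against a real `*_FEL.json` file; the 0.1 threshold is an illustrative choice and the example rows are made up.

```python
def mle_rows(result):
    """Yield one dict per site from an "MLE" headers/content table.

    ASSUMPTION: result["MLE"]["headers"] is a list of [name, description]
    pairs and result["MLE"]["content"]["0"] is a list of per-site rows.
    """
    names = [header[0] for header in result["MLE"]["headers"]]
    for site, row in enumerate(result["MLE"]["content"]["0"], start=1):
        yield {"site": site, **dict(zip(names, row))}


def classify_sites(result, threshold=0.1):
    """Label each site 'positive', 'negative' or 'not significant'."""
    labels = {}
    for row in mle_rows(result):
        if row["p-value"] > threshold:
            labels[row["site"]] = "not significant"
        elif row["beta"] > row["alpha"]:
            labels[row["site"]] = "positive"
        else:
            labels[row["site"]] = "negative"
    return labels


# Made-up table shaped like the assumed layout; not real pipeline output.
example = {
    "MLE": {
        "headers": [
            ["alpha", "synonymous rate"],
            ["beta", "non-synonymous rate"],
            ["p-value", "p-value of the per-site test"],
        ],
        "content": {"0": [[1.0, 0.1, 0.01], [0.2, 2.5, 0.03], [0.5, 0.6, 0.8]]},
    }
}
print(classify_sites(example))
```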

FUBAR

Output files

fubar/
- *_FUBAR.json: JSON results file containing posterior probabilities of positive or negative selection per site.
- *_FUBAR_output.txt: Standard output log from the FUBAR analysis.

FUBAR (Fast Unconstrained Bayesian AppRoximation) uses a Bayesian approach to infer site-specific selection rates. It provides posterior probabilities that each site is under positive or purifying selection and is substantially faster than FEL for large datasets.

GARD

Output files

gard/
- *_GARD.json: JSON results file containing detected recombination breakpoints and segment-specific phylogenies.
- *_GARD_output.txt: Standard output log from the GARD analysis.

GARD (Genetic Algorithm for Recombination Detection) screens alignments for evidence of recombination breakpoints. It uses a genetic algorithm to search for the placement of breakpoints that maximizes the model fit, partitioning the alignment into segments with distinct phylogenetic topologies.

MEME

Output files

meme/
- *_MEME.json: JSON results file containing per-site test statistics for episodic diversifying selection.
- *_MEME_output.txt: Standard output log from the MEME analysis.

MEME (Mixed Effects Model of Evolution) tests for episodic positive selection at individual sites. Unlike methods that assume the same selection pressure across all lineages, MEME allows the distribution of omega to vary from site to site and from branch to branch.

RELAX

Output files

relax/
- *_RELAX.json: JSON results file containing test statistics for relaxation or intensification of selection.
- *_RELAX_output.txt: Standard output log from the RELAX analysis.

RELAX is a hypothesis testing framework that asks whether the strength of natural selection has been relaxed or intensified along a specified set of test branches. It uses a parameter K to quantify the extent of relaxation (K < 1) or intensification (K > 1).
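To make the role of K concrete, the sketch below assumes the exponent relationship of the RELAX model, in which each omega class on the test branches equals the corresponding reference omega raised to the power K. With K < 1 every omega moves toward 1 (weaker selection), and with K > 1 every omega moves away from 1 (stronger selection). The function names and the 0.05 significance level are illustrative choices, not part of the pipeline.

```python
def scale_omegas(reference_omegas, k):
    """Apply omega_test = omega_reference ** K to each rate class."""
    return [omega ** k for omega in reference_omegas]


def interpret_relax(k, p_value, alpha=0.05):
    """Turn a RELAX K estimate and test p-value into a short verdict."""
    if p_value > alpha:
        return "no significant change in selection intensity"
    if k < 1:
        return "relaxation"
    if k > 1:
        return "intensification"
    return "K equals 1: no change in intensity"


# Illustrative omega classes: purifying, neutral, diversifying.
reference = [0.1, 1.0, 4.0]
print(scale_omegas(reference, 0.5))  # values pulled toward 1
print(scale_omegas(reference, 2.0))  # values pushed away from 1
print(interpret_relax(0.5, 0.001))
```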

SLAC

Output files

slac/
- *_SLAC.json: JSON results file containing per-site counts of synonymous and non-synonymous substitutions.
- *_SLAC_output.txt: Standard output log from the SLAC analysis.

SLAC (Single-Likelihood Ancestor Counting) is a counting-based method that estimates the number of synonymous and non-synonymous substitutions at each site by reconstructing ancestral sequences via maximum likelihood. It is the fastest site-level selection method and is well-suited for preliminary screening of large datasets.

CODEML

Output files

codeml/
- *_CODEML_output.txt: Main PAML CODEML results file containing maximum likelihood estimates, model parameters (ω, κ, branch lengths), and likelihood values for the requested codon substitution model.
- *.log: Standard output log captured from the CODEML run, including the PAML version banner and convergence diagnostics.

CODEML, part of the PAML package, fits codon-substitution models by maximum likelihood to estimate the non-synonymous-to-synonymous substitution rate ratio (ω = dN/dS). It supports a wide range of analyses depending on the user-supplied control file, including site models (M0, M1a, M2a, M7, M8) for site-level selection, branch models for lineage-specific ω, and branch-site models for episodic selection on designated foreground branches.
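Nested site models such as M1a versus M2a, or M7 versus M8, are commonly compared with a likelihood ratio test on the log-likelihood values reported by CODEML. The following is a minimal sketch of that calculation using only the standard library; it assumes you have already read the two log-likelihoods from the output files yourself, and the numbers shown are made up. Both of these comparisons are usually evaluated with 2 degrees of freedom. The helper names are hypothetical.

```python
import math


def chi2_sf(statistic, df):
    """Upper-tail chi-square probability, closed forms for df = 1 or 2."""
    if statistic <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return statistic, chi2_sf(statistic, df)


# Made-up log-likelihoods for a null model (e.g. M1a) and an alternative (e.g. M2a).
statistic, p_value = likelihood_ratio_test(-1234.5, -1230.0, df=2)
print(f"2*dlnL = {statistic:.2f}, p = {p_value:.4f}")
```

A small p-value indicates that the alternative model, which allows a class of sites with ω > 1, fits significantly better than the null model.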

Pipeline information

Output files

pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.

Nextflow generates several reports on the execution of the pipeline. These help you troubleshoot errors and provide other information such as launch commands, run times and resource usage.
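When troubleshooting, a first step is often to list the tasks that did not finish cleanly. The sketch below reads the contents of `execution_trace.txt` and returns the failed or aborted tasks. It assumes the trace is tab-separated with a header row containing `name`, `status` and `exit` columns, as in Nextflow's default trace fields; if the trace fields were customised, adjust the column names. The task names and values in the example are made up.

```python
import csv
import io

PROBLEM_STATUSES = {"FAILED", "ABORTED"}


def failed_tasks(trace_text):
    """Return (name, status, exit) for every failed or aborted task.

    ASSUMPTION: tab-separated text with a header row that includes the
    "name", "status" and "exit" columns.
    """
    reader = csv.DictReader(io.StringIO(trace_text), delimiter="\t")
    return [
        (row["name"], row["status"], row["exit"])
        for row in reader
        if row["status"] in PROBLEM_STATUSES
    ]


# Made-up trace rows for illustration; task names are hypothetical.
rows = [
    ["task_id", "name", "status", "exit"],
    ["1", "FEL (geneA)", "COMPLETED", "0"],
    ["2", "MEME (geneA)", "FAILED", "1"],
    ["3", "SLAC (geneA)", "CACHED", "0"],
]
example_trace = "\n".join("\t".join(row) for row in rows) + "\n"
print(failed_tasks(example_trace))
```

For a real run, read the file with `Path("results/pipeline_info/execution_trace.txt").read_text()` (the `results` prefix is a placeholder) and pass the text to `failed_tasks`.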
