differentialabundance: Parameters

Define where the pipeline should find input data and save output data.

A string identifier used to name result files in the output directory

required

type: string

default: study

Input data format category used for input validation and routing (not for selecting analysis methods).

required

type: string

Path to CSV/TSV file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.(csv|tsv)$
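
As a rough illustration of what the sample sheet pattern above accepts, the following sketch applies the same regular expression in Python. The function name `is_valid_samplesheet_path` is hypothetical and not part of the pipeline; the sketch also assumes the validator uses unanchored "search" semantics, as JSON Schema patterns generally do.

```python
import re

# Pattern copied from the sample sheet parameter above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.(csv|tsv)$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path satisfies the sample sheet pattern (hypothetical helper)."""
    return SAMPLESHEET_PATTERN.search(path) is not None


if __name__ == "__main__":
    for candidate in ["samples.csv", "data/samples.tsv", "my samples.csv", "samples.CSV"]:
        print(candidate, is_valid_samplesheet_path(candidate))
```

Because `\S+` forbids whitespace and the extension is matched case-sensitively, paths containing spaces or upper-case extensions would be rejected under this reading.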

A CSV/TSV/YML/YAML file describing sample contrasts to compare groups.

type: string

pattern: ^\S+\.(csv|tsv|yml|yaml)$

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required

type: string

Label describing the abundance values in the report heading.

hidden

type: string

default: abundance

To how many digits should numeric output in different modules be rounded? If -1 or null, will not round.

type: integer

default: -1
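
The rounding rule described above can be sketched as follows. This is a minimal illustration, assuming "digits" means decimal places (individual modules may instead use significant digits); the function name `round_output` is hypothetical.

```python
from typing import Optional


def round_output(value: float, digits: Optional[int]) -> float:
    """Round to `digits` decimal places; -1 or None leaves the value unchanged."""
    if digits is None or digits == -1:
        return value
    return round(value, digits)


if __name__ == "__main__":
    print(round_output(3.14159, 2))
    print(round_output(3.14159, -1))
```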

Global seed for stochastic methods

type: integer

Ways of providing your abundance values

TSV/CSV-format abundance matrix

type: string

pattern: ^\S+\.(tsv|csv)$|\S*proteinGroups\.txt$
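
The abundance matrix pattern above has two alternatives: a TSV/CSV path, or any path ending in `proteinGroups.txt` (the MaxQuant output file). The sketch below shows how the two branches behave, assuming unanchored "search" semantics; `is_valid_matrix_path` is a hypothetical helper name.

```python
import re

# Pattern copied from the abundance matrix parameter above.
MATRIX_PATTERN = re.compile(r"^\S+\.(tsv|csv)$|\S*proteinGroups\.txt$")


def is_valid_matrix_path(path: str) -> bool:
    """Return True if either branch of the matrix pattern matches (hypothetical helper)."""
    return MATRIX_PATTERN.search(path) is not None


if __name__ == "__main__":
    for candidate in ["matrix.tsv", "run1_proteinGroups.txt", "matrix.txt"]:
        print(candidate, is_valid_matrix_path(candidate))
```

Note that only the first branch is anchored at the start of the string, so under search semantics the second branch also accepts a path whose directory part contains whitespace.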

(RNA-seq only): optional transcript/gene length matrix with samples and transcript_ids/gene_ids as in the abundance matrix.

type: string

Alternative to matrix: a compressed CEL files archive such as often found in GEO

type: string

Use SOFT files from GEO by providing the GSE study identifier

type: string

Column in the sample sheet to be used as the primary sample identifier

required

type: string

default: sample

Type of observation

required

type: string

default: sample

Column in the sample sheet to be used as the identifier for observations. If unset, the value of --observations_id_col is used.

type: string

Options related to features

Feature ID attribute in the abundance table as well as in the GTF file (e.g. the gene_id field)

required

type: string

default: gene_id

Feature name attribute in the abundance table as well as in the GTF file (e.g. the gene symbol field)

required

type: string

default: gene_name

Type of feature. Often ‘gene’

required

type: string

default: gene

When set, use the control features in scaling/normalization (currently only supported for differential_method deseq2)

type: boolean

A text file listing technical features (e.g. spikes)

type: string

Comma-separated string, specifies feature metadata columns to be used for exploratory analysis, platform-specific

type: string

default: gene_id,gene_name,gene_biotype

Supply your own feature annotations. Can be derived from the GTF (rnaseq) or from the Bioconductor annotation package (affy arrays).

type: string

pattern: ^\S+\.(csv|tsv)$

Analysis options related to the use of paramsheet to run multiple combinations of analyses (see usage docs for details).

Name of the paramset to run. In profile mode, set by the analysis profile for output directory naming. In paramsheet mode, selects which paramset(s) to run (comma-separated).

type: string

Path to a paramsheet YAML file. Setting this activates multi-run (paramsheet) mode where paramsheet values take priority over CLI flags.

type: string

pattern: ^\S+\.(yaml|yml)$

Options for processing of affy arrays with justRMA()

Column of the sample sheet containing the Affymetrix CEL file name

type: string

default: file

logical value. If set to true, apply background correction using RMA.

type: boolean

default: true

integer value indicating which RMA background to use

type: integer

default: 2

logical value. If TRUE, then works on the PM matrix in place as much as possible, good for large datasets.

type: boolean

Used to specify the name of an alternative cdf package. If set to NULL, then the usual cdf package based on Affymetrix’ mappings will be used.

type: string

logical value. If TRUE, a matrix of probe annotations will be derived.

type: boolean

default: true

Should the spots marked as ‘MASKS’ be set to NA?

type: boolean

Should the spots marked as ‘OUTLIERS’ be set to NA?

type: boolean

If TRUE, overrides the values of rm.mask and rm.outliers.

type: boolean

Genome annotation file in GTF format

type: string

pattern: ^\S+\.gtf(\.gz)?

If a GTF file is supplied, which feature type to use

type: string

default: transcript

If a GTF file is supplied, which field should go first in the converted output table

type: string

default: gene_id

Options for processing of proteomics MaxQuant tables with the Proteus R package

Prefix of the column names of the MaxQuant proteingroups table in which the intensity values are saved; the prefix has to be followed by the sample names that are also found in the samplesheet. Default: ‘LFQ intensity’; will search for both the prefix as entered and the prefix followed by one whitespace.

type: string

default: LFQ intensity

Normalization function to use on the MaxQuant intensities.

type: string

Which method to use for plotting sample distributions of the MaxQuant intensities; one of ‘violin’, ‘dist’, ‘box’.

type: string

Should a loess line be added to the plot of mean-variance relationship of the conditions? Default: true.

type: boolean

default: true

Valid R palette name

type: string

default: Set1

Options related to filtering upstream of differential analysis

Minimum abundance value. Set to false to disable abundance filtering.

required

type: integer,boolean

Minimum observations that must pass the threshold to retain the row/feature (e.g. gene).

type: number

default: 1

A minimum proportion of observations, given as a number between 0 and 1, that must pass the threshold. Overrides minimum_samples

type: number

An optional grouping variable to be used to calculate a min_samples value

type: string

A minimum proportion of observations, given as a number between 0 and 1, that must have a value (not NA) to retain the row/feature (e.g. gene).

type: number

default: 0.5

Minimum observations that must have a value (not NA) to retain the row/feature (e.g. gene). Overrides filtering_min_proportion_not_na.

type: number
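
The interaction between the filtering options above can be sketched for a single feature (row) as below. This is an illustrative reading, not the pipeline's implementation: the function and argument names are hypothetical, the `>=` comparisons and the absence of rounding are assumptions, and the optional grouping variable is left out.

```python
from typing import Optional, Sequence


def keep_feature(
    values: Sequence[Optional[float]],
    min_abundance: float,
    min_samples: float = 1,
    min_proportion: Optional[float] = None,
    min_proportion_not_na: float = 0.5,
    min_samples_not_na: Optional[float] = None,
) -> bool:
    """Decide whether one feature is retained; None stands for NA."""
    n = len(values)
    observed = [v for v in values if v is not None]

    # NA filter: an explicit count overrides the proportion.
    if min_samples_not_na is not None:
        required_not_na = min_samples_not_na
    else:
        required_not_na = min_proportion_not_na * n
    if len(observed) < required_not_na:
        return False

    # Abundance filter: a proportion, when given, overrides the count.
    if min_proportion is not None:
        required_passing = min_proportion * n
    else:
        required_passing = min_samples
    passing = sum(1 for v in observed if v >= min_abundance)
    return passing >= required_passing


if __name__ == "__main__":
    print(keep_feature([0, 0, 5, 7], min_abundance=1))
    print(keep_feature([5, None, None, None], min_abundance=1))
```

With the defaults shown, a feature is kept when at least half of its observations are non-missing and at least one of them reaches the abundance threshold.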

Options related to data exploration

Clustering method used in dendrogram creation

required

type: string

default: ward.D2

Correlation method used in dendrogram creation

required

type: string

default: spearman

Number of features selected before certain exploratory analyses. If -1, will use all features.

required

type: integer

default: 500

Length of the whiskers in boxplots as multiple of IQR. Defaults to 1.5.

type: number

default: 1.5

Threshold on MAD score for outlier identification

type: integer

default: -5

How should the main grouping variable be selected? ‘auto_pca’, ‘contrasts’, or a valid column name from the observations table.

required

type: string

default: auto_pca

Specifies assay names to be used for matrices, platform-specific.

hidden

type: string

default: raw,normalised,variance_stabilised

Specifies final assay to be used for exploratory analysis, platform-specific

hidden

type: string

default: variance_stabilised

Assays for which to compute log2 values during exploratory analysis. Not necessary for maxquant data as this is controlled by the pipeline.

type: string

default: raw,normalised

Valid R palette name

required

type: string

default: Set1

Options related to differential operations

Differential analysis method

type: string

Advanced option: the suffix associated with tabular differential results tables. By default, the appropriate suffix is chosen according to the study_type.

type: string

The feature identifier column in differential results tables

required

type: string

default: gene_id

Minimum fold change used to calculate differential feature numbers. Note that this number will be log2 transformed

required

type: number

default: 2

Maximum p value used to calculate differential feature numbers

required

type: number

default: 1

Maximum q value used to calculate differential feature numbers

required

type: number

default: 0.05
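
The three thresholds above (fold change, p value, q value) can be combined as in the sketch below. The fold change threshold is given on the linear scale and log2-transformed before comparison, so the default of 2 corresponds to an absolute log2 fold change of 1. The function name, the use of the absolute value, and the inclusive comparisons are assumptions made for illustration.

```python
import math


def is_differential(
    log2_fold_change: float,
    p_value: float,
    q_value: float,
    min_fold_change: float = 2,
    max_p: float = 1,
    max_q: float = 0.05,
) -> bool:
    """Count a feature as differential if it passes all three thresholds."""
    log2_threshold = math.log2(min_fold_change)  # 2 -> 1.0
    return (
        abs(log2_fold_change) >= log2_threshold
        and p_value <= max_p
        and q_value <= max_q
    )


if __name__ == "__main__":
    print(is_differential(1.5, 0.001, 0.01))
    print(is_differential(0.5, 0.001, 0.01))
```

With the default maximum p value of 1, the p value never excludes a feature, so the q value and fold change do the filtering.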

Valid R palette name

required

type: string

default: Set1

In differential analysis (DESeq2 or limma), subset to the contrast samples before modelling variance?

type: boolean

test parameter passed to DESeq()

type: string

fitType parameter passed to DESeq()

type: string

sfType parameter passed to DESeq()

type: string

‘minReplicatesForReplace’ parameter passed to DESeq()

type: integer

default: 7

useT parameter passed to DESeq()

type: boolean

independentFiltering parameter passed to results()

type: boolean

default: true

lfcThreshold parameter passed to results()

type: number

altHypothesis parameter passed to results()

type: string

default: greaterAbs

pAdjustMethod parameter passed to results()

type: string

default: BH

alpha parameter passed to results()

type: number

default: 0.1

minmu parameter passed to results()

type: number

default: 0.5

variance stabilisation method to use when making a variance stabilised matrix

type: string

Shrink fold changes in results?

type: boolean

default: true

blind parameter for rlog() and/or vst()

type: boolean

default: true

nsub parameter passed to vst()

type: integer

default: 1000

passed to lmFit(), positive integer giving the number of times each distinct probe is printed on each array.

type: number

passed to lmFit(), positive integer giving the spacing between duplicate occurrences of the same probe, spacing=1 for consecutive rows.

type: string

Sample sheet column to be used to derive a vector or factor specifying a blocking variable on the arrays for limma::lmFit(); however, for random effects models, DREAM is the recommended approach in this pipeline

type: string

passed to limma::lmFit(), the inter-duplicate or inter-technical replicate correlation; however for random effects models, DREAM is the recommended approach in this pipeline

type: string

passed to lmFit(), the fitting method

type: string

passed to eBayes(), a numeric value between 0 and 1, assumed proportion of genes which are differentially expressed

type: number

default: 0.01

passed to eBayes(), logical, should an intensity-dependent trend be allowed for the prior variance?

type: boolean

passed to eBayes(), logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?

type: boolean

passed to eBayes, comma separated string of two values, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes

type: string

default: 0.1,4

passed to eBayes, comma separated string of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE.

type: string

default: 0.05,0.1
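
Several of the eBayes options above are supplied as comma-separated strings such as `0.1,4` or `0.05,0.1`. A minimal sketch of turning such a string into numbers is shown below; `parse_numeric_list` is a hypothetical helper, and the pipeline's own parsing may differ.

```python
from typing import Sequence, Tuple


def parse_numeric_list(raw: str, allowed_lengths: Sequence[int] = (1, 2)) -> Tuple[float, ...]:
    """Split a comma-separated string into floats, checking the number of values."""
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) not in allowed_lengths or any(part == "" for part in parts):
        raise ValueError(f"expected {allowed_lengths} comma-separated numbers, got {raw!r}")
    return tuple(float(part) for part in parts)


if __name__ == "__main__":
    print(parse_numeric_list("0.1,4"))
    print(parse_numeric_list("0.05,0.1"))
```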

passed to topTable(), minimum absolute log2-fold-change required

type: integer

passed to topTable(), logical, should 95% confidence intervals be output for logFC? Alternatively, can take a numeric value between zero and one specifying the confidence level required.

type: boolean

passed to topTable(), method used to adjust the p-values for multiple testing.

type: string

cutoff value for adjusted p-values. Only genes with lower p-values are listed.

type: number

default: 1

Turns on and off usage of voom normalization in the Limma module.

type: boolean

type: integer

default: 1

type: integer

passed to variancePartition::dream(), logical, should 95% confidence intervals be output for logFC? Alternatively, can take a numeric value between zero and one specifying the confidence level required.

type: boolean

passed to variancePartition::dream() (via eBayes), assumed proportion of genes that are differentially expressed (numeric 0-1).

type: number

default: 0.01

passed to variancePartition::dream() (via eBayes), comma-separated numeric pair giving the lower and upper limits of the standard deviation of the logFC values used in the prior.

type: string

default: 0.1,4

passed to variancePartition::dream() (via eBayes), logical, should an intensity-dependent trend be allowed for the prior variance?

type: boolean

passed to variancePartition::dream() (via eBayes), logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?

type: boolean

passed to variancePartition::dream() (via eBayes), comma-separated numeric pair giving the proportion of the lower and upper tails to be winsorized before estimating the standard deviation.

type: string

default: 0.05,0.1

Method used to estimate effective degrees of freedom for hypothesis testing in the linear mixed model. Allowed values: adaptive (default), Satterthwaite, Kenward-Roger.

type: string

passed to variancePartition::dream(), logical, use REML estimation for mixed-model variance components (passed through to lmer()).

type: boolean

Turns on and off usage of voomWithDreamWeights() normalization in the DREAM module.

type: boolean

Method used to adjust p-values for multiple testing (passed to p.adjust).

type: string

Alpha value for the Box-Cox transformation used by propd. Leave unset to skip the transformation.

type: number

Use moderated theta values in propd.

type: boolean

default: true

False discovery rate threshold used to define significantly proportional pairs.

type: number

default: 0.05

Number of permutations for FDR estimation. 0 uses the analytical FDR from F-statistic p-values.

type: integer

Number of theta cutoffs evaluated when estimating FDR by permutation.

type: integer

default: 100

Save the table of significant pairwise statistics.

type: boolean

Save the full table of pairwise statistics (very large).

type: boolean

Save the gene-by-gene adjacency matrix. Must be set to true when pairing propd with --functional_method grea, which consumes the adjacency.

type: boolean

Save the propd object as an RDS file.

type: boolean

Functional analysis method. Set to ‘none’ (default) to disable functional analysis.

type: string

Gene sets in GMT or GMX format. For GSEA, multiple comma-separated input files in either format are possible. For gprofiler2, a single file in GMT format is possible; this has the lowest priority and will be overridden by --gprofiler2_token and --gprofiler2_organism.

type: string

Permutation type

type: string

Number of permutations

type: integer

default: 1000

Enrichment statistic

type: string

Metric for ranking genes

type: string

Gene list sorting mode

type: string

Gene list ordering mode

type: string

Max size: exclude larger sets

type: integer

default: 500

Min size: exclude smaller sets

type: integer

default: 15

Normalisation mode

type: string

Randomization mode

type: string

Make detailed geneset report?

type: boolean

default: true

Use median for class metrics

type: boolean

Number of markers

type: integer

default: 100

Plot graphs for the top sets of each phenotype

type: integer

default: 20

Save random ranked lists

type: boolean

Make a zipped file with all reports

type: boolean

Short name of the organism that is analyzed, e.g. hsapiens for Homo sapiens.

type: string

Should only significant enrichment results be considered?

type: boolean

default: true

Should underrepresentation be measured instead of overrepresentation?

type: boolean

The method that should be used for multiple testing correction.

type: string

On which source databases to run the gprofiler query

type: string

Whether to include evcodes in the results.

type: boolean

Maximum q value used for significance testing.

type: number

default: 0.05

Token that should be used as a query.

type: string

Path to CSV/TSV/TXT file that should be used as a background list of genes for the query; alternatively, ‘auto’ (default) or ‘false’.

type: string

default: auto

pattern: ^\S+\.(csv|tsv|txt)$|auto|false

Which column to use as gene IDs in the background matrix.

type: string

How to calculate the statistical domain size.

type: string

How many genes must be differentially expressed in a pathway for it to be considered enriched? Default 1.

type: integer

default: 1

Valid R palette name

type: string

default: Blues

Path to TSV file containing network file for decoupler

type: string

pattern: ^\S+\.(tsv)$

Removes sources of a net with less than min_n targets

type: integer

default: 5

Comma-separated list of methods to use (e.g., ‘ora,ulm’)

type: string

default: ulm

Minimum number of features in a gene set.

type: integer

default: 15

Maximum number of features in a gene set.

type: integer

default: 500

Number of permutations used by grea to estimate enrichment significance.

type: integer

default: 100

Should a Shiny app be built?

type: boolean

default: true

Should the app be deployed to shinyapps.io?

type: boolean

Your shinyapps.io account name

type: string

The name of the app to push to in your shinyapps.io account

type: string

Qmd/Rmd/ipynb report template(s) from which to create the pipeline report. Supply a single path or a comma-separated list of paths to render multiple reports per paramset.

required

type: string

default: ${projectDir}/assets/differentialabundance_report.qmd

pattern: ^[^,\s]+\.(Rmd|qmd|ipynb)(\s*,\s*[^,\s]+\.(Rmd|qmd|ipynb))*$
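
The report template pattern above accepts either a single template path or a comma-separated list of them, with optional whitespace around each comma. The sketch below checks a few values against the same expression; `is_valid_report_list` is a hypothetical helper and search semantics are assumed.

```python
import re

# Pattern copied from the report template parameter above.
REPORT_PATTERN = re.compile(
    r"^[^,\s]+\.(Rmd|qmd|ipynb)(\s*,\s*[^,\s]+\.(Rmd|qmd|ipynb))*$"
)


def is_valid_report_list(value: str) -> bool:
    """Return True if the value is one template path or a comma-separated list of them."""
    return REPORT_PATTERN.search(value) is not None


if __name__ == "__main__":
    for candidate in ["report.qmd", "a.qmd, b.Rmd", "a.qmd,b.txt", "a.qmd b.Rmd"]:
        print(candidate, is_valid_report_list(candidate))
```

Each path must be free of commas and whitespace, so a space-separated list or a template with another extension would be rejected.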

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

A logo to display in the report instead of the generic pipeline logo.

hidden required

type: string

default: ${projectDir}/docs/images/nf-core-differentialabundance_logo_light.png

CSS to use to style the output, in lieu of the default nf-core styling

hidden required

type: string

default: ${projectDir}/assets/nf-core_style.css

A markdown file containing citations to include in the final report

type: string

default: ${projectDir}/CITATIONS.md

A title for reporting outputs

type: string

An author for reporting outputs

type: string

Semicolon-separated string of contributor info that should be listed in the report.

type: string

A description for reporting outputs

type: string

Skip generation of reports

type: boolean

Comma-separated list of sections that should not be included in the final Quarto report

hidden

type: string

Reference genome related files and options required for the workflow.

Name of iGenomes reference.

type: string

Do not load the iGenomes reference config.

hidden

type: boolean

The base path to the igenomes reference files

hidden

type: string

default: s3://ngi-igenomes/igenomes/

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

type: string

Email address for completion summary, only when pipeline fails.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

Do not use coloured log outputs.

hidden

type: boolean

Boolean whether to validate parameters against the schema at runtime

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string

Display the help message.

type: boolean,string

Display the full detailed help message.

type: boolean

Display hidden parameters in the help message (only works when --help or --help_full are provided).

type: boolean
