differentialabundance: Parameters

Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.(csv|tsv|txt)$

You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See usage docs.

A CSV file describing sample contrasts

required

type: string

pattern: ^\S+\.csv$

This file is used to define groups of samples from 'input' to compare. It must contain at least the columns 'variable', 'reference', 'target' and 'blocking', where 'variable' is a column in the input sample sheet, 'reference' and 'target' are values in that column, and 'blocking' is a colon-separated list of additional blocking variables (can be an empty string).
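As a rough illustration of the layout described above, the following Python sketch parses a contrasts file and splits the 'blocking' column on colons. The column names come from the description; the example values (treatment, genotype, batch, sex) and the helper name `parse_contrasts` are invented for illustration and are not part of the pipeline.

```python
import csv
import io

# Hypothetical contrasts file content; only the column names are taken
# from the parameter description, the values are made up.
CONTRASTS_CSV = """variable,reference,target,blocking
treatment,control,treated,
genotype,wt,ko,batch:sex
"""

REQUIRED_COLUMNS = {"variable", "reference", "target", "blocking"}


def parse_contrasts(text):
    """Return one dict per contrast, with 'blocking' split into a list."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    contrasts = []
    for row in reader:
        # An empty 'blocking' cell means there are no blocking variables.
        blocking = [name for name in row["blocking"].split(":") if name]
        contrasts.append({**row, "blocking": blocking})
    return contrasts


contrasts = parse_contrasts(CONTRASTS_CSV)
```

Here the first contrast has no blocking variables and the second has two.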

TSV-format abundance matrix

required

type: string

pattern: ^\S+\.(tsv|csv|txt)$

For example an expression matrix output from the nf-core/rnaseq workflow. There must be a column in this matrix for every row in the input sample sheet.
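The requirement that every sample sheet row has a matching matrix column can be checked before a run. The sketch below is one possible pre-flight check in Python, not something the pipeline provides; it assumes a comma-separated sample sheet whose identifier column is named 'sample' and a tab-separated matrix, and the helper name `missing_samples` is hypothetical.

```python
import csv
import io


def missing_samples(samplesheet_text, matrix_text, id_column="sample"):
    """List sample IDs from the sheet that have no column in the matrix."""
    sheet = csv.DictReader(io.StringIO(samplesheet_text))
    sample_ids = [row[id_column] for row in sheet]
    # Only the header row of the matrix is needed for this check.
    header = next(csv.reader(io.StringIO(matrix_text), delimiter="\t"))
    return [sample for sample in sample_ids if sample not in header]


# Invented example data for illustration.
SHEET = "sample,condition,replicate\ns1,control,1\ns2,treated,1\n"
GOOD_MATRIX = "gene_id\ts1\ts2\nENSG1\t10\t20\n"
BAD_MATRIX = "gene_id\ts1\nENSG1\t10\n"

ok = missing_samples(SHEET, GOOD_MATRIX)
problems = missing_samples(SHEET, BAD_MATRIX)
```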

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required

type: string

A string to identify results in the output directory

required

type: string

default: study

Also used as an identifier in some processes

A string identifying the technology used to produce the data

required

type: string

Currently only 'rnaseq' may be specified. Options such as 'affy_array' may be added in future.

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.

A text file listing technical features (e.g. spikes)

type: string

One feature per row. Note that by default these features are simply stripped from matrices prior to internal processing. To actually use them, e.g. in normalisation, set --sizefactors_from_controls.

When set, use the control features in scaling/normalisation

type: boolean

Use supplied control features in normalisation/scaling operations?

Rmd report template from which to create the pipeline report

required

type: string

pattern: ^\S+\.Rmd$

A logo to display in the report instead of the generic pipeline logo

required

type: string

default: docs/images/nf-core-differentialabundance_logo_light.png

CSS to use to style the output, in lieu of the default nf-core styling

required

type: string

default: assets/nf-core_style.css

A markdown file containing citations to include in the final report

type: string

default: CITATIONS.md

Column in the samples sheet to be used as the primary sample identifier

required

type: string

default: sample

Type of observation

required

type: string

default: sample

This is used in reporting to refer to the observations. Frequently this is 'sample' (e.g. in RNA-seq experiments), but it may also be desirable to refer to 'pool', or 'individual'.

Options related to features

Feature ID attribute in the GTF file (e.g. the gene_id field)

required

type: string

default: gene_id

Feature name attribute in the GTF file (e.g. the gene symbol field)

required

type: string

default: gene_name

Type of feature we have, often 'gene'

required

type: string

default: gene

Options related to filtering upstream of differential analysis

Minimum abundance value

required

type: integer

default: 1

Minimum observations that must pass the threshold to retain the row/feature (e.g. gene).

type: number

default: 1

A minimum proportion of observations, given as a number between 0 and 1, that must pass the threshold. Overrides minimum_samples

type: number

An optional grouping variable to be used to calculate a min_samples value

type: string

The variable can be used to define groups and derive a minimum group size upon which to base minimum observation numbers. The rationale for this is to allow retention of features that might be present in only one group. Note that this is consciously NOT filtering with an explicit awareness of groups ("feature must be present in all samples of group A"), since this is known to create biases towards discovery of differential features.
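The interplay of the four filtering options can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the pipeline's own code: it assumes "pass the threshold" means "at least the minimum abundance", that the grouping variable yields the size of the smallest group, that a proportion (when given) takes precedence over both, and that proportions are rounded up. The function names are hypothetical.

```python
import math
from collections import Counter


def required_observations(n_obs, min_samples=1, min_proportion=None, groups=None):
    """Number of observations that must pass the abundance threshold."""
    required = min_samples
    if groups is not None:
        # Assumption: use the size of the smallest group.
        required = min(Counter(groups).values())
    if min_proportion is not None:
        # Assumption: the proportion overrides other settings, rounded up.
        required = math.ceil(min_proportion * n_obs)
    return required


def filter_features(matrix, min_abundance=1, **kwargs):
    """Keep features with enough observations at or above min_abundance."""
    kept = {}
    for feature, values in matrix.items():
        required = required_observations(len(values), **kwargs)
        passing = sum(1 for value in values if value >= min_abundance)
        if passing >= required:
            kept[feature] = values
    return kept


# Invented example: geneA is present in only one of two groups.
matrix = {"geneA": [0, 0, 5, 7], "geneB": [0, 0, 0, 0], "geneC": [2, 3, 4, 5]}
groups = ["ctl", "ctl", "trt", "trt"]
kept_by_group = filter_features(matrix, groups=groups)
kept_by_proportion = filter_features(matrix, min_proportion=0.75)
```

With the grouping variable, geneA is retained because two observations (the size of the smallest group) pass the threshold, without requiring that they belong to any particular group.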

Options related to data exploration

Clustering method used in dendrogram creation

required

type: string

default: ward.D2

Correlation method used in dendrogram creation

required

type: string

default: spearman

Number of features selected before certain exploratory analyses

required

type: integer

default: 500

Length of the whiskers in boxplots as multiple of IQR. Defaults to 1.5.

type: number

default: 1.5

Threshold on MAD score for outlier identification

type: integer

default: -5

MAD = median absolute deviation. A threshold on this value is used to define observations (samples) as outliers, or not, in exploratory plots. Based on the definition at https://wiki.arrayserver.com/wiki/index.php?title=CorrelationQC.pdf.
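The exact score used by the pipeline is defined in the linked document and is not reproduced here. Purely as an illustration of the general idea, the Python sketch below computes a generic robust z-score from the median and MAD and flags values below a negative threshold. The scaling constant 1.4826 is a common convention and an assumption here, as are the function names and the example values.

```python
from statistics import median


def mad_scores(values, scale=1.4826):
    """Robust z-scores: (value - median) / (scale * MAD)."""
    centre = median(values)
    mad = median([abs(value - centre) for value in values])
    if mad == 0:
        raise ValueError("MAD is zero; scores are undefined")
    return [(value - centre) / (scale * mad) for value in values]


def flag_outliers(values, threshold=-5):
    """Indices of observations whose score falls below the threshold."""
    scores = mad_scores(values)
    return [index for index, score in enumerate(scores) if score < threshold]


# Invented per-sample summary values (e.g. mean correlation to other samples).
per_sample = [0.95, 0.96, 0.94, 0.95, 0.50]
outliers = flag_outliers(per_sample)
```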

How should the main grouping variable be selected? 'auto_pca', 'contrasts', or a valid column name from the observations table.

required

type: string

default: auto_pca

Some plots are only generated once, with a single sample grouping. This option defines how that sample grouping is selected. It should be 'auto_pca' (the variable from the sample sheet most strongly associated with the first principal component), 'contrasts' (the variable associated with the first contrast), or a value specifying a specific column in the observations.

Options related to differential operations

The suffix associated with tabular differential results tables

required

type: string

default: .deseq2.results.tsv

The feature identifier column in differential results tables

required

type: string

default: gene_id

The fold change column in differential results tables

required

type: string

default: log2FoldChange

The p value column in differential results tables

type: string

default: pvalue

The q value column in differential results tables.

required

type: string

default: padj

Minimum fold change used to calculate differential feature numbers

required

type: integer

default: 2

Maximum q value used to calculate differential feature numbers

required

type: number

default: 0.05

Where a features file (GTF) has been provided, which attribute to use to name features

type: string

default: gene_name

Indicate whether or not fold changes are on the log scale (default is to assume they are)

type: boolean

default: true
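The fold change, q value and log-scale options above combine when counting differential features. The Python sketch below shows one plausible reading and is not taken from the pipeline: it assumes the minimum fold change is given on the linear scale and is converted with log2 when fold changes are logged, that comparisons are inclusive, and that a missing q value never counts. The function name is hypothetical.

```python
import math


def is_differential(fold_change, q_value, min_fold_change=2, max_q=0.05,
                    log2_scale=True):
    """Decide whether one feature passes the fold change and q thresholds."""
    if q_value is None or q_value > max_q:
        return False
    if log2_scale:
        # Assumption: a linear threshold of 2 becomes |log2FC| >= 1.
        return abs(fold_change) >= math.log2(min_fold_change)
    # Linear scale: up- or down-regulated by at least the threshold.
    return fold_change >= min_fold_change or fold_change <= 1 / min_fold_change


# Invented (log2FoldChange, padj) pairs for illustration.
results = [(1.0, 0.01), (-1.5, 0.01), (3.0, 0.2), (0.5, 0.01), (2.0, None)]
n_differential = sum(is_differential(fc, q) for fc, q in results)
```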

test parameter passed to DESeq()

type: string

either "Wald" or "LRT", which will then use either Wald significance tests (defined by nbinomWaldTest), or the likelihood ratio test on the difference in deviance between a full and reduced model formula (defined by nbinomLRT)

fitType parameter passed to DESeq()

type: string

either "parametric", "local", "mean", or "glmGamPoi" for the type of fitting of dispersions to the mean intensity. See estimateDispersions for description.

sfType parameter passed to DESeq()

type: string

either "ratio", "poscounts", or "iterate" for the type of size factor estimation. See estimateSizeFactors for description.

'minReplicatesForReplace' parameter passed to DESeq()

type: integer

default: 7

the minimum number of replicates required in order to use replaceOutliers on a sample. If there are samples with at least this many replicates, the model will be refit after replacing these outliers, which are flagged by Cook's distance. Set to Inf in order to never replace outliers. It is set to Inf for fitType="glmGamPoi".

useT parameter passed to DESeq()

type: boolean

logical, passed to nbinomWaldTest, default is FALSE, where Wald statistics are assumed to follow a standard Normal

independentFiltering parameter passed to results()

type: boolean

default: true

logical, whether independent filtering should be applied automatically

lfcThreshold parameter passed to results()

type: integer

a non-negative value which specifies a log2 fold change threshold. The default value is 0, corresponding to a test that the log2 fold changes are equal to zero. The user can specify the alternative hypothesis using the altHypothesis argument, which defaults to testing for log2 fold changes greater in absolute value than a given threshold. If lfcThreshold is specified, the results are for Wald tests, and LRT p-values will be overwritten.

altHypothesis parameter passed to results()

type: string

default: greaterAbs

character which specifies the alternative hypothesis, i.e. those values of log2 fold change which the user is interested in finding. The complement of this set of values is the null hypothesis which will be tested. If the log2 fold change specified by 'name' or by 'contrast' is written as beta, then the possible values for 'altHypothesis' represent the following alternate hypotheses: 1) greaterAbs: |beta| > lfcThreshold, and p-values are two-tailed 2) lessAbs: |beta| < lfcThreshold, p-values are the maximum of the upper and lower tests. The Wald statistic given is positive, an SE-scaled distance from the closest boundary 3) greater: beta > lfcThreshold 4) less: beta < -lfcThreshold
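The four alternative hypotheses listed above can be restated as regions of the log2 fold change. The Python sketch below only encodes those inequalities; it does not compute p-values or call DESeq2, and the function name `in_alternative` is hypothetical.

```python
def in_alternative(beta, lfc_threshold=0, alt_hypothesis="greaterAbs"):
    """True if a log2 fold change lies in the alternative-hypothesis region."""
    if lfc_threshold < 0:
        raise ValueError("lfc_threshold must be non-negative")
    if alt_hypothesis == "greaterAbs":
        return abs(beta) > lfc_threshold
    if alt_hypothesis == "lessAbs":
        return abs(beta) < lfc_threshold
    if alt_hypothesis == "greater":
        return beta > lfc_threshold
    if alt_hypothesis == "less":
        return beta < -lfc_threshold
    raise ValueError(f"unknown altHypothesis: {alt_hypothesis}")


# A fold change of -1.5 against a threshold of 1:
checks = {
    name: in_alternative(-1.5, 1, name)
    for name in ("greaterAbs", "lessAbs", "greater", "less")
}
```

For a log2 fold change of -1.5 and a threshold of 1, only 'greaterAbs' and 'less' describe the value.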

pAdjustMethod parameter passed to results()

type: string

default: BH

the method to use for adjusting p-values, see help in R for the p.adjust() function (via ?p.adjust). At time of writing available values were "holm", "hochberg", "hommel", "bonferroni", "BH", "BY", "fdr", "none".
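For readers unfamiliar with the default "BH" method, the following Python sketch implements the standard Benjamini-Hochberg adjustment. It is an independent illustration rather than the R code, and it omits the handling of missing values that p.adjust() provides; the function name `bh_adjust` is hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum so the
    # adjusted values are monotone in the original p-values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


padj = bh_adjust([0.01, 0.04, 0.03, 0.005])
```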

alpha parameter passed to results()

type: number

default: 0.1

the significance cutoff used for optimizing the independent filtering (by default 0.1). If the adjusted p-value cutoff (FDR) will be a value other than 0.1, alpha should be set to that value.

minmu parameter passed to results()

type: number

default: 0.5

lower bound on the estimated count (used when calculating contrasts)

variance stabilisation method to use when making a variance stabilised matrix

type: string

'rlog', 'vst' or 'rlog,vst'

Shrink fold changes in results?

type: boolean

default: true

'ashr' method is the only method currently implemented

Number of cores

type: integer

default: 1

Number of cores to use with DESeq()

blind parameter for rlog() and/or vst()

type: boolean

default: true

logical, whether to blind the transformation to the experimental design

nsub parameter passed to vst()

type: integer

default: 1000

the number of genes to subset to (default 1000)

Set to run GSEA to infer differential gene sets in contrasts

type: boolean

Permutation type

type: string

Select the type of permutation to perform in assessing the statistical significance of the enrichment score. (See 'required fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page for more info)

Number of permutations

type: integer

default: 1000

Specify the number of permutations to perform in assessing the statistical significance of the enrichment score. (See 'required fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page)

Enrichment statistic

type: string

See 'basic fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page for a detailed explanation.

Metric for ranking genes

type: string

See https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideTEXT.htm#_Metrics_for_Ranking for a detailed explanation.

Gene list sorting mode

type: string

GSEA ranks the genes in the expression dataset and then analyzes that ranked list of genes. Use this parameter to determine whether to sort the genes using the real (default) or absolute value of the ranking metric.

See 'basic fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page

Gene list ordering mode

type: string

GSEA ranks the genes in the expression dataset and then analyzes that ranked list of genes. Use this parameter to determine whether to sort the genes in descending (default) or ascending order. Ascending order is usually applicable when the ranking metric is a measure of nearness (how close the genes are to one another) rather than distance.

See 'basic fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page
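The effect of the sorting and ordering modes can be shown with a small Python sketch. This is not GSEA code: the mode labels 'real', 'abs', 'descending' and 'ascending' are used here as illustrative values, and the function name and metric values are invented.

```python
def rank_genes(metric_by_gene, sort_mode="real", order="descending"):
    """Order gene names by their ranking metric."""
    if sort_mode == "abs":
        key = lambda gene: abs(metric_by_gene[gene])
    else:
        key = lambda gene: metric_by_gene[gene]
    return sorted(metric_by_gene, key=key, reverse=(order == "descending"))


metrics = {"A": 2.0, "B": -3.0, "C": 0.5}
by_real = rank_genes(metrics)
by_abs = rank_genes(metrics, sort_mode="abs")
ascending = rank_genes(metrics, order="ascending")
```

Sorting on the absolute value moves the strongly negative gene B to the top of the list, whereas sorting on the real value leaves it at the bottom.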

Max size: exclude larger sets

type: integer

default: 500

After filtering from the gene sets any gene not in the expression dataset, gene sets larger than this are excluded from the analysis.

See 'basic fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page

Min size: exclude smaller sets

type: integer

default: 15

After filtering from the gene sets any gene not in the expression dataset, gene sets smaller than this are excluded from the analysis.

See 'basic fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page

Normalisation mode

type: string

Normalization mode. Method used to normalize the enrichment scores across analyzed gene sets: 'meandiv' (default, GSEA normalizes the enrichment scores as described in Normalized Enrichment Score (NES)) OR 'none' (GSEA does not normalize the enrichment scores).

See 'advanced fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page

Randomization mode

type: string

Method used to randomly assign phenotype labels to samples for phenotype permutations. Not used for gene_set permutations.

See 'advanced fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page

Make detailed geneset report?

type: boolean

default: true

Use median for class metrics

type: boolean

Set to true (default=false) to use the median of each class, instead of the mean, in the metrics for ranking genes.

See 'advanced fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page.

Number of markers

type: integer

default: 100

Number of features (gene or probes) to include in the butterfly plot in the Gene Markers section of the gene set enrichment report.

See 'advanced fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page.

Plot graphs for the top sets of each phenotype

type: integer

default: 20

Generates summary plots and detailed analysis results for the top x gene sets in each phenotype, where x is 20 by default. The top gene sets are those with the largest normalized enrichment scores.

See 'advanced fields' at https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html?Run_GSEA_Page.

Seed for permutation

type: string

default: timestamp

Seed used to generate a random number for phenotype and gene_set permutations: timestamp (default), 149, or user input. The specific seed value (149) generates consistent results, which is useful when testing software.

Save random ranked lists

type: boolean

Set to 'true' (default=false) to save the random ranked lists of genes created by phenotype permutations. When you save random ranked lists, for each permutation, GSEA saves the rank metric score for each gene (the score used to position the gene in the ranked list). Saving random ranked lists is memory intensive; therefore, this parameter is set to false by default.

Make a zipped file with all reports

type: boolean

Set to true (default=false) to create a zip file of the analysis results. The zip file is saved to the output folder with all of the other files generated by the analysis. This is useful for sharing analysis results.

Gene sets in GMT or GMX-format

type: string

default: None

Reference genome related files and options required for the workflow.

Name of iGenomes reference.

type: string

If using a reference genome configured in the pipeline using iGenomes, use this parameter to give the ID for the reference. This is then used to build the full paths for all required reference genome files e.g. --genome GRCh38.

See the nf-core website docs for more details.

Genome annotation file in GTF format

type: string

pattern: ^\S+\.gtf(\.gz)?

This parameter is mandatory if --genome is not specified.

Directory / URL base for iGenomes references.

hidden

type: string

default: s3://ngi-igenomes/igenomes

Do not load the iGenomes reference config.

hidden

type: boolean

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer, e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 128.GB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h

pattern: ^(\d+\.?\s*(s|m|h|day)\s*)+$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
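The memory and time patterns listed above can be tried out against candidate values before launching a run. The Python sketch below copies the two regular expressions verbatim from this page; the helper names and the test strings other than the documented examples are hypothetical.

```python
import re

# Patterns copied from the max_memory and max_time entries above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")


def valid_memory(value):
    """True if the string matches the documented memory pattern."""
    return MEMORY_PATTERN.match(value) is not None


def valid_time(value):
    """True if the string matches the documented time pattern."""
    return TIME_PATTERN.match(value) is not None


memory_checks = {value: valid_memory(value) for value in ("128.GB", "8.GB", "8GiB")}
time_checks = {value: valid_time(value) for value in ("240.h", "2.h", "2 hours")}
```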

Less common options for the pipeline, typically set in a config file.

Display help text.

type: boolean

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Send plain-text email instead of HTML.

type: boolean

Do not use coloured log outputs.

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.

Directory to keep pipeline Nextflow logs and reports.

type: string

default: ${params.outdir}/pipeline_info

Boolean whether to validate parameters against the schema at runtime

type: boolean

default: true

Show all params when using --help

type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

nf-core/differentialabundance