Define where the pipeline should find input data and save output data.

The .tsv file specifying sample matrix filepaths.

required
type: string
default: ./refs/Manifest.txt

The .tsv file specifying sample metadata.

required
type: string
default: ./refs/SampleSheet.tsv

Optional TSV file containing mappings between ensembl_gene_id and gene_name values.

required
type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/scflow/assets/ensembl_mappings.tsv

Cell-type annotations reference file path

required
type: string
default: https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/28033407/ctd_v1.zip

This is a zip file containing cell-type annotation reference files for the EWCE package.

Optional tsv file specifying manual revisions of cell-type annotations.

type: string
default: ./conf/celltype_mappings.tsv

Optional list of genes of interest in YML format for plotting of gene expression.

type: string
default: ./conf/reddim_genes.yml

Input sample species.

required
type: string
default: human

Currently, "human" and "mouse" are supported.

Outputs directory.

type: string
default: ./results

Parameters for quality-control and thresholding.

The sample sheet column name with unique sample identifiers.

required
type: string
default: manifest

The sample sheet variables to treat as factors.

required
type: string
default: seqdate

All sample sheet columns with numbers which should be treated as factors should be specified here separated by commas. Examples include columns with dates, numeric sample identifiers, etc.
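As a rough illustration of what this setting implies, the sketch below splits a comma-separated value and casts the named columns to strings so that they are handled as categories rather than numbers. The function names and the in-memory row layout are hypothetical and are not taken from the pipeline.

```python
def split_param(value):
    """Split a comma-separated parameter value into a clean list of names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def as_factors(rows, factor_vars):
    """Return copies of the sample-sheet rows with factor columns cast to str."""
    return [
        {key: (str(val) if key in factor_vars else val) for key, val in row.items()}
        for row in rows
    ]


# Hypothetical sample-sheet row: seqdate is numeric but should act as a category.
rows = [{"manifest": "s1", "seqdate": 20210105, "age": 71}]
factor_vars = split_param("seqdate, capdate")
print(as_factors(rows, factor_vars))
```

Columns not named in the list (here "age") keep their numeric type.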

Minimum library size (counts) per cell.

required
type: integer
default: 250

Maximum library size (counts) per cell.

required
type: string
default: adaptive

Minimum features (expressive genes) per cell.

required
type: integer
default: 100

Maximum features (expressive genes) per cell.

required
type: string
default: adaptive

Minimum proportion of counts mapping to ribosomal genes.

type: number

Maximum proportion of counts mapping to ribosomal genes.

required
type: number
default: 1

Maximum proportion of counts mapping to mitochondrial genes.

required
type: string
default: adaptive

Minimum counts for gene expressivity.

required
type: integer
default: 2

Expressive genes must have >=min_counts in >=min_cells

Minimum cells for gene expressivity.

required
type: integer
default: 2

Expressive genes must have >=min_counts in >=min_cells
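The expressivity rule can be sketched as follows. This is a minimal illustration of the ">=min_counts in >=min_cells" condition only; the dictionary layout and function name are assumptions for the example, not the pipeline's implementation.

```python
def expressive_genes(counts, min_counts=2, min_cells=2):
    """Return names of genes with >= min_counts in >= min_cells cells.

    counts maps gene name -> list of per-cell counts (hypothetical layout).
    """
    keep = []
    for gene, per_cell in counts.items():
        # Count the cells in which this gene reaches the minimum count.
        n_cells = sum(1 for c in per_cell if c >= min_counts)
        if n_cells >= min_cells:
            keep.append(gene)
    return keep


counts = {
    "GENE_A": [0, 2, 5, 0],  # two cells with >= 2 counts: kept
    "GENE_B": [1, 1, 1, 1],  # never reaches 2 counts: dropped
    "GENE_C": [0, 0, 9, 0],  # only one cell with >= 2 counts: dropped
}
print(expressive_genes(counts))
```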

Option to drop unmapped genes.

required
type: string
default: True

Option to drop mitochondrial genes.

required
type: string
default: True

Option to drop ribosomal genes.

required
type: string
default: false

The number of MADs for outlier detection.

required
type: number
default: 4

The number of median absolute deviations (MADs) used to define outliers for adaptive thresholding.
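A minimal sketch of MAD-based outlier bounds is given below. It assumes the bounds are the median plus or minus nmads times the MAD, and it exposes a scale factor whose default of 1.4826 matches the default constant of R's stats::mad(); whether the pipeline applies that scaling, or uses both bounds, is an assumption here.

```python
from statistics import median


def mad_bounds(values, nmads=4.0, scale=1.4826):
    """Return (lower, upper) bounds at median +/- nmads * scaled MAD."""
    center = median(values)
    # Median absolute deviation from the median, optionally scaled.
    mad = scale * median([abs(v - center) for v in values])
    return center - nmads * mad, center + nmads * mad


library_sizes = [1, 2, 3, 4, 100]
lower, upper = mad_bounds(library_sizes, nmads=4, scale=1.0)
outliers = [v for v in library_sizes if v < lower or v > upper]
print(lower, upper, outliers)
```

With an "adaptive" maximum, only the upper bound would be relevant; the value 100 is flagged in this toy input.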

Options for profiling ambient RNA/empty droplets.

Enable ambient RNA / empty droplet profiling.

required
type: string
default: true

Upper UMI counts threshold for true cell annotation.

required
type: string
default: auto

A numeric scalar specifying the threshold for the total UMI count above which all barcodes are assumed to contain cells, or "auto" for automated estimation based on the data.

This parameter must be either a number or "auto".

Lower UMI counts threshold for empty droplet annotation.

required
type: integer
default: 100

A numeric scalar specifying the lower bound on the total UMI count, at or below which all barcodes are assumed to correspond to empty droplets.
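The two UMI thresholds described above split barcodes into three groups, which can be sketched as follows. The function name and barcodes are hypothetical, a numeric upper threshold is assumed in place of "auto", and the strict comparison at the upper threshold follows the wording "above which" rather than any confirmed implementation detail.

```python
def partition_barcodes(totals, lower, retain):
    """Group barcodes by total UMI count.

    totals maps barcode -> total UMI count (hypothetical layout).
    """
    groups = {"empty": [], "tested": [], "cell": []}
    for barcode, total in totals.items():
        if total <= lower:
            groups["empty"].append(barcode)   # at or below the lower bound
        elif total > retain:
            groups["cell"].append(barcode)    # above the upper threshold
        else:
            groups["tested"].append(barcode)  # left for the statistical test
    return groups


totals = {"AAAC": 40, "AAAG": 100, "AACT": 650, "AAGT": 5200}
print(partition_barcodes(totals, lower=100, retain=1000))
```

Only the barcodes in the middle group would be subject to the FDR cutoff.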

The maximum FDR for the emptyDrops algorithm.

required
type: number
default: 0.001

Number of Monte Carlo p-value iterations.

required
type: integer
default: 10000

An integer scalar specifying the number of iterations to use for the Monte Carlo p-value calculations for the emptyDrops algorithm.

Expected number of cells per sample.

required
type: integer
default: 3000

If the "retain" parameter is set to "auto" (recommended), then this parameter is used to identify the optimal value for "retain" for the emptyDrops algorithm.

Parameters for identifying singlets/doublets/multiplets.

Enable doublet/multiplet identification.

required
type: string
default: true

Algorithm to use for doublet/multiplet identification.

required
type: string
default: doubletfinder

Variables to regress out for dimensionality reduction.

required
type: string
default: nCount_RNA,pc_mito

Number of PCA dimensions to use.

required
type: integer
default: 10

The top n most variable features to use.

required
type: integer
default: 2000

A fixed doublet rate.

type: number

Use a fixed doublet rate (e.g. 0.075 to specify that 7.5% of all cells should be marked as doublets), or set to 0 to use the "dpk" method (recommended).

Doublets per thousand cells increment.

required
type: integer
default: 8

The doublets per thousand cells (dpk) increment specifies the expected doublet rate based on the number of cells. For example, with a dpk of 8 (recommended by 10X), a dataset with 1000 cells is expected to contain 8 doublets per thousand cells, a dataset with 2000 cells is expected to contain 16 doublets per thousand cells, and a dataset with 10000 cells is expected to contain 80 doublets per thousand cells (or 800 doublets in total). If the "doublet_rate" parameter is manually specified, this recommended incremental behaviour is overridden.
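The arithmetic of the dpk method can be checked with a short sketch. The function name is hypothetical, and treating any doublet_rate above 0 as an override is an assumption based on the descriptions above.

```python
def expected_doublet_rate(n_cells, dpk=8, doublet_rate=0.0):
    """Return the expected fraction of doublets for a dataset of n_cells."""
    if doublet_rate > 0:
        # A manually specified fixed rate overrides the dpk increment.
        return doublet_rate
    # dpk doublets per thousand cells, for every thousand cells in the dataset.
    return (dpk / 1000) * (n_cells / 1000)


for n in (1000, 2000, 10000):
    rate = expected_doublet_rate(n)
    print(n, rate, round(rate * n))
```

For 10000 cells this gives a rate of 0.08, i.e. 80 doublets per thousand cells and 800 doublets in total.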

Specify a pK value instead of parameter sweep.

required
type: number
default: 0.02

The optimal pK value used by the doubletFinder algorithm is determined following a compute-intensive parameter sweep. The parameter sweep can be overridden by manually specifying a pK value.

Parameters used in the merged quality-control report.

Numeric variables for inter-sample metrics.

required
type: string
default: total_features_by_counts,total_counts,pc_mito,pc_ribo

A comma-separated list of numeric variables which differ between individual cells of each sample. The merged sample report will include plots facilitating between-sample comparisons for each of these numeric variables.

Categorical variables for further sub-setting of plots

required
type: string
default: NULL

A comma-separated list of categorical variables. The merged sample report will include additional plots of sample metrics subset by each of these variables (e.g. sex, diagnosis).

Numeric variables for outlier identification.

required
type: string
default: total_features_by_counts,total_counts

The merged report will include tables highlighting samples that are putative outliers for each of these numeric variables.

Parameters for integrating datasets and batch correction.

Choice of integration method.

required
type: string
default: Liger

Unique sample identifier variable.

required
type: string
default: manifest

Fill out matrices with union of genes.

required
type: string
default: false

See rliger::createLiger(). Whether to fill out raw.data matrices with union of genes across all datasets (filling in 0 for missing data) (requires make.sparse = TRUE) (default FALSE).

Remove non-expressing cells/genes.

required
type: string
default: true

See rliger::createLiger(). Whether to remove cells not expressing any measured genes, and genes not expressed in any cells (if take.gene.union = TRUE, removes only genes not expressed in any dataset) (default TRUE).

Number of genes to find for each dataset.

required
type: integer
default: 3000

See rliger::selectGenes(). Number of genes to find for each dataset. Optimises the value of var.thresh for each dataset to get this number of genes.

How to combine variable genes across experiments.

required
type: string
default: union

See rliger::selectGenes(). Either "union" or "intersection".

Keep unique genes.

required
type: string
default: false

See rliger::selectGenes().

Capitalize gene names to match homologous genes.

required
type: string
default: false

See rliger::selectGenes().

Treat each column as a cell.

required
type: string
default: true

See rliger::removeMissingObs().

Inner dimension of factorization (n factors).

required
type: integer
default: 30

See rliger::optimizeALS(). Inner dimension of factorization (number of factors). Run suggestK to determine appropriate value; a general rule of thumb is that a higher k will be needed for datasets with more sub-structure.

Regularization parameter.

required
type: number
default: 5

See rliger::optimizeALS(). Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Run suggestLambda to determine the most appropriate value for balancing dataset alignment and agreement (default 5.0).

Convergence threshold.

required
type: number
default: 0.0001

See rliger::optimizeALS().

Maximum number of block coordinate descent iterations.

required
type: integer
default: 100

See rliger::optimizeALS().

Number of restarts to perform.

required
type: integer
default: 1

See rliger::optimizeALS().

Random seed for reproducible results.

required
type: integer
default: 1

Number of nearest neighbours for within-dataset knn graph.

required
type: integer
default: 20

See rliger::quantile_norm().

Horizon parameter for shared nearest factor graph.

required
type: integer
default: 500

See rliger::quantileAlignSNF(). Distances to all but the k2 nearest neighbors are set to 0 (cuts down on memory usage for very large graphs).

Minimum allowed edge weight.

required
type: number
default: 0.2

See rliger::quantileAlignSNF().

Name of dataset to use as a reference.

required
type: string
default: NULL

See rliger::quantile_norm(). Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used.

Minimum number of cells to consider a cluster shared across datasets.

required
type: integer
default: 2

See rliger::quantile_norm().

Number of quantiles to use for normalization.

required
type: integer
default: 50

See rliger::quantile_norm().

Number of times to perform Louvain community detection.

required
type: integer
default: 10

See rliger::quantileAlignSNF(). Number of times to perform Louvain community detection with different random starts (default 10).

Controls the number of communities detected.

required
type: integer
default: 1

See rliger::quantileAlignSNF().

Indices of factors to use for shared nearest factor determination.

required
type: string
default: NULL

See rliger::quantile_norm().

Distance metric to use in calculating nearest neighbour.

required
type: string
default: CR

See rliger::quantileAlignSNF(). Default "CR".

Center the data when scaling factors.

required
type: string
default: false

See rliger::quantile_norm().

Small cluster extraction cells threshold.

type: integer

See rliger::quantileAlignSNF(). Extracts small clusters loading highly on single factor with fewer cells than this before regular alignment (default 0 – no small cluster extraction).

Categorical variables for integration report metrics.

required
type: string
default: individual,diagnosis,region,sex

The integration report will provide plots and integration metrics for these categorical variables.

Reduced dimension embedding for the integration report.

required
type: string
default: UMAP

The integration report will provide plots with and without integration using this embedding.

Settings for dimensionality reduction algorithms.

Input matrix for dimension reduction.

required
type: string
default: PCA,Liger

Dimension reduction outputs to generate.

required
type: string
default: tSNE,UMAP,UMAP3D

Typically 'UMAP,UMAP3D' or 'tSNE'.

Variables to regress out before dimension reduction.

required
type: string
default: nCount_RNA,pc_mito

Number of PCA dimensions.

required
type: integer
default: 30

See uwot::umap().

Number of nearest neighbours to use.

required
type: integer
default: 35

See uwot::umap().

The dimension of the space to embed into.

required
type: integer
default: 2

See uwot::umap(). The dimension of the space to embed into. This defaults to 2 to provide easy visualization, but can reasonably be set to any integer value in the range 2 to 100.

Type of initialization for the coordinates.

required
type: string

See uwot::umap().

Distance metric for finding nearest neighbours.

required
type: string

See uwot::umap().

Number of epochs to use during optimization of embedded coordinates.

required
type: integer
default: 200

See uwot::umap().

Initial learning rate used in optimization of coordinates.

required
type: integer
default: 1

See uwot::umap().

Effective minimum distance between embedded points.

required
type: number
default: 0.4

See uwot::umap(). Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.

Effective scale of embedded points.

required
type: number
default: 0.85

See uwot::umap(). In combination with min_dist, this determines how clustered/clumped the embedded points are.

Interpolation to combine local fuzzy sets.

required
type: number
default: 1

See uwot::umap(). The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection.

Local connectivity required.

required
type: integer
default: 1

See uwot::umap(). The local connectivity required – i.e. the number of nearest neighbors that should be assumed to be connected at a local level. The higher this value the more connected the manifold becomes locally.

Weighting applied to negative samples in embedding optimization.

required
type: integer
default: 1

See uwot::umap(). Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples.

Number of negative edge samples to use per positive edge sample.

required
type: integer
default: 5

See uwot::umap(). The number of negative edge/1-simplex samples to use per positive edge/1-simplex sample in optimizing the low dimensional embedding.

Use fast SGD.

required
type: string
default: false

See uwot::umap(). Setting this to TRUE will speed up the stochastic optimization phase, but gives a potentially less accurate embedding that will not be exactly reproducible even with a fixed seed. For visualization, fast_sgd = TRUE will give perfectly good results. For more generic dimensionality reduction, it's safer to leave fast_sgd = FALSE.

Output dimensionality.

required
type: integer
default: 2

See Rtsne::Rtsne().

Number of dimensions retained in the initial PCA step.

required
type: integer
default: 50

See Rtsne::Rtsne().

Perplexity parameter.

required
type: integer
default: 150

See Rtsne::Rtsne().

Speed/accuracy trade-off.

required
type: number
default: 0.5

See Rtsne::Rtsne(). Speed/accuracy trade-off (increase for less accuracy), set to 0.0 for exact TSNE (default: 0.5).

Iteration after which perplexities are no longer exaggerated.

required
type: integer
default: 250

See Rtsne::Rtsne(). Iteration after which the perplexities are no longer exaggerated (default: 250, except when Y_init is used, then 0).

Iteration after which the final momentum is used.

required
type: integer
default: 250

See Rtsne::Rtsne(). Iteration after which the final momentum is used (default: 250, except when Y_init is used, then 0).

Number of iterations.

required
type: integer
default: 1000

See Rtsne::Rtsne().

Center data before PCA.

required
type: string
default: true

See Rtsne::Rtsne(). Should data be centered before PCA is applied? (default: TRUE)

Scale data before PCA.

required
type: string
default: false

See Rtsne::Rtsne(). Should data be scaled before PCA is applied? (default: FALSE)

Normalize data before distance calculations.

required
type: string
default: true

See Rtsne::Rtsne(). Should data be normalized internally prior to distance calculations with normalize_input? (default: TRUE)

Momentum used in the first part of optimization.

required
type: number
default: 0.5

See Rtsne::Rtsne().

Momentum used in the final part of optimization.

required
type: number
default: 0.8

See Rtsne::Rtsne().

Learning rate.

required
type: integer
default: 1000

See Rtsne::Rtsne().

Exaggeration factor used in the first part of the optimization.

required
type: integer
default: 12

See Rtsne::Rtsne(). Exaggeration factor used to multiply the P matrix in the first part of the optimization (default: 12.0).

Parameters used to tune louvain/leiden clustering.

Clustering method.

required
type: string
default: leiden

Specify "leiden" or "louvain".

Reduced dimension input(s) for clustering.

required
type: string
default: UMAP_Liger

One or more of "UMAP", "tSNE", "PCA", "LSI".

The resolution of clustering.

required
type: number
default: 0.001

Integer number of nearest neighbours for clustering.

required
type: integer
default: 50

Integer number of nearest neighbors to use when creating the k nearest neighbor graph for Louvain/Leiden clustering. k is related to the resolution of the clustering result: a bigger k will result in lower resolution and vice versa.

The number of iterations for clustering.

required
type: integer
default: 1

Parameters used for cell-type annotation and the associated report.

SingleCellExperiment clusters colData variable name.

required
type: string
default: clusters

Max cells to sample.

required
type: integer
default: 10000

A sample metadata unique sample ID.

required
type: string
default: individual

SingleCellExperiment cell-type colData variable name.

required
type: string
default: cluster_celltype

Cell-type metrics for categorical variables.

required
type: string
default: manifest,diagnosis,sex,capdate,prepdate,seqdate

Cell-type metrics for numeric variables.

required
type: string
default: pc_mito,pc_ribo,total_counts,total_features_by_counts

Number of top marker genes for plot/table generation.

required
type: integer
default: 5

Parameters for differential gene expression.

Differential gene expression method.

required
type: string
default: MASTZLM

MAST method.

required
type: string

See MAST::zlm(). Either 'glm', 'glmer' or 'bayesglm'.

Expressive gene minimum counts.

required
type: integer
default: 1

Only genes with at least min_counts in min_cells_pc will be tested for differential gene expression.

Expressive gene minimum cells fraction.

required
type: number
default: 0.1

Only genes with at least min_counts in min_cells_pc will be tested for differential gene expression. Default 0.1 (i.e. 10% of cells).

Re-scale numeric covariates.

required
type: string
default: true

Re-scaling and centring numeric covariates in a model can improve model performance.

Pseudobulked differential gene expression.

required
type: string
default: false

Perform differential gene expression on a smaller matrix where counts are first summed across all cells within a sample (defined by dge_sample_var level).
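Pseudobulking, as described above, amounts to summing counts over all cells belonging to the same sample. The sketch below illustrates only that summation; the nested-dictionary layout and names are hypothetical.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell gene counts within each sample.

    cell_counts maps cell -> {gene: count}; cell_to_sample maps cell -> sample.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in cell_counts.items():
        sample = cell_to_sample[cell]
        for gene, count in gene_counts.items():
            totals[sample][gene] += count
    # Convert nested defaultdicts to plain dicts for a clean return value.
    return {sample: dict(genes) for sample, genes in totals.items()}


cell_counts = {
    "c1": {"G1": 1, "G2": 0},
    "c2": {"G1": 3, "G2": 2},
    "c3": {"G1": 5, "G2": 1},
}
cell_to_sample = {"c1": "s1", "c2": "s1", "c3": "s2"}
print(pseudobulk(cell_counts, cell_to_sample))
```

The result has one column of summed counts per sample, so the matrix tested for differential expression is much smaller than the per-cell matrix.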

Cell-type annotation variable name.

required
type: string
default: cluster_celltype

Differential gene expression is performed separately for each cell-type of this colData variable.

Unique sample identifier variable.

required
type: string
default: manifest

Dependent variable of DGE model.

required
type: string
default: group

The dependent variable may be a categorical (e.g. diagnosis) or a numeric (e.g. histopathology score) variable.

Reference class of categorical dependent variable.

required
type: string
default: Control

If a categorical dependent variable is specified, then the reference class of the dependent variable is specified here (e.g. 'Control').

Confounding variables.

required
type: string
default: cngeneson,seqdate,pc_mito

A comma-separated list of confounding variables to account for in the DGE model.

Random effect confounding variable.

required
type: string
default: NULL

If specified, the term "+ (1 | x)" is added to the model, where x is the specified random effects variable.
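To show how the dependent variable, confounding variables, and random effect could combine into a single model formula, here is a hedged sketch. The function name and the exact layout of the formula string are assumptions; only the "(1 | x)" random-effect term is taken from the description above.

```python
def build_model_formula(dependent_var, confounding_vars, random_effects_var=None):
    """Assemble an R-style model formula string from the DGE settings."""
    terms = [dependent_var] + list(confounding_vars)
    # Treat None, an empty string, or the literal "NULL" as "no random effect".
    if random_effects_var and random_effects_var != "NULL":
        terms.append(f"(1 | {random_effects_var})")
    return "~ " + " + ".join(terms)


confounders = ["cngeneson", "seqdate", "pc_mito"]
print(build_model_formula("group", confounders))
print(build_model_formula("group", confounders, "individual"))
```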

Fold-change threshold for plotting.

required
type: number
default: 1.1

This absolute fold-change cut-off value is used in plots (e.g. volcano) and the DGE report.

Adjusted p-value cutoff.

required
type: number
default: 0.05

The adjusted p-value cutoff value is used in plots (e.g. volcano) and the DGE report.
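The two cutoffs above can be combined into a single check, sketched below. It assumes fold changes are reported on the log2 scale and compared against log2 of the threshold, and that both comparisons are inclusive; neither detail is confirmed by this reference, and the function name is hypothetical.

```python
import math


def passes_cutoffs(log2_fc, padj, fc_threshold=1.1, pval_cutoff=0.05):
    """Return True if a gene meets both the fold-change and p-value cutoffs."""
    # Absolute value so that up- and down-regulated genes are treated alike.
    large_enough = abs(log2_fc) >= math.log2(fc_threshold)
    significant = padj <= pval_cutoff
    return large_enough and significant


print(passes_cutoffs(0.2, 0.01))   # up-regulated and significant
print(passes_cutoffs(-0.2, 0.01))  # down-regulated and significant
print(passes_cutoffs(0.1, 0.01))   # fold change too small
```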

Force model fit for non-full rank.

required
type: string
default: false

A non-full rank model specification will return an error; to override this to return a warning only, set to TRUE.

Maximum CPU cores.

required
type: string
default: 'null'

The default value of 'null' utilizes all available CPU cores. As each additional CPU core increases the number of genes simultaneously fit, the RAM/memory demand increases concomitantly. Manually overriding this parameter can reduce the memory demands of parallelization across multiple cores.

Parameters for impacted pathway analysis of differentially expressed genes.

Pathway enrichment tool(s) to use.

required
type: string

Enrichment method.

required
type: string
default: ORA

Database(s) to use for enrichment.

required
type: string
default: GO_Biological_Process

See scFlow::list_databases(). Name of the database(s) for enrichment. Examples include "GO_Biological_Process", "GO_Cellular_Component", "GO_Molecular_Function", "KEGG", "Reactome", "Wikipathway".

Parameters for dirichlet modeling of relative cell-type proportions.

Unique sample identifier.

required
type: string
default: individual

Cell-type annotation variable name.

required
type: string
default: cluster_celltype

Dependent variable of Dirichlet model.

required
type: string
default: group

Reference class of categorical dependent variable.

required
type: string
default: Control

Dependent variable classes order.

required
type: string
default: Control,Low,High

For plotting and reports, the order of classes for the dependent variable can be manually specified (e.g. 'Control,Low,High').

General parameters for plotting.

Preferred embedding for plots.

required
type: string
default: UMAP_Liger

Point size for reduced dimension plots.

required
type: number
default: 0.1

To improve visualization the point size should be adjusted according to the total number of cells plotted.

Alpha (transparency) value for reduced dimension plots.

required
type: number
default: 0.2

To improve visualization the alpha (transparency) value should be adjusted according to the total number of cells plotted.

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden
type: string
default: master

Base directory for Institutional configs.

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Institutional configs hostname.

hidden
type: string

Institutional config name.

hidden
type: string

Institutional config description.

hidden
type: string

Institutional config contact information.

hidden
type: string

Institutional config URL link.

hidden
type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden
type: integer
default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden
type: string
default: 256.GB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'

Maximum amount of time that can be requested for any single job.

hidden
type: string
default: 240.h
pattern: ^(\d+\.?\s*(s|m|h|day)\s*)+$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
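The memory and time patterns listed above can be tried out before launching a run. The sketch below copies the two patterns verbatim from this reference and checks a few candidate values with Python's re module; the helper names are hypothetical, and the pipeline's own validator may use a different regex engine.

```python
import re

# Patterns copied from the max_memory and max_time entries above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")


def is_valid_memory(value):
    """Return True if value matches the max_memory pattern."""
    return MEMORY_PATTERN.match(value) is not None


def is_valid_time(value):
    """Return True if value matches the max_time pattern."""
    return TIME_PATTERN.match(value) is not None


for candidate in ("256.GB", "8.GB", "8GiB"):
    print(candidate, is_valid_memory(candidate))
for candidate in ("240.h", "1day 2h", "2 hours"):
    print(candidate, is_valid_time(candidate))
```

Values such as '8GiB' or '2 hours' do not match and would be rejected by schema validation.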

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden
type: boolean

Method used to save pipeline results to output directory.

hidden
type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden
type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Do not use coloured log outputs.

hidden
type: boolean

Directory to keep pipeline Nextflow logs and reports.

hidden
type: string
default: ${params.outdir}/pipeline_info

Whether to validate parameters against the schema at runtime.

hidden
type: boolean
default: true

Show all params when using --help

hidden
type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.

hidden
type: boolean

Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead.

hidden
type: boolean

This may be useful for example if you are unable to directly pull Singularity containers to run the pipeline due to http/https proxy issues.

E-mail address for optional workflow completion notification.

hidden
type: string

Send plain-text email instead of HTML.

hidden
type: boolean

NA

hidden
type: string