nf-core/hadge
Comprehensive pipeline for donor demultiplexing in single-cell data.
Define where the pipeline should find input data and save output data.
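Several file-path options in this group carry regular-expression patterns, for example ^\S+\.csv$ for the sample sheet and ^\S+\.vcf(\.gz)?$ for the common-variants file. The following is a minimal Python sketch of how such patterns behave. The helper name `matches_pattern` and the example paths are hypothetical and not part of the pipeline; the sketch only illustrates the patterns and makes no claim about how the pipeline itself checks them.

```python
import re

# Patterns copied from the parameter descriptions in this group.
SAMPLESHEET_PATTERN = r"^\S+\.csv$"
COMMON_VARIANTS_PATTERN = r"^\S+\.vcf(\.gz)?$"


def matches_pattern(value: str, pattern: str) -> bool:
    """Return True if the value satisfies the anchored pattern."""
    return re.search(pattern, value) is not None


# Hypothetical example paths.
print(matches_pattern("samplesheet.csv", SAMPLESHEET_PATTERN))      # True
print(matches_pattern("my samples.csv", SAMPLESHEET_PATTERN))       # False: whitespace is not allowed
print(matches_pattern("variants.vcf.gz", COMMON_VARIANTS_PATTERN))  # True
print(matches_pattern("variants.bcf", COMMON_VARIANTS_PATTERN))     # False
```

Because \S excludes whitespace, a path containing a space is rejected even when its extension is correct.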
Path to comma-separated file containing information about the samples in the experiment.
    Type: string; pattern: ^\S+\.csv$
Mode of the pipeline.
    Type: string
Tools used for hash demultiplexing.
    Type: string; default: gmm-demux
Tools used for genetic demultiplexing.
    Type: string; default: vireo
Perform BAM QC.
    Type: boolean; default: true
File with common variants. If provided, the BAM files will be filtered to only include reads that overlap with the common variants.
    Type: string; pattern: ^\S+\.vcf(\.gz)?$
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
    Type: string
Save intermediate files.
    Type: boolean
Email address for completion summary.
    Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
    Type: string

Reference genome related files and options required for the workflow.
Name of iGenomes reference.
    Type: string
Path to FASTA genome file.
    Type: string; pattern: ^\S+\.fn?a(sta)?(\.gz)?$
Do not load the iGenomes reference config.
    Type: boolean
The base path to the iGenomes reference files.
    Type: string; default: s3://ngi-igenomes/igenomes/

Options for matching donors between different demultiplexing methods.
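One option in this group filters variants by a minimal percentage: with a value of 0.9, only variants with a frequency higher than 90% or lower than 10% are kept. The sketch below expresses that rule in Python. The function name `keep_variant` and the example frequencies are hypothetical, and the sketch assumes frequencies are given as fractions between 0 and 1; how the pipeline computes the frequency is not stated here.

```python
def keep_variant(frequency: float, min_percentage: float = 0.9) -> bool:
    """Keep a variant only if its frequency is above the threshold
    or below its complement (assumed reading of the option text)."""
    if not 0.5 <= min_percentage < 1.0:
        raise ValueError("min_percentage must lie in the range [0.5, 1)")
    return frequency > min_percentage or frequency < 1.0 - min_percentage


# Hypothetical variant frequencies.
frequencies = [0.02, 0.35, 0.5, 0.95]
kept = [f for f in frequencies if keep_variant(f)]
print(kept)  # [0.02, 0.95]
```

Raising the threshold towards 1 keeps fewer, more extreme variants, and a threshold of 0.5 keeps everything except variants at exactly 50%.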
Match donor between different demultiplexing methods.
    Type: boolean; default: true
Path to demultiplexing result CSV file (necessary only for donor_match mode).
    Type: string,null
Path to Vireo filtered variants file (necessary only for donor_match mode).
    Type: string,null
Path to cell genotype file (necessary only for donor_match mode).
    Type: string,null
First method to use for donor matching.
    Type: string,null
Second method to use for donor matching.
    Type: string,null
Find variants for donor matching.
    Type: boolean; default: true
Option to subset the donor genotype based on detected variants.
    Type: boolean; default: true
Minimum variant count threshold.
    Type: integer; default: 10
The minimal percentage of a variant for filtering. Must be in the range [0.5, 1). For example, 0.9 means that only variants with a frequency higher than 90% or lower than 10% are kept.
    Type: number; default: 0.9
Path to cell genotype file (necessary only for donor_match mode).
    Type: string,null

Options specific to the demuxEM tool for cell hashing demultiplexing.
Generate diagnostic plots.
    Type: boolean; default: true
The Dirichlet prior concentration parameter (alpha) on samples. An alpha value < 1.0 will make the prior sparse.
    Type: number
Only demultiplex cells/nuclei with at least this number of expressed genes.
    Type: integer; default: 100
Only demultiplex cells/nuclei with at least this number of UMIs.
    Type: integer; default: 100
Any cell/nucleus with less than this count of hashtags from the signal will be marked as unknown.
    Type: number; default: 10
The random seed used in the KMeans algorithm to separate empty ADT droplets from others.
    Type: integer
Comma-separated list of gender-specific genes (e.g. Xist) for generating violin plots.
    Type: string

Options specific to the BFF cell hashing demultiplexing.
Method(s) to use within BFF.
    Type: string; default: COMBINED
Whether to run preprocessing steps for BFF.
    Type: boolean; default: true
Path to barcode whitelist for preprocessing.
    Type: string,null
Path to cell barcode whitelist for GenerateCellHashingCalls().
    Type: string,null
Methods to use for consensus calling.
    Type: string,null
Optional metrics file path.
    Type: string,null
Whether to compute tSNE visualization in BFF.
    Type: boolean
Whether to generate heatmaps in BFF.
    Type: boolean; default: true
Per-cell saturation value.
    Type: number,null
Majority consensus threshold.
    Type: number,null
Library chemistry (e.g., 10xV3).
    Type: string; default: 10xV3
Threshold for caller disagreement.
    Type: number,null

Options specific to the GMM-Demux tool for cell hashing demultiplexing.
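One option in this group takes the HTO tags to extract as a single string: tags are separated by ',' and joint HTO samples are combined with '+', such as 'HTO_1+HTO_2'. The sketch below shows how such a string decomposes into groups. The function name `parse_hto_selection` is hypothetical and this is not GMM-Demux's own parser; it only illustrates the documented syntax.

```python
from typing import List


def parse_hto_selection(selection: str) -> List[List[str]]:
    """Split a selection string into groups of HTO tags.

    ',' separates samples; '+' joins the tags of one joint sample.
    """
    groups = []
    for entry in selection.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        groups.append([tag.strip() for tag in entry.split("+")])
    return groups


print(parse_hto_selection("HTO_1,HTO_2+HTO_3"))
# [['HTO_1'], ['HTO_2', 'HTO_3']]
```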
Comma-separated list of HTO names, without whitespace. If null, hto_names are extracted from features.tsv.gz in the input HTO matrix.
    Type: string,null
If specified, a statistical summary of the dataset is generated, including MSM and SSM rates. This requires an estimated total number of cells in the assay as input.
    Type: integer,null
If true, the full classification report is generated; otherwise, the simplified classification report is generated.
    Type: boolean; default: true
If true, summary report is generated.
    Type: boolean; default: true
Load a full classification report and skip the mtx folder as input. Requires a file path argument.
    Type: string,null
Provide the cell list. Requires a file path argument. Only executes if -u is set.
    Type: string,null
Names of the HTO tag(s) to extract, separated by ','. Joint HTO samples are combined with '+', such as 'HTO_1+HTO_2'.
    Type: string
The confidence threshold value for classification. A higher value leads to more stringent classification.
    Type: number; default: 0.8
The random seed used in the GaussianMixture algorithm.
    Type: integer

Options specific to the Scanpy Hashsolo demultiplexing module.
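One option in this group supplies the priors for the three hypotheses NEGATIVE, SINGLET and DOUBLET as one comma-separated string, with 0.01,0.8,0.19 as the default. A minimal Python sketch of reading that string follows. The function name `parse_priors` is hypothetical, and the check that the priors sum to 1 is an assumption (they are treated here as probabilities); it is not taken from the option description.

```python
import math
from typing import Dict

# Order of hypotheses as listed in the option description.
HYPOTHESES = ("NEGATIVE", "SINGLET", "DOUBLET")


def parse_priors(value: str) -> Dict[str, float]:
    """Map a comma-separated prior string onto the three hypotheses."""
    parts = [float(p) for p in value.split(",")]
    if len(parts) != len(HYPOTHESES):
        raise ValueError("expected exactly three priors")
    # Assumption: priors are probabilities and should sum to 1.
    if not math.isclose(sum(parts), 1.0, abs_tol=1e-9):
        raise ValueError("priors should sum to 1")
    return dict(zip(HYPOTHESES, parts))


print(parse_priors("0.01,0.8,0.19"))
# {'NEGATIVE': 0.01, 'SINGLET': 0.8, 'DOUBLET': 0.19}
```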
Groovy list (['hash_1', 'hash_2']) of .obs columns that contain cell hashing counts. Can be null if the data is in 10x Genomics format, as the columns are derived from the input.
    Type: array,null
List of comma-separated priors for each hypothesis: NEGATIVE, SINGLET, DOUBLET.
    Type: string; default: 0.01,0.8,0.19
Column in cell_hashing_adata.obs for how to break up demultiplexing.
    Type: string,null
Input directory containing transcriptomic data in 10x mtx format.
    Type: string,null
Number of barcodes to use to create noise distribution.
    Type: integer,null
Number of decimal places to round numeric values in cell_hashing_data.obs before saving. If omitted, no rounding is applied.
    Type: integer; default: 10

Options specific to the HTODemux tool for cell hashing demultiplexing.
The quantile to use for thresholding.
    Type: number; default: 0.99
Initialization method for clustering.
    Type: string; default: NULL
Number of starts for clustering.
    Type: integer; default: 100
Clustering function to use.
    Type: string; default: clara
Number of samples for clustering.
    Type: integer; default: 100
Random seed for reproducibility.
    Type: integer; default: 42
Whether to print verbose output.
    Type: boolean; default: true

Options specific to the HTODemux visualization tool for generating plots and visualizations.
Generate ridge plot.
    Type: boolean; default: true
The number of plots that are displayed next to each other in one row. The number of plots corresponds to the number of Hash Tag Oligo (HTO) identifiers.
    Type: integer; default: 2
Generate feature scatter plot. If no features are provided (one of them is null), the first two features from the assay will be used.
    Type: boolean; default: true
Name of a Hash Tag Oligo (HTO) identifier, usually defined in the feature.tsv of the hto matrix folder.
    Type: string
Name of a Hash Tag Oligo (HTO) identifier, usually defined in the feature.tsv of the hto matrix folder.
    Type: string
Generate violin plot.
    Type: boolean; default: true
Features to plot (gene expression, metrics, PC scores, anything that can be retrieved by FetchData).
    Type: string; default: nCount_RNA
Plot the feature axis on log scale.
    Type: boolean; default: true
Generate a two-dimensional tSNE embedding for HTOs.
    Type: boolean; default: true
Classification to remove from the object (one of Singlet, Doublet and Negative).
    Type: string
Invert tSNE selection.
    Type: boolean; default: true
Verbose tSNE.
    Type: boolean
Approximate tSNE.
    Type: boolean
Max number of donors.
    Type: integer; default: 2
Value for perplexity.
    Type: integer; default: 100
Generate heatmap.
    Type: boolean; default: true
Number of cells for heatmap.
    Type: integer; default: 500

Options specific to the MultiSeqDemux tool for cell hashing demultiplexing.
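Three options in this group define a quantile range by its start (default 0.1), end (default 0.9) and step size (default 0.05). If that range is read as an evenly spaced grid, which is an assumption here and not a statement about how the tool consumes it, the defaults expand to 17 candidate quantiles. The function name `quantile_grid` is hypothetical.

```python
from typing import List


def quantile_grid(start: float = 0.1, end: float = 0.9, step: float = 0.05) -> List[float]:
    """Expand start/end/step into an evenly spaced list of quantiles.

    Values are built from an integer step count and rounded, which avoids
    the drift of repeatedly adding a floating-point step.
    """
    if step <= 0 or end < start:
        raise ValueError("step must be positive and end must not precede start")
    n_steps = int((end - start) / step + 1e-9)
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


grid = quantile_grid()
print(len(grid), grid[0], grid[-1])  # 17 0.1 0.9
```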
The quantile to use for thresholding.
    Type: number; default: 0.7
Whether to automatically determine thresholds.
    Type: boolean; default: true
Maximum number of iterations.
    Type: integer; default: 5
Start of quantile range.
    Type: number; default: 0.1
End of quantile range.
    Type: number; default: 0.9
Step size for quantile range.
    Type: number; default: 0.05
Whether to print verbose output.
    Type: boolean; default: true

Options specific to the HashedDrops tool for cell hashing demultiplexing.
Lower bound on total UMI count for empty droplets.
    Type: integer; default: 100
Number of iterations for Monte Carlo p-value calculations.
    Type: integer; default: 10000
Whether to test ambient RNA.
    Type: boolean; default: true
Whether to round non-integer values.
    Type: boolean; default: true
Alternative method for identifying empty droplets.
    Type: integer,null
FDR threshold for cell filtering.
    Type: number; default: 0.01
Column to use for gene names.
    Type: integer; default: 2
Lower bound for ignoring barcodes.
    Type: number,null
Scaling parameter for Dirichlet-multinomial sampling.
    Type: number,null
Whether to use ambient solution abundance.
    Type: boolean; default: true
Minimum proportion for ambient profile inference.
    Type: number; default: 0.05
Minimum pseudo-count for log-fold change computation.
    Type: integer; default: 5
Whether to use constant ambient contamination level.
    Type: boolean
Number of MADs to identify doublets.
    Type: integer; default: 3
Minimum threshold for doublet identification.
    Type: integer; default: 2
Whether to use 2-component mixture model for doublets.
    Type: boolean
Number of MADs to identify confident singlets.
    Type: integer; default: 3
Minimum threshold for confident singlet identification.
    Type: integer; default: 2
An integer matrix specifying valid combinations of HTOs. Number of items in each row has to be the same.
    Type: array
Whether to run EmptyDrops analysis as part of HashedDrops.
    Type: boolean

Options for preprocessing data for HTODemux and MultiSeq demultiplexing.
Method for feature selection.
    Type: string; default: mean.var.plot
Delimiter for parsing feature names.
    Type: string; default: _
Number of features to select.
    Type: integer; default: 2000
Assay type for preprocessing.
    Type: string; default: HTO
Margin parameter for preprocessing.
    Type: integer; default: 2
Normalization method to use.
    Type: string; default: CLR
Column containing gene information.
    Type: integer; default: 2

Options controlling hash summary and downstream exported formats.
Generate AnnData (.h5ad) outputs for hashing results.
    Type: boolean; default: true
Generate MuData outputs for hashing results.
    Type: boolean; default: true

Options specific to the CellSNP-lite tool for genotyping bi-allelic SNPs on single cells.
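Two options in this group filter reads by SAM/BAM flags: reads are skipped if they lack ANY of the required flags, and skipped if they carry ANY of the excluding flags. The sketch below spells out that logic with bitwise operations on the integer FLAG field from the SAM specification. The function name `passes_flag_filter` and the example flag values are hypothetical, and the use of integer masks is an assumption made for illustration; the options themselves are typed as strings.

```python
# Bit values from the SAM format specification.
PROPER_PAIR = 0x2
UNMAP = 0x4
SECONDARY = 0x100
QCFAIL = 0x200
DUP = 0x400


def passes_flag_filter(flag: int, required: int = 0, excluded: int = 0) -> bool:
    """Keep a read only if it has ALL required bits and NONE of the excluded bits."""
    has_all_required = (flag & required) == required
    has_any_excluded = (flag & excluded) != 0
    return has_all_required and not has_any_excluded


exclude = UNMAP | SECONDARY | QCFAIL | DUP  # 1796

print(passes_flag_filter(99, required=PROPER_PAIR, excluded=exclude))    # True
print(passes_flag_filter(1123, required=PROPER_PAIR, excluded=exclude))  # False: duplicate bit is set
print(passes_flag_filter(73, required=PROPER_PAIR, excluded=exclude))    # False: not a proper pair
```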
Tag for cell barcodes, e.g., CB for 10x Genomics. Set to 'None' for bulk RNA-seq or SMART-seq2.
    Type: string; default: CB
Tag for UMI barcodes, e.g., UB for 10x Genomics. Set to 'None' for bulk RNA-seq or SMART-seq2 without UMIs.
    Type: string; default: Auto
Minimum aggregated count (across cells) for SNPs to be included in the output.
    Type: integer; default: 20
Minimum minor allele frequency (MAF) for SNPs to be included in the output.
    Type: number
Required flags in SAM/BAM: skip reads that don't have ALL of these flags. See SAM format specification for details.
    Type: string
Excluding flags in SAM/BAM: skip reads that have ANY of these flags. See SAM format specification for details.
    Type: string
Minimum read length (after clipping) for a read to be included.
    Type: integer; default: 30
Minimum mapping quality for a read to be included.
    Type: integer; default: 20
Maximum read depth at a position per input file. Set to 0 for highest possible value.
    Type: integer
If true, do not skip anomalous read pairs (i.e., count orphan reads).
    Type: boolean

Options specific to the Vireo tool for donor demultiplexing from single-cell RNA-seq data.
The tag for donor genotype in VCF file. Options: GT, GP, PL.
    Type: string
If true, do not check for doublets during demultiplexing.
    Type: boolean
Number of random initializations when GT needs to be learned.
    Type: integer; default: 50
Number of extra donors in pre-cluster, when GT needs to be learned.
    Type: integer
Method for searching from extra donors. 'size': n_cell per donor; 'distance': GT distance between donors.
    Type: string
If true, treat donor GT as prior only and learn genotypes from data.
    Type: boolean; default: true
If true, turn on SNP specific allelic ratio (ASE mode).
    Type: boolean
If true, turn off plotting GT distance.
    Type: boolean
Random seed for initialization.
    Type: integer
Range of cells to process, e.g., '0-10000'. Default is 'all'.
    Type: string; default: all
If true, detect ambient RNAs in each cell (experimental feature).
    Type: boolean

Options specific to the DSC-Pileup tool for pileup generation from single-cell BAM files.
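One option in this group gives the SAM/BAM FLAGs to exclude as a single integer, with 3844 as the default. That integer is a bit mask, and the sketch below decodes it using the bit values defined in the SAM format specification. The function name `decode_flag` is hypothetical, and the short labels follow common samtools naming.

```python
# FLAG bits as defined in the SAM format specification.
SAM_FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def decode_flag(flag: int) -> list:
    """Return the names of all FLAG bits set in the given integer."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


print(decode_flag(3844))
# ['UNMAP', 'SECONDARY', 'QCFAIL', 'DUP', 'SUPPLEMENTARY']
```

In other words, 3844 = 4 + 256 + 512 + 1024 + 2048, so the default mask covers unmapped, secondary, QC-failed, duplicate and supplementary reads.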
Tag representing readgroup or cell barcodes to partition the BAM file into multiple groups. For 10x Genomics, use CB.
    Type: string; default: CB
Tag representing UMIs. For 10x Genomics, use UB.
    Type: string; default: UB
Maximum base quality (higher BQ will be capped).
    Type: integer; default: 40
Minimum base quality to consider (lower BQ will be skipped).
    Type: integer; default: 13
Minimum mapping quality to consider (lower MQ will be ignored).
    Type: integer; default: 20
Minimum distance to the tail (lower will be ignored).
    Type: integer
SAM/BAM FLAGs to be excluded.
    Type: integer; default: 3844
Minimum number of total reads for a droplet/cell to be considered.
    Type: integer
Minimum number of unique reads (determined by UMI/SNP pair) for a droplet/cell to be considered.
    Type: integer
Minimum number of SNPs with coverage for a droplet/cell to be considered.
    Type: integer

Options specific to the Demuxlet tool for genotype-based demultiplexing of single-cell RNA-seq data.
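Two options in this group, the offset and the slope of the genotype error rate, share one formula: [error] = [offset] + [1-offset][coeff][1-r2], where the bracketed terms are multiplied and r2 is the imputation quality. A small Python sketch of that formula follows. The function name `genotype_error` is hypothetical, the offset of 0.1 is the documented default, and the slope value 0.5 is an arbitrary illustration because no default is listed for it.

```python
def genotype_error(r2: float, offset: float, coeff: float) -> float:
    """Genotype error rate: offset + (1 - offset) * coeff * (1 - r2)."""
    return offset + (1.0 - offset) * coeff * (1.0 - r2)


# Perfect imputation quality (r2 = 1) leaves only the offset.
print(genotype_error(r2=1.0, offset=0.1, coeff=0.5))            # 0.1
# Poor imputation quality (r2 = 0) adds the full slope term.
print(round(genotype_error(r2=0.0, offset=0.1, coeff=0.5), 3))  # 0.55
```

With a slope of 0 the error rate is constant at the offset, so the R2 field has no effect.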
FORMAT field to extract the genotype, likelihood, or posterior from.
    Type: string; default: GT
Offset of genotype error rate. [error] = [offset] + [1-offset]*[coeff]*[1-r2]
    Type: number; default: 0.1
Slope of genotype error rate. [error] = [offset] + [1-offset]*[coeff]*[1-r2]
    Type: number
INFO field name representing R2 value. Used for representing imputation quality.
    Type: string; default: R2
Minimum minor allele frequency.
    Type: integer; default: 1
Minimum call rate.
    Type: number; default: 0.5
Grid of alpha to search for.
    Type: string; default: 0.1,0.2,0.3,0.4,0.5
Prior probability of doublet.
    Type: number; default: 0.5

Options specific to the Freemuxlet tool for reference-free genotype-based demultiplexing.
Prior probability of doublet.
    Type: number; default: 0.5
Genotype error parameter per cluster.
    Type: number; default: 0.1
Bayes Factor Threshold used in the initial clustering.
    Type: number; default: 5.41
Fraction of droplets to be clustered in the very first round of initial clustering procedure.
    Type: number; default: 1
Iteration for initial cluster assignment (set to zero to skip the iterations).
    Type: integer; default: 10
Keep missing cluster assignment as missing in the initial iteration.
    Type: boolean
Randomize the singlet scores to test its effect.
    Type: boolean
Seed for random number (use clocks if not set).
    Type: integer

Options specific to the Souporcell tool for clustering mixed-genotype scRNAseq experiments by individual.
Ploidy, must be 1 or 2.
    Type: integer
Min alt to use locus.
    Type: integer; default: 10
Min ref to use locus.
    Type: integer; default: 10
Max loci per cell, affects speed.
    Type: integer; default: 2048
Number of restarts in clustering. When there are more than 12 clusters, increasing this is recommended to avoid local minima.
    Type: integer; default: 100
Common variant loci or known variant loci VCF. Must be called against the same reference FASTA.
    Type: string; pattern: ^\S+\.vcf(\.gz)?$
Known variants per clone in population VCF mode. Must be an uncompressed .vcf file; gzip and BCF are not currently accepted.
    Type: string; pattern: ^\S+\.vcf$
Which samples in the population VCF from the known genotypes option represent the donors in your sample. Provide space-separated sample names for multiple donors.
    Type: string
Don't remap with minimap2 (not recommended unless in conjunction with --common_variants).
    Type: boolean
Set to True to ignore data error assertions.
    Type: boolean

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
    Type: string; default: master
Base directory for Institutional configs.
    Type: string; default: https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
    Type: string
Institutional config description.
    Type: string
Institutional config contact information.
    Type: string
Institutional config URL link.
    Type: string

Less common options for the pipeline, typically set in a config file.
Display version and exit.
    Type: boolean
Method used to save pipeline results to output directory.
    Type: string
Email address for completion summary, only when pipeline fails.
    Type: string; pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
    Type: boolean
File size limit when attaching MultiQC reports to summary emails.
    Type: string; default: 25.MB; pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
    Type: boolean
Incoming hook URL for messaging service.
    Type: string
Custom config file to supply to MultiQC.
    Type: string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file.
    Type: string
Custom MultiQC yaml file containing HTML including a methods description.
    Type: string
Whether to validate parameters against the schema at runtime.
    Type: boolean; default: true
Base URL or local path to location of pipeline test dataset files.
    Type: string; default: https://raw.githubusercontent.com/nf-core/test-datasets/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
    Type: string
Display the help message.
    Type: boolean,string
Display the full detailed help message.
    Type: boolean
Display hidden parameters in the help message (only works when --help or --help_full are provided).
    Type: boolean