Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required
type: string
pattern: ^\S+\.csv$

You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row.
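The constraints above (a path matching `^\S+\.csv$`, a header row, 3 columns) can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the function name `check_samplesheet`, the placeholder column names, and the requirement of at least one sample row are assumptions made for illustration.

```python
import csv
import io
import re

# Pattern taken from the parameter definition above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def check_samplesheet(path, text, expected_columns=3):
    """Return a list of problems found in a design file (empty list = OK).

    `path` is the file name passed to the pipeline, `text` its contents.
    Hypothetical helper: the pipeline performs its own validation.
    """
    problems = []
    if not SAMPLESHEET_PATTERN.match(path):
        problems.append(f"path {path!r} must have no whitespace and end in .csv")

    rows = list(csv.reader(io.StringIO(text)))
    # Assumption: a usable design file has the header plus one or more samples.
    if len(rows) < 2:
        problems.append("expected a header row plus at least one sample row")
    for line_number, row in enumerate(rows, start=1):
        if len(row) != expected_columns:
            problems.append(
                f"line {line_number}: expected {expected_columns} columns, "
                f"found {len(row)}"
            )
    return problems


# Placeholder column names, not the pipeline's real header.
example = "col_a,col_b,col_c\ns1,x,y\n"
print(check_samplesheet("design.csv", example))
```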

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required
type: string

Most common options used for the pipeline.

Path to reference fasta file.

type: string

Reference genome name.

required
type: string

List of tools for running the pipeline.

type: string
default: mobster,viber,pyclone-vi,sparsesignatures,sigprofiler
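The tool list is a single comma-separated string. As an illustrative sketch, it could be split and checked as follows; the helper `parse_tools` is hypothetical, and treating the five default names as the only valid values is an assumption.

```python
# Assumption: the valid tool names are exactly those in the default value.
KNOWN_TOOLS = {"mobster", "viber", "pyclone-vi", "sparsesignatures", "sigprofiler"}


def parse_tools(value):
    """Split a comma-separated tool string into a list of lower-case names."""
    tools = [item.strip().lower() for item in value.split(",") if item.strip()]
    unknown = [tool for tool in tools if tool not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"unknown tool(s): {', '.join(unknown)}")
    return tools


print(parse_tools("mobster,viber,pyclone-vi"))
```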

Flag controlling whether mutations are filtered based on QC.

type: boolean
default: true

Method used to save pipeline results to output directory.

type: string
default: copy

Variant Annotation parameters.

Parameter for downloading VEP cache.

type: string

Path to VEP cache.

type: string

VEP cache version.

type: string

VEP species.

type: string

VEP reference genome name.

type: string

Add an extra custom argument to VEP.

type: string
default: --everything --filter_common --per_gene --total_length --offline --format vcf

Driver Annotation parameters.

Path to driver table.

type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/tumourevo/data/DRIVER_ANNOTATION/ANNOTATE_DRIVER/Compendium_Cancer_Genes.tsv

Filtering parameters from vcf file.

Flag for filtering mutations from vcf.

type: string
default: TRUE

CNAqc tool parameters.

For clonal simple CNAs, the list of segments to test.

type: string
default: c(\'1:0\', \'1:1\', \'2:0\', \'2:1\', \'2:2\')

By default LOH regions (A, AA), diploid regions (AB), and amplification regions (AAB, AABB) are tested, corresponding to '1:0', '1:1', '2:1', '2:0', '2:2' in "Major:minor" notation.

For clonal simple CNAs, a filter for the segments to test.

type: integer

The segment size is defined by the number of mutations mapped to it; this cut is on the proportion relative to the whole set of segments one wishes to analyse (defined by karyotypes). For example, by setting min_karyotype_size = 0.2 one would QC clonal simple CNAs that contain at least 20% of the mutations. The default of this parameter is 0 (all QCed).

For clonal simple CNAs, as min_karyotype_size but with a cut measured on absolute mutation counts.

type: integer
default: 100

For example, by setting min_absolute_karyotype_mutations = 150 one would QC clonal simple CNAs that contain at least 150 mutations. The default of this parameter is 100.
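The two karyotype filters described above can be pictured with a small sketch. It is not the tool's implementation: the function name is hypothetical, and it assumes that both cuts are combined (a karyotype must pass both) and that "at least" means greater than or equal to.

```python
def karyotypes_to_qc(mutations_per_karyotype,
                     min_karyotype_size=0.0,
                     min_absolute_karyotype_mutations=100):
    """Return the karyotypes that would be QCed under both cuts.

    `mutations_per_karyotype` maps a "Major:minor" label to the number of
    mutations mapped to segments with that karyotype.
    """
    total = sum(mutations_per_karyotype.values())
    kept = []
    for karyotype, count in mutations_per_karyotype.items():
        proportion = count / total if total else 0.0
        # Assumption: the relative and absolute cuts must both be satisfied.
        if (proportion >= min_karyotype_size
                and count >= min_absolute_karyotype_mutations):
            kept.append(karyotype)
    return kept


counts = {"1:1": 800, "2:1": 150, "2:0": 50}
print(karyotypes_to_qc(counts))                          # defaults
print(karyotypes_to_qc(counts, min_karyotype_size=0.2))  # relative cut added
```

With these made-up counts, the absolute cut alone drops '2:0' (50 mutations), and adding the 20% relative cut also drops '2:1' (15% of the mutations).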

For clonal simple CNAs, peaks detected will be filtered if, in a peak, we map less than p_binsize_peaks * N mutations.

type: number
default: 0.005

The value N is obtained by counting all mutations that map to any peak. By default this parameter is 0.005.
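As a rough illustration of this rule, the sketch below drops peaks holding fewer than p_binsize_peaks * N mutations. The function name and the input layout (one mutation count per peak) are assumptions, not the tool's data structures.

```python
def filter_peaks(mutations_per_peak, p_binsize_peaks=0.005):
    """Keep only peaks with at least p_binsize_peaks * N mutations.

    N is the number of mutations summed over all peaks.
    """
    n_total = sum(mutations_per_peak)
    threshold = p_binsize_peaks * n_total
    # A peak is filtered out when it maps fewer than `threshold` mutations.
    return [count for count in mutations_per_peak if count >= threshold]


# N = 1000, so the threshold is 5 mutations and the last peak is removed.
print(filter_peaks([500, 497, 3]))
```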

Deprecated parameter.

type: string
default: NULL

For clonal simple CNAs, the purity error tolerance to determine QC pass or fail.

type: number
default: 0.05

This can be set automatically using the function auto_tolerance to optimise the analysis based on a desired rate of false-positive matches, as a function of the data coverage and (putative) purity.

For clonal simple CNAs, a tolerance in comparing bands overlaps which is applied to the raw VAF values.

type: number
default: 0.015

For clonal simple CNAs, the number of times peak detection is bootstrapped (by default 1).

type: integer
default: 1

This sometimes helps to find peaks that are visually observable but are missed by the underlying peak-detection heuristics.

For KDE-based matches, the adjust parameter of the density estimate; see density.

type: integer
default: 1

For clonal simple CNAs, if "closest", the peak closest to the expected peak is used for the match. If "rightmost", peaks are matched from right to left (the higher-VAF peak is matched first); this strategy is more correct in principle but works only if there are no spurious peaks in the estimated density.

type: string
default: rightmost

Deprecated parameter.

type: string
default: TRUE

For subclonal simple CNAs, the starting state to determine linear versus branching evolutionary models.

type: string
default: 1:1

For subclonal simple CNAs, the starting state to determine linear versus branching evolutionary models.

type: string
default: FALSE

Minimum number of mutations that are required to be mapped to a karyotype in order to compute CCF values (default 25).

type: integer
default: 25

For the entropy-based method, percentage of mutations that can be not-assigned (NA) in a karyotype.

type: number
default: 0.1

If the karyotype has more than cutoff_QC_PASS percentage of non-assigned mutations, then the overall set of CCFs is failed for the karyotype.
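The pass/fail rule above can be sketched as follows. This is an illustration only: the function name is hypothetical, non-assigned mutations are represented as `None`, and the cutoff is treated as a fraction (0.1 = 10%), which is an assumption based on the default value.

```python
def ccf_qc_status(ccf_values, cutoff_QC_PASS=0.1):
    """Return "PASS" or "FAIL" for the CCFs of one karyotype.

    `ccf_values` holds one entry per mutation; None marks a mutation
    that could not be assigned (NA).
    """
    if not ccf_values:
        raise ValueError("no mutations mapped to this karyotype")
    fraction_na = sum(1 for value in ccf_values if value is None) / len(ccf_values)
    # The karyotype fails when more than cutoff_QC_PASS of its mutations are NA.
    return "FAIL" if fraction_na > cutoff_QC_PASS else "PASS"


print(ccf_qc_status([1.0] * 95 + [None] * 5))   # 5% NA
print(ccf_qc_status([1.0] * 80 + [None] * 20))  # 20% NA
```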

Either "ENTROPY" (default) or "ROUGH", to reflect the two different algorithms to compute CCF.

type: string
default: ENTROPY
type: string
default: absolute

joinCNAqc

If TRUE the mutations flagged as FAILED by CNAqc are discarded while building the joinCNAqc segmentation, if FALSE they are kept in the new object.

type: string
default: FALSE

If TRUE the original CNAqc object is kept in the joinCNAqc object, otherwise it is lost.

type: string
default: TRUE

The probability density used to model the read count data. Choices are beta-binomial and binomial.

type: string
default: beta-binomial

binomial is a common choice for sequencing data. beta-binomial is useful when the data is over-dispersed, which has been observed frequently in sequencing data.
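To see what over-dispersion means here, the sketch below compares the variance of the two densities for the same depth and mean allele fraction. It uses the standard textbook variance formulas; the shape parameters `alpha` and `beta` are an illustrative parameterization and not necessarily the one used by the tool.

```python
def binomial_variance(n, p):
    """Variance of a Binomial(n, p) read count."""
    return n * p * (1 - p)


def beta_binomial_variance(n, alpha, beta):
    """Variance of a BetaBinomial(n, alpha, beta) read count."""
    s = alpha + beta
    return n * alpha * beta * (s + n) / (s ** 2 * (s + 1))


# Depth 100 and mean allele fraction 0.5 in both cases (made-up values).
depth, alpha, beta = 100, 10.0, 10.0
p = alpha / (alpha + beta)
print(binomial_variance(depth, p))
print(beta_binomial_variance(depth, alpha, beta))
```

For the same mean, the beta-binomial variance is larger, which is why it can absorb extra noise in the read counts that a binomial cannot.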

Number of random restarts of variational inference.

type: integer
default: 100

More restarts will have a higher probability of finding the optimal variational approximation. This also increases running time.

Number of grid points used for approximating the posterior distribution.

type: integer
default: 100

Higher values should be used for deeply sequenced data.

The number of clusters to use while fitting.

type: integer
default: 20

This should be set to a value larger than the expected number of clusters. The software will then automatically determine how many to use. In general, this value should increase as more samples are used.

A vector with the number of Beta components to use. All values of K must be strictly greater than 0.

type: string
default: 1:5

Initial values for the parameters of the model. Can be either "random" or "peaks".

type: string
default: peaks

Number of fits that should be attempted for each configuration of the model tested.

type: integer
default: 5

Boolean value indicating whether to use tail mutations for subclonal deconvolution.

type: string
default: c(TRUE,FALSE)

Tolerance for convergence estimation.

type: number
default: 1e-10

Maximum number of steps for a fit.

type: integer
default: 250

A string that determines the type of fit. Can be either "MLE", for the Maximum Likelihood Estimate of the Beta parameters, or "MM", for Moment Matching.

type: string
default: MM

Seed for the random number generator.

type: integer
default: 12345

Score to minimize to select the best model; this has to be one of 'reICL', 'ICL', 'BIC', 'AIC' or 'NLL'.

type: string
default: reICL

Boolean value whether to return the trace of model fit.

type: string
default: FALSE

Boolean value whether to run the fit in parallel.

type: string
default: TRUE

The minimum mixing proportion of a cluster to be returned as output.

type: number
default: 0.02

The minimum number of mutations assigned to a cluster to be returned as output.

type: integer
default: 10
type: string
default: FALSE

Overrides all the parameters with a predefined set of values, in order to implement different analyses.

type: string
default: NULL

The maximum number of clusters returned

type: integer
default: 10

The number of fits to be computed.

type: integer
default: 10

The concentration parameter of the Dirichlet mixture.

type: number
default: 0.000001

The prior Beta hyperparameter a for each Binomial component.

type: integer
default: 1

The prior Beta hyperparameter b for each Binomial component.

type: integer
default: 1

The maximum number of fit iterations

type: integer
default: 5000

The epsilon to measure convergence as ELBO absolute difference

type: number
default: 1e-10

Initialization of the q-distribution to compute the approximation of the posterior distributions.

type: string
default: prior

This can be set in three different ways: equal to the prior (q_init = 'prior'), via kmeans clustering (q_init = 'kmeans'), or by capturing points which are private to each dimension (q_init = 'private').

Boolean value whether to return the trace of model fit.

type: string
default: FALSE

The minimum Binomial success probability when applying a heuristic procedure to filter clusters after Variational Inference.

type: number
default: 0.05

The minimum size of the mixture component when applying a heuristic procedure to filter clusters after Variational Inference.

type: number
default: 0.02

Boolean value indicating whether points assigned to a cluster that is filtered out are re-assigned from the density function.

type: string
default: FALSE

The minimum number of dimensions where we want to detect a Binomial component when applying a heuristic procedure to filter clusters after Variational Inference.

type: integer
default: 1
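The three cutoffs above (minimum Binomial success probability, minimum mixture size, minimum number of dimensions) act together when clusters are filtered after Variational Inference. The sketch below shows one plausible reading of that heuristic; the function and argument names are hypothetical, and the use of "greater than or equal to" in each comparison is an assumption.

```python
def keep_cluster(mixing_proportion, success_probabilities,
                 binomial_cutoff=0.05, pi_cutoff=0.02, dimensions_cutoff=1):
    """Decide whether a fitted cluster survives the filtering heuristic.

    `success_probabilities` holds the Binomial success probability of the
    cluster in each dimension (one value per sample).
    """
    # Dimensions in which the Binomial component is considered detected.
    detected = sum(1 for p in success_probabilities if p >= binomial_cutoff)
    return mixing_proportion >= pi_cutoff and detected >= dimensions_cutoff


print(keep_cluster(0.30, [0.40, 0.01]))  # large cluster, seen in one sample
print(keep_cluster(0.01, [0.40, 0.40]))  # too small a mixture component
print(keep_cluster(0.30, [0.01, 0.02]))  # not detected in any dimension
```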

If fewer than this number of trees are available, all the structures are examined exhaustively. If there are more, a Monte Carlo sampler is used.

type: integer
default: 10000

If a Monte Carlo sampler is used, n.sampling distinct trees are sampled and scored.

type: integer
default: 5000

When a number of trees are generated, scored and ranked, a maximum of store.max are returned to the user (these are selected following the ranking).
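The interplay of the three tree-search settings can be sketched as below. This is an illustration under stated assumptions, not the tool's algorithm: the name `cutoff` for the first threshold is hypothetical, uniform sampling stands in for the Monte Carlo sampler, a higher score is taken to be better, and the case of exactly `cutoff` trees (left open by the text) is treated as sampling.

```python
import random


def select_trees(candidate_trees, score, cutoff=10000,
                 n_sampling=5000, store_max=100, seed=0):
    """Score candidate trees and return at most `store_max` of the best."""
    trees = list(candidate_trees)
    if len(trees) < cutoff:
        pool = trees  # exhaustive: every structure is examined
    else:
        # Stand-in for the Monte Carlo sampler: draw distinct trees uniformly.
        rng = random.Random(seed)
        pool = rng.sample(trees, min(n_sampling, len(trees)))
    ranked = sorted(pool, key=score, reverse=True)
    return ranked[:store_max]


# Toy example: "trees" are integers and the score is the integer itself.
best = select_trees(range(50), score=lambda tree: tree, cutoff=100, store_max=3)
print(best)
```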

type: integer
default: 100

The number of signatures (min. value = 2) to be fit to the dataset, including the background signature.

type: string
default: 2:10

A numeric vector of length 96 provided by the user. The parameter is ignored if beta is given instead. If NULL, it is estimated through NMF.

type: string
default: NULL

The initial value of the signature matrix β. If NULL, it is estimated with a few runs of NMF. It must include the background signature as its first row.

type: string
default: NULL

If TRUE normalize the count matrix x row-wise before processing it. Useful for algorithm stability, when considerably different total counts of mutations are observed among the patients.

type: string
default: TRUE

The number of iterations of every single run of NMF LASSO.

type: integer
default: 30

Number of iterations to estimate the length(K) matrices beta (including the background signature) in case the argument beta is NULL. Ignored if beta is given.

type: integer
default: 10

The number of sub-iterations involved in the sparsification phase, within a full NMF LASSO iteration.

type: integer
default: 10000

The number of requested NMF worker subprocesses to spawn. If Inf, an adaptive maximum number is automatically chosen. If NA or NULL, the function is run as a single process.

type: string
default: all

The cross-validation test size, i.e., the percentage of entries set to zero during NMF and used for validation.

type: number
default: 0.01

The number of repetitions of the cross-validation procedure.

type: integer
default: 50

The number of randomized restarts of a single cross-validation repetition, in case of poor fits.

type: integer
default: 5

The candidate values of the sparsity parameter for the signature matrix 'beta' whose goodness of fit is assessed by cross-validation.

type: string
default: c(0.01, 0.05, 0.1, 0.2)

The candidate values of the sparsity parameter for the exposure-matrix entries alpha whose goodness of fit is assessed by cross-validation.

type: integer

If TRUE, informative messages are printed on the R console over the execution.

type: string
default: TRUE

Seed for the random number generation. To be set for reproducibility.

type: integer
default: 12345

The candidate values of the sparsity parameter for the exposure-matrix entries alpha whose goodness of fit is assessed by cross-validation.

type: string
default: c(0.00, 0.01, 0.05, 0.10)

Mode of publishing the SigProfiler genome.

type: string
default: move

Specify True if the reference genome should be downloaded.

type: boolean
default: true

Specify the path to the reference genome (if downloaded by the user), e.g. path/to/genome/tsb

type: string

Downsamples mutational matrices to the exome regions of the genome.

type: string

'matrix' is used for table format inputs using a tab separated file.

type: string
default: matrix

The minimum number of signatures to be extracted.

type: integer
default: 1

The maximum number of signatures to be extracted.

type: integer
default: 25

Mutation context name(s), separated by commas (,), that define the mutational contexts for signature extraction. In the default value, 96 represents the SBS96 context, DINUC represents the dinucleotide context, and ID represents the indel context.

type: string
default: 96,DINUC,ID
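As a small illustration, the comma-separated context string can be split into individual context names. The helper and the label table below are hypothetical; the labels only restate the meanings given in the description above.

```python
# Meanings as stated in the parameter description.
CONTEXT_LABELS = {
    "96": "SBS96 context",
    "DINUC": "dinucleotide context",
    "ID": "indel context",
}


def parse_context_types(value):
    """Split a comma-separated context string into a list of names."""
    return [item.strip() for item in value.split(",") if item.strip()]


for name in parse_context_types("96,DINUC,ID"):
    print(name, "->", CONTEXT_LABELS.get(name, "not described here"))
```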

The number of iterations to be performed to extract each number of signatures.

type: integer
default: 100

If True, add Poisson noise to samples by resampling.

type: string
default: true

Method of normalizing the genome matrix before it is analyzed by NMF. Besides the default 'gmm', options are 'log2', 'custom' or 'none'.

type: string
default: gmm

The initialization algorithm for W and H matrix of NMF. Options are 'random', 'nndsvd', 'nndsvda', 'nndsvdar' and 'nndsvd_min'.

type: string
default: random

Value defines the minimum number of iterations to be completed before NMF converges.

type: integer
default: 10000

Value defines the maximum number of iterations to be completed before NMF converges.

type: integer
default: 1000000

Value defines the number of iterations to be done between convergence checks.

type: integer
default: 10000

Ensures reproducible NMF replicate resamples. Provide the path to the Seeds.txt file (found in the results folder from a previous analysis) to reproduce results.

type: string
default: random

The cutoff threshold of the average stability (default: 0.8). Solutions with average stabilities below this threshold will not be considered.

type: number
default: 0.8

The number of processors to be used to extract the signatures (default: all processors).

type: integer
default: -1

The cutoff threshold of the minimum stability (default: 0.2). Solutions with minimum stabilities below this threshold will not be considered.

type: number
default: 0.2

The cutoff threshold of the combined stability (sum of average and minimum stability) (default: 1.0). Solutions with combined stabilities below this threshold will not be considered.

type: integer
default: 1
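The three stability cutoffs (average, minimum, and combined) can be read as a joint filter on candidate solutions. The sketch below illustrates that reading; the function and argument names are hypothetical, and the example stability values are made up.

```python
def passes_stability(avg_stability, min_stability,
                     avg_cutoff=0.8, min_cutoff=0.2, combined_cutoff=1.0):
    """Return True if a solution clears all three stability cutoffs.

    A solution is discarded when any of its average, minimum, or
    combined (average + minimum) stability is below the matching cutoff.
    """
    return (avg_stability >= avg_cutoff
            and min_stability >= min_cutoff
            and avg_stability + min_stability >= combined_cutoff)


print(passes_stability(0.90, 0.30))  # clears every cutoff
print(passes_stability(0.90, 0.05))  # minimum stability too low
print(passes_stability(0.70, 0.50))  # average stability too low
```

Note that with the default values any solution passing the average and minimum cutoffs also passes the combined one; the combined cutoff only becomes binding when it is raised or the others are lowered.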

Generate de novo to COSMIC signature decomposition plots as part of the results (default: True). Set to False to skip generating these plots.

type: string
default: true

If True, SBS288 and SBS1536 de novo signatures will be mapped to SBS96 reference signatures (default: True). If False, those will be mapped to reference signatures of the same context.

type: string
default: true

Defines the version of the COSMIC reference signatures (default: 3.4). Takes a positive float among 1, 2, 3, 3.1, 3.2, 3.3, and 3.4.

type: number
default: 3.4

Write to output Ws and Hs from all the NMF iterations.

type: string
default: true

Create the probability matrix.

type: string
default: true

Outputs chromosome-based matrices.

type: string

Downsamples mutational matrices to custom regions of the genome. Requires the full path to the BED file.

type: string
default: None

Integrates with SigProfilerPlotting to output all available visualizations for each matrix.

type: string

Outputs original mutations into a text file that contains the SigProfilerMatrixGenerator classification for each mutation.

type: string

Values should be single or double.

type: string
default: single

Adds an Xbp cushion to the exome/bed_file ranges for downsampling the mutations.

type: integer
default: 100

Defines whether GPU resources will be used if available. If True, the GPU resources will be used in the computation. Note: All available CPU processors are used by default, which may cause a memory error. This error can be resolved by reducing the number of CPU processes through the cpu parameter.

type: string

Outputs the results of a transcriptional strand bias test for the respective matrices.

type: string

Will be effective only if the GPU is used. Defines the number of NMF replicates to be performed by each CPU during the parallel processing. Note: For batch_size values greater than 1, each NMF replicate will update until max_nmf_iterations is reached.

type: integer
default: 1

Defines if solutions with a drop in stability with respect to the highest stable number of signatures will be considered.

type: string

Value defines the tolerance that must be reached for convergence.

type: number
default: 1e-15

A number that represents the normal contamination level at which the sample is considered passed or failed.

type: integer
default: 3

A range [x, y] so that only mutations with VAF in that range are actually used to determine the TIN/TIT levels of the input.

type: string
default: c(0, 0.7)

An upper bound on the VAF of a cluster in the tumour data. Clusters above this value will be considered miscalled clonal clusters (e.g., due to LOH etc.).

type: number
default: 0.6

Consider only latent variables with responsibilities above this cutoff.

type: number
default: 0.75

If there are more than N mutations in the VAF range VAF_range_tumour, a random subset of size N is retained.

type: integer
default: 20000
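The VAF range and the cap N described above combine into a two-step selection: restrict to the range, then downsample if needed. The following sketch illustrates this; the function name is hypothetical, the range is assumed to include its endpoints, and the fixed seed is only there to make the example repeatable.

```python
import random


def select_mutations(vafs, VAF_range_tumour=(0.0, 0.7), N=20000, seed=12345):
    """Keep VAFs inside the range, downsampling to N values if needed."""
    low, high = VAF_range_tumour
    # Assumption: both endpoints of the range are included.
    in_range = [vaf for vaf in vafs if low <= vaf <= high]
    if len(in_range) > N:
        # Random subset of size N, drawn without replacement.
        in_range = random.Random(seed).sample(in_range, N)
    return in_range


vafs = [0.10, 0.50, 0.90, 0.65, 0.80]
print(select_mutations(vafs))              # values above 0.7 are dropped
print(len(select_mutations(vafs, N=2)))    # capped at 2 values
```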

If TRUE, it runs the analysis with reduced sampling power and accuracy. Use this to obtain a result for preliminary inspection of your data, and then run autofit with this parameter set to FALSE.

type: string
default: TRUE

Email address for completion summary.

type: boolean

Email address for completion summary, only when pipeline fails.

type: boolean

Do not use coloured log outputs.

type: boolean

Send plain-text email instead of HTML.

type: boolean

Git commit id for Institutional configs.

type: string
default: master

Base directory for Institutional configs.

type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config description.

type: boolean

Institutional config contact information.

type: boolean
type: string
default: s3://ngi-igenomes/igenomes/

Institutional config URL link.

type: boolean
type: boolean
type: string
default: null/pipeline_info
type: boolean
type: boolean
type: string
type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/
type: boolean
default: true