tumourevo: Parameters

Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

type: string

pattern: ^\S+\.csv$

You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row.
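A minimal sketch of a pre-flight check for the design file, assuming only what is stated above: the path matches the pattern ^\S+\.csv$ and the file is comma-separated with a header row and 3 columns. The function name and the placeholder column names are hypothetical and are not taken from the pipeline.

```python
import csv
import io
import re

# Pattern quoted from the parameter description above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")
EXPECTED_COLUMNS = 3  # column count stated in the text above


def check_samplesheet(path, handle):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    if not SAMPLESHEET_PATTERN.match(path):
        problems.append("path must contain no whitespace and end in .csv")
    rows = list(csv.reader(handle))
    if not rows:
        problems.append("file is empty, expected a header row")
        return problems
    for number, row in enumerate(rows, start=1):
        if len(row) != EXPECTED_COLUMNS:
            problems.append(
                f"line {number}: expected {EXPECTED_COLUMNS} columns, found {len(row)}"
            )
    return problems


# Hypothetical column names, for illustration only.
example = io.StringIO("col_a,col_b,col_c\nx,y,z\n")
print(check_samplesheet("design.csv", example))
```

The check only covers the shape of the file; it does not validate the column names or their contents, which are defined by the pipeline itself.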

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

type: string

Most common options used for the pipeline.

Path to reference fasta file.

type: string

Reference genome name.

type: string

List of tools for running the pipeline.

type: string

default: mobster,viber,pyclone-vi,sparsesignatures,sigprofiler

Flag indicating whether to filter QC mutations.

type: boolean

default: true

Method used to save pipeline results to output directory.

type: string

default: copy

Variant Annotation parameters.

Parameter for downloading VEP cache.

type: string

Path to VEP cache.

type: string

VEP cache version.

type: string

VEP species.

type: string

VEP reference genome name.

type: string

Add an extra custom argument to VEP.

type: string

default: --everything --filter_common --per_gene --total_length --offline --format vcf

Driver Annotation parameters.

Path to driver table.

type: string

default:

https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/tumourevo/data/DRIVER_ANNOTATION/ANNOTATE_DRIVER/Compendium_Cancer_Genes.tsv

Filtering parameters from vcf file.

Flag for filtering mutations from vcf.

type: string

default: TRUE

CNAqc tool parameters.

For clonal simple CNAs, the list of segments to test.

type: string

default: c(\'1:0\', \'1:1\', \'2:0\', \'2:1\', \'2:2\')

By default LOH regions (A, AA), diploid regions (AB), and amplification regions (AAB, AABB) are tested, corresponding to '1:0', '1:1', '2:1', '2:0', '2:2' in "Major:minor" notation.
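The "Major:minor" notation can be illustrated with a short sketch that converts each karyotype string into the allele label used above (for example '2:1' into AAB). The helper name is hypothetical; it only restates the correspondence described in the text.

```python
def karyotype_label(karyotype):
    """Convert 'Major:minor' copy numbers into an allele string such as 'AAB'."""
    major, minor = (int(part) for part in karyotype.split(":"))
    if minor > major:
        raise ValueError("minor copy number cannot exceed major copy number")
    # Major allele copies are written as A, minor allele copies as B.
    return "A" * major + "B" * minor


defaults = ["1:0", "1:1", "2:0", "2:1", "2:2"]
for karyotype in defaults:
    print(karyotype, "->", karyotype_label(karyotype))
```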

For clonal simple CNAs, a filter for the segments to test.

type: integer

The segment size is defined by the number of mutations mapped; this cut is on the proportion relative to the whole set of segments one wishes to analyse (defined by karyotypes). For example, by setting min_karyotype_size = 0.2 one would QC clonal simple CNAs that contain at least 20% of the mutations. The default of this parameter is 0 (all QCed).

For clonal simple CNAs, as min_karyotype_size but with a cut measured on absolute mutation counts.

type: integer

default: 100

For example, by setting min_absolute_karyotype_mutations = 150 one would QC clonal simple CNAs that contain at least 150 mutations. The default of this parameter is 100.
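The two karyotype filters can be sketched together as follows. This is an illustration under stated assumptions, not the CNAqc implementation: it assumes both cuts must hold at once, that "at least" means greater than or equal, and that the proportion is taken over the karyotypes passed in. The mutation counts are made up.

```python
def select_karyotypes(
    mutation_counts,
    min_karyotype_size=0.0,
    min_absolute_karyotype_mutations=100,
):
    """Return the karyotypes that pass both the proportional and the absolute cut."""
    total = sum(mutation_counts.values())
    selected = []
    for karyotype, count in mutation_counts.items():
        proportion = count / total if total else 0.0
        passes_proportion = proportion >= min_karyotype_size
        passes_absolute = count >= min_absolute_karyotype_mutations
        if passes_proportion and passes_absolute:
            selected.append(karyotype)
    return selected


counts = {"1:1": 800, "2:1": 150, "2:0": 50}  # made-up mutation counts
print(select_karyotypes(counts))
print(select_karyotypes(counts, min_karyotype_size=0.2,
                        min_absolute_karyotype_mutations=150))
```

With the defaults only the 50-mutation karyotype is dropped; the stricter settings additionally drop the karyotype holding 15% of the mutations.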

For clonal simple CNAs, peaks detected will be filtered if, in a peak, we map less than p_binsize_peaks * N mutations.

type: number

default: 0.005

The value N is obtained by counting all mutations that map to any peak. By default this parameter is 0.005.
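As a rough sketch of this filter, assuming each peak is summarised only by the number of mutations mapped to it: a peak is dropped when it holds fewer than p_binsize_peaks * N mutations. The function name and the example counts are hypothetical.

```python
def filter_peaks(peak_counts, p_binsize_peaks=0.005):
    """Keep peaks holding at least p_binsize_peaks * N mutations."""
    n_total = sum(peak_counts)  # N: all mutations mapped to any peak
    threshold = p_binsize_peaks * n_total
    return [count for count in peak_counts if count >= threshold]


# Made-up example: N = 1000, so peaks with fewer than 5 mutations are removed.
print(filter_peaks([600, 397, 3]))
```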

Deprecated parameter.

type: string

default: NULL

For clonal simple CNAs, the purity error tolerance to determine QC pass or fail.

type: number

default: 0.05

This can be set automatically using function auto_tolerance to optimise the analysis based on a desired rate of false positives matches, as a function of the data coverage and (putative) purity.

For clonal simple CNAs, a tolerance in comparing bands overlaps which is applied to the raw VAF values.

type: number

default: 0.015

For clonal simple CNAs, the number of times peak detection is bootstrapped (by default 1).

type: integer

default: 1

This sometimes helps to find peaks that are visually observable but that the underlying peak-detection heuristics fail to detect.

For KDE-based matches the adjust density parameter; see density.

type: integer

default: 1

For clonal simple CNAs, if "closest" the closest peak will be used to match the expected peak. If "rightmost" peaks are matched prioritizing right to left peaks (the higher-VAF gets matched first); this strategy is more correct in principle but works only if there are no spurious peaks in the estimated density.

type: string

default: rightmost

Deprecated parameter.

type: string

default: TRUE

For subclonal simple CNAs, the starting state to determine linear versus branching evolutionary models.

type: string

default: 1:1

For subclonal simple CNAs, the starting state to determine linear versus branching evolutionary models.

type: string

default: FALSE

Minimum number of mutations that are required to be mapped to a karyotype in order to compute CCF values (default 25).

type: integer

default: 25

For the entropy-based method, percentage of mutations that can be not-assigned (NA) in a karyotype.

type: number

default: 0.1

If the karyotype has more than cutoff_QC_PASS percentage of non-assigned mutations, then the overall set of CCFs is failed for the karyotype.
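A minimal sketch of this pass/fail rule, assuming the per-mutation CCF assignments of one karyotype are available as a list in which None stands in for NA. The function name is hypothetical and the real CNAqc data structures differ.

```python
def ccf_qc_passes(assignments, cutoff_qc_pass=0.1):
    """Return True when the share of non-assigned mutations is within the cutoff."""
    if not assignments:
        raise ValueError("no mutations mapped to this karyotype")
    unassigned = sum(1 for value in assignments if value is None)
    # The karyotype fails only when MORE than the cutoff is non-assigned.
    return unassigned / len(assignments) <= cutoff_qc_pass


print(ccf_qc_passes([1.0] * 9 + [None]))      # 10% non-assigned
print(ccf_qc_passes([1.0] * 8 + [None] * 2))  # 20% non-assigned
```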

Either "ENTROPY" (default) or "ROUGH", to reflect the two different algorithms to compute CCF.

type: string

default: ENTROPY

type: string

default: absolute

joinCNAqc

If TRUE the mutations flagged as FAILED by CNAqc are discarded while building the joinCNAqc segmentation, if FALSE they are kept in the new object.

type: string

default: FALSE

If TRUE the original CNAqc object is kept in the joinCNAqc object, otherwise it is lost.

type: string

default: TRUE

The probability density used to model the read count data. Choices are beta-binomial and binomial.

type: string

default: beta-binomial

binomial is a common choice for sequencing data. beta-binomial is useful when the data is over-dispersed which has been observed frequently in sequencing data.

Number of random restarts of variational inference.

type: integer

default: 100

More restarts will have a higher probability of finding the optimal variational approximation. This also increases running time.

Number of grid points used for approximating the posterior distribution.

type: integer

default: 100

Higher values should be used for deeply sequenced data.

The number of clusters to use while fitting.

type: integer

default: 20

This should be set to a value larger than the expected number of clusters. The software will then automatically determine how many to use. In general this value should increase as more samples are used.

A vector with the number of Beta components to use. All values of K must be positive and strictly greater than 0.

type: string

default: 1:5
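The default 1:5 is R range notation, which expands to the inclusive sequence 1, 2, 3, 4, 5. The sketch below shows that expansion together with the positivity constraint stated above; the helper is hypothetical and, as a simplification, rejects descending ranges even though R allows them.

```python
def expand_r_range(spec):
    """Expand an R-style 'start:end' string into the inclusive list of integers."""
    start, end = (int(part) for part in spec.split(":"))
    if start <= 0 or end < start:
        raise ValueError("expected positive integers with start <= end")
    return list(range(start, end + 1))


print(expand_r_range("1:5"))
```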

Initial values for the parameters of the model. Can be either "random" or "peaks".

type: string

default: peaks

Number of fits that should be attempted for each configuration of the model tested.

type: integer

default: 5

Boolean value indicating whether to use tail mutations for subclonal deconvolution.

type: string

default: c(TRUE,FALSE)

Tolerance for convergence estimation.

type: number

default: 1e-10

Maximum number of steps for a fit.

type: integer

default: 250

A string that determines the type of fit. Can be either "MLE", for the Maximum Likelihood Estimate of the Beta parameters, or "MM", for Moment Matching.

type: string

default: MM

Seed for the random number generator.

type: integer

default: 12345

Score to minimize to select the best model; this has to be one of 'reICL', 'ICL', 'BIC', 'AIC' or 'NLL'.

type: string

default: reICL

Boolean value whether to return the trace of model fit.

type: string

default: FALSE

Boolean value whether to run the fit in parallel.

type: string

default: TRUE

The minimum mixing proportion of a cluster to be returned as output.

type: number

default: 0.02

The minimum number of mutations assigned to a cluster to be returned as output.

type: integer

default: 10

type: string

default: FALSE

Overrides all the parameters with a predefined set of values, in order to implement different analyses.

type: string

default: NULL

The maximum number of clusters returned

type: integer

default: 10

The number of fits to be computed.

type: integer

default: 10

The concentration parameter of the Dirichlet mixture.

type: number

default: 0.000001

The prior Beta hyperparameter for each Binomial component a

type: integer

default: 1

The prior Beta hyperparameter for each Binomial component b

type: integer

default: 1

The maximum number of fit iterations

type: integer

default: 5000

The epsilon to measure convergence as ELBO absolute difference

type: number

default: 1e-10

Initialization of the q-distribution to compute the approximation of the posterior distributions.

type: string

default: prior

This can be set in three different ways: equal to the prior (q_init = 'prior'), via kmeans clustering (q_init = 'kmeans'), or by capturing points which are private to each dimension (q_init = 'private').

Boolean value whether to return the trace of model fit.

type: string

default: FALSE

The minimum Binomial success probability when applying a heuristic procedure to filter clusters after Variational Inference.

type: number

default: 0.05

The minimum size of the mixture component when applying a heuristic procedure to filter clusters after Variational Inference.

type: number

default: 0.02

Boolean value indicating whether points assigned to a cluster that is filtered out are re-assigned from the density function.

type: string

default: FALSE

The minimum number of dimensions where we want to detect a Binomial component when applying a heuristic procedure to filter clusters after Variational Inference.

type: integer

default: 1

If there are fewer than this number of trees available, all the structures are examined exhaustively. Otherwise, a Monte Carlo sampler is used.

type: integer

default: 10000

If a Monte Carlo sampler is used, n.sampling distinct trees are sampled and scored.

type: integer

default: 5000

When a number of trees are generated, scored and ranked, a maximum of store.max are returned to the user (these are selected following the ranking).
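The interplay of the three tree-search settings can be sketched as below. This is a simplification under stated assumptions: candidate trees are taken to be already enumerated in a list, sampling is uniform without replacement, and the argument name sspace_cutoff is a placeholder for the first threshold (n_sampling and store_max mirror n.sampling and store.max). The actual sampler may work differently.

```python
import random


def choose_trees(candidate_trees, score, sspace_cutoff=10000,
                 n_sampling=5000, store_max=100, seed=12345):
    """Score trees exhaustively or by sampling, then return the top store_max."""
    candidates = list(candidate_trees)
    if len(candidates) < sspace_cutoff:
        # Small search space: examine every structure.
        examined = candidates
    else:
        # Large search space: score a random subset of distinct trees.
        rng = random.Random(seed)
        examined = rng.sample(candidates, min(n_sampling, len(candidates)))
    ranked = sorted(examined, key=score, reverse=True)
    return ranked[:store_max]


# Toy example: "trees" are integers and the score is the integer itself.
print(choose_trees(range(20), score=lambda tree: tree,
                   sspace_cutoff=100, store_max=3))
```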

type: integer

default: 100

The number of signatures (min. value = 2) to be fit to the dataset, including the background signature.

type: string

default: 2:10

A numeric vector of length 96 provided by the user. The parameter is ignored if beta is given instead. If NULL, it is estimated through NMF.

type: string

default: NULL

The initial value of the signature matrix β. If NULL, it is estimated with a few runs of NMF. It must include the background signature as its first row.

type: string

default: NULL

If TRUE normalize the count matrix x row-wise before processing it. Useful for algorithm stability, when considerably different total counts of mutations are observed among the patients.

type: string

default: TRUE

The number of iterations of every single run of NMF LASSO.

type: integer

default: 30

Number of iterations to estimate the length(K) matrices beta (including the background signature) in case the argument beta is NULL. Ignored if beta is given.

type: integer

default: 10

The number of sub-iterations involved in the sparsification phase, within a full NMF LASSO iteration.

type: integer

default: 10000

The number of requested NMF worker subprocesses to spawn. If Inf, an adaptive maximum number is automatically chosen. If NA or NULL, the function is run as a single process.

type: string

default: all

The cross-validation test size, i.e., the percentage of entries set to zero during NMF and used for validation.

type: number

default: 0.01
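To make the test size concrete, the sketch below holds out a fraction of the entries of a count matrix by setting them to zero and remembering their positions for later validation. It is an illustration only: the function name is hypothetical, the matrix is a plain list of lists with made-up values, and the package's own masking procedure may differ.

```python
import random


def mask_for_cross_validation(counts, test_size=0.01, seed=12345):
    """Return a copy of counts with a test_size fraction of cells zeroed."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(counts) for j in range(len(row))]
    n_held_out = round(test_size * len(cells))
    held_out = rng.sample(cells, n_held_out)
    masked = [list(row) for row in counts]  # leave the input untouched
    for i, j in held_out:
        masked[i][j] = 0
    return masked, held_out


# Made-up matrix: 10 patients by 96 mutation categories, every count set to 7.
matrix = [[7] * 96 for _ in range(10)]
masked, held_out = mask_for_cross_validation(matrix)
print(len(held_out), "of", 10 * 96, "entries held out")
```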

The number of repetitions of the cross-validation procedure.

type: integer

default: 50

The number of randomized restarts of a single cross-validation repetition, in case of poor fits.

type: integer

default: 5

The candidate values of the sparsity parameter for the signature matrix 'beta' whose goodness of fit is assessed by cross-validation.

type: string

default: c(0.01, 0.05, 0.1, 0.2)

The candidate values of the sparsity parameter for the exposure-matrix entries alpha whose goodness of fit is assessed by cross-validation.

type: integer

If TRUE, informative messages are printed on the R console over the execution.

type: string

default: TRUE

Seed for the random number generation. To be set for reproducibility.

type: integer

default: 12345

The candidate values of the sparsity parameter for the exposure-matrix entries alpha whose goodness of fit is assessed by cross-validation.

type: string

default: c(0.00, 0.01, 0.05, 0.10)

Mode of publishing the SigProfiler genome.

type: string

default: move

Specify True if the reference genome should be downloaded.

type: boolean

default: true

Specify the path to the reference genome (if downloaded by the user), e.g. path/to/genome/tsb

type: string

Downsamples mutational matrices to the exome regions of the genome.

type: string

'matrix' is used for table format inputs using a tab separated file.

type: string

default: matrix

The minimum number of signatures to be extracted.

type: integer

default: 1

The maximum number of signatures to be extracted.

type: integer

default: 25

Mutation context name(s), separated by commas (,), that define the mutational contexts for signature extraction. In the default value, 96 represents the SBS96 context, DINUC represents the dinucleotide context, and ID represents the indel context.

type: string

default: 96,DINUC,ID
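A small sketch of how the comma-separated value splits into individual context names. The parsing helper is hypothetical; the descriptions in the lookup table restate the text above and cover only the three default names.

```python
# Meaning of each default context name, as described in the text above.
CONTEXT_DESCRIPTIONS = {
    "96": "SBS96 context",
    "DINUC": "dinucleotide context",
    "ID": "indel context",
}


def parse_context_types(value):
    """Split a comma-separated context string into a list of names."""
    names = [name.strip() for name in value.split(",") if name.strip()]
    if not names:
        raise ValueError("at least one context name is required")
    return names


for name in parse_context_types("96,DINUC,ID"):
    print(name, "->", CONTEXT_DESCRIPTIONS.get(name, "not described here"))
```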

The number of iterations to be performed to extract each number of signatures.

type: integer

default: 100

If True, add Poisson noise to samples by resampling.

type: string

default: true

Method of normalizing the genome matrix before it is analyzed by NMF. Options are 'gmm', 'log2', 'custom' or 'none'.

type: string

default: gmm

The initialization algorithm for W and H matrix of NMF. Options are 'random', 'nndsvd', 'nndsvda', 'nndsvdar' and 'nndsvd_min'.

type: string

default: random

Value defines the minimum number of iterations to be completed before NMF converges.

type: integer

default: 10000

Value defines the maximum number of iterations to be completed before NMF converges.

type: integer

default: 1000000

Value defines the number of iterations to be done between convergence checks.

type: integer

default: 10000

Ensures reproducible NMF replicate resamples. Provide the path to the Seeds.txt file (found in the results folder from a previous analysis) to reproduce results.

type: string

default: random

The cutoff threshold of the average stability (default: 0.8). Solutions with average stabilities below this threshold will not be considered.

type: number

default: 0.8

The number of processors to be used to extract the signatures (default: all processors).

type: integer

default: -1

The cutoff threshold of the minimum stability (default: 0.2). Solutions with minimum stabilities below this threshold will not be considered.

type: number

default: 0.2

The cutoff threshold of the combined stability (sum of average and minimum stability) (default: 1.0). Solutions with combined stabilities below this threshold will not be considered.

type: integer

default: 1
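The three stability cutoffs can be read together as in the sketch below, which assumes each candidate solution is summarised by a list of per-signature stability values. The function and argument names are placeholders and the example values are made up; this is not the extractor's own selection code.

```python
def stable_solutions(solutions, stability=0.8, min_stability=0.2,
                     combined_stability=1.0):
    """Return the signature counts whose stabilities clear all three cutoffs."""
    kept = []
    for n_signatures, per_signature in solutions.items():
        average = sum(per_signature) / len(per_signature)
        minimum = min(per_signature)
        if (average >= stability
                and minimum >= min_stability
                and average + minimum >= combined_stability):
            kept.append(n_signatures)
    return kept


# Made-up stabilities keyed by the number of extracted signatures.
example = {
    2: [0.95, 0.90],
    3: [0.90, 0.85, 0.10],
    4: [0.75, 0.70, 0.70, 0.65],
}
print(stable_solutions(example))
```

In this example only the 2-signature solution survives: the 3-signature solution has a signature below the minimum stability, and the 4-signature solution falls short on average stability.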

Generate de novo to COSMIC signature decomposition plots as part of the results (default: True). Set to False to skip generating these plots.

type: string

default: true

If True, SBS288 and SBS1536 de novo signatures will be mapped to SBS96 reference signatures (default: True). If False, those will be mapped to reference signatures of the same context.

type: string

default: true

Defines the version of the COSMIC reference signatures (default: 3.4). Takes a positive float among 1, 2, 3, 3.1, 3.2, 3.3, and 3.4.

type: number

default: 3.4

Write to output Ws and Hs from all the NMF iterations.

type: string

default: true

Create the probability matrix.

type: string

default: true

Outputs chromosome-based matrices.

type: string

Downsamples mutational matrices to custom regions of the genome. Requires the full path to the BED file.

type: string

default: None

Integrates with SigProfilerPlotting to output all available visualizations for each matrix.

type: string

Outputs original mutations into a text file that contains the SigProfilerMatrixGenerator classification for each mutation.

type: string

Values should be single or double.

type: string

default: single

Adds an Xbp cushion to the exome/bed_file ranges for downsampling the mutations.

type: integer

default: 100

Defines whether the GPU resource will be used if available. If True, the GPU resources will be used in the computation. Note: All available CPU processors are used by default, which may cause a memory error. This error can be resolved by reducing the number of CPU processes through the cpu parameter.

type: string

Outputs the results of a transcriptional strand bias test for the respective matrices.

type: string

Will be effective only if the GPU is used. Defines the number of NMF replicates to be performed by each CPU during the parallel processing. Note: For batch_size values greater than 1, each NMF replicate will update until max_nmf_iterations is reached.

type: integer

default: 1

Defines if solutions with a drop in stability with respect to the highest stable number of signatures will be considered.

type: string

Value defines the tolerance to achieve to converge.

type: number

default: 1e-15

A number that represents the normal contamination level for which the sample is considered passed or failed.

type: integer

default: 3

A range [x, y] so that only mutations with VAF in that range are actually used to determine the TIN/TIT levels of the input.

type: string

default: c(0, 0.7)

An upper bound on the VAF of a cluster in the tumour data. Clusters above this value will be considered miscalled clonal clusters (e.g., due to LOH etc.).

type: number

default: 0.6

Consider only latent variables with responsibilities above this cutoff.

type: number

default: 0.75

If there are more than N mutations in VAF range VAF_range_tumour, a random subset of size N is retained.

type: integer

default: 20000
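A minimal sketch of the VAF range filter followed by the cap on N, assuming mutations are represented only by their VAF values and that the range [x, y] is closed at both ends. The function name and the seed are hypothetical.

```python
import random


def select_mutations(vafs, vaf_range=(0.0, 0.7), n_max=20000, seed=12345):
    """Keep VAFs inside the range, then subsample if more than n_max remain."""
    low, high = vaf_range
    in_range = [vaf for vaf in vafs if low <= vaf <= high]
    if len(in_range) > n_max:
        # Retain a random subset of size n_max, without replacement.
        in_range = random.Random(seed).sample(in_range, n_max)
    return in_range


print(select_mutations([0.1, 0.5, 0.8, 0.9]))
print(len(select_mutations([0.2] * 50, n_max=10)))
```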

If TRUE, it runs the analysis with reduced sampling power and accuracy. Use this to obtain a result for preliminary inspection of your data, and then run autofit with this parameter set to FALSE.

type: string

default: TRUE

Email address for completion summary.

type: boolean

Email address for completion summary, only when pipeline fails.

type: boolean

Do not use coloured log outputs.

type: boolean

Send plain-text email instead of HTML.

type: boolean

Git commit id for Institutional configs.

type: string

default: master

Base directory for Institutional configs.

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config description.

type: boolean

Institutional config contact information.

type: boolean

type: string

default: s3://ngi-igenomes/igenomes/

Institutional config URL link.

type: boolean

type: string

default: null/pipeline_info

type: boolean

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

type: string

type: boolean

default: true

type: string
