nf-core/tumourevo
Analysis pipeline to model tumour clonal evolution from WGS data (driver annotation, quality control of copy number calls, subclonal and mutational signature deconvolution)
22.10.6
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
string
^\S+\.csv$
You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row.
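As a rough illustration, a pre-flight check of the design file could look like the sketch below. It relies only on what is stated above: the path must match ^\S+\.csv$ and every row, including the header, has 3 columns. The function name check_samplesheet is hypothetical, and the pipeline's own validation may differ.

```python
import csv
import io
import re

# Pattern copied from the parameter definition above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")
EXPECTED_COLUMNS = 3  # "a comma-separated file with 3 columns, and a header row"


def check_samplesheet(path, text):
    """Return a list of problems; an empty list means the basic checks passed.

    Hypothetical helper: it only checks the path pattern and the column count,
    not the column names or the values, which are not described here.
    """
    problems = []
    if not SAMPLESHEET_PATTERN.match(path):
        problems.append(f"path {path!r} does not match {SAMPLESHEET_PATTERN.pattern}")
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        problems.append("file is empty (no header row)")
        return problems
    for line_no, row in enumerate(rows, start=1):
        if len(row) != EXPECTED_COLUMNS:
            problems.append(
                f"line {line_no}: expected {EXPECTED_COLUMNS} columns, found {len(row)}"
            )
    return problems
```

Note that the pattern rejects paths containing whitespace, so a file named "my samples.csv" would fail the first check.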
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Most common options used for the pipeline.
Path to reference fasta file.
string
Reference genome name.
string
List of tools for running the pipeline.
string
mobster,viber,pyclone-vi,sparsesignatures,sigprofiler
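The tools value is a comma-separated string. A minimal sketch of how such a value might be split and checked is shown below; it assumes that the default value lists every accepted tool name, which is not confirmed here, and parse_tools is a hypothetical helper rather than part of the pipeline.

```python
# Tool names copied from the default value above; assumed to be the full set.
KNOWN_TOOLS = ("mobster", "viber", "pyclone-vi", "sparsesignatures", "sigprofiler")


def parse_tools(value):
    """Split a comma-separated tools string and reject names not in KNOWN_TOOLS."""
    tools = [name.strip() for name in value.split(",") if name.strip()]
    unknown = [name for name in tools if name not in KNOWN_TOOLS]
    if unknown:
        raise ValueError(f"unknown tool(s): {', '.join(unknown)}")
    return tools
```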
Flag indicating whether to filter QC mutations.
boolean
true
Method used to save pipeline results to output directory.
string
copy
Variant Annotation parameters.
Parameter for downloading VEP cache.
string
Path to VEP cache.
string
VEP cache version.
string
VEP species.
string
VEP reference genome name.
string
Add an extra custom argument to VEP.
string
--everything --filter_common --per_gene --total_length --offline --format vcf
Driver Annotation parameters.
Path to driver table.
string
https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/tumourevo/data/DRIVER_ANNOTATION/ANNOTATE_DRIVER/Compendium_Cancer_Genes.tsv
Filtering parameters from vcf file.
Flag for filtering mutations from vcf.
string
TRUE
CNAqc tool parameters.
For clonal simple CNAs, the list of segments to test.
string
c(\'1:0\', \'1:1\', \'2:0\', \'2:1\', \'2:2\')
By default LOH regions (A, AA), diploid regions (AB), and amplification regions (AAB, AABB) are tested, corresponding to '1:0', '1:1', '2:1', '2:0', '2:2' in "Major:minor" notation.
For clonal simple CNAs, a filter for the segments to test.
integer
The segment size is defined by the number of mutations mapped; this cut is on the proportion relative to the whole set of segments one wishes to analyse (defined by karyotypes). For example, by setting min_karyotype_size = 0.2 one would QC clonal simple CNAs that contain at least 20% of the mutations. The default of this parameter is 0 (all QCed).
For clonal simple CNAs, as min_karyotype_size but with a cut measured on absolute mutation counts.
integer
100
For example, by setting min_absolute_karyotype_mutations = 150 one would QC clonal simple CNAs that contain at least 150 mutations. The default of this parameter is 100.
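The two size filters can be pictured with the sketch below. It is not CNAqc's implementation: the function name select_karyotypes is hypothetical, and treating both cuts as inclusive lower bounds that must hold together is an assumption.

```python
def select_karyotypes(mutations_per_karyotype,
                      min_karyotype_size=0.0,
                      min_absolute_karyotype_mutations=100):
    """Keep karyotypes that pass both the proportional and the absolute cut.

    mutations_per_karyotype maps a "Major:minor" label to the number of
    mutations mapped to segments with that karyotype.
    """
    total = sum(mutations_per_karyotype.values())
    selected = []
    for karyotype, n_mutations in mutations_per_karyotype.items():
        proportion = n_mutations / total if total else 0.0
        if (proportion >= min_karyotype_size
                and n_mutations >= min_absolute_karyotype_mutations):
            selected.append(karyotype)
    return selected
```

With counts of 800, 150 and 50 mutations, the defaults keep the first two karyotypes, while min_karyotype_size = 0.2 keeps only the first, because 150 of 1000 mutations is a proportion of 0.15.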
For clonal simple CNAs, peaks detected will be filtered if, in a peak, we map less than p_binsize_peaks * N mutations.
number
0.005
The value N is obtained by counting all mutations that map in all peaks. By default this parameter is 0.005.
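A minimal sketch of this peak filter follows, assuming the per-peak mutation counts are already available as a list. The name filter_peaks is hypothetical and the code only mirrors the rule stated above: a peak is dropped when it holds fewer than p_binsize_peaks * N mutations.

```python
def filter_peaks(mutations_per_peak, p_binsize_peaks=0.005):
    """Drop peaks holding fewer than p_binsize_peaks * N mutations.

    N is the total number of mutations mapped across all peaks.
    """
    n_total = sum(mutations_per_peak)
    threshold = p_binsize_peaks * n_total
    return [n for n in mutations_per_peak if n >= threshold]
```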
Deprecated parameter.
string
NULL
For clonal simple CNAs, the purity error tolerance to determine QC pass or fail.
number
0.05
This can be set automatically using function auto_tolerance to optimise the analysis based on a desired rate of false positives matches, as a function of the data coverage and (putative) purity.
For clonal simple CNAs, a tolerance in comparing bands overlaps which is applied to the raw VAF values.
number
0.015
For clonal simple CNAs, the number of times peak detection is bootstrapped (by default 1).
integer
1
This sometimes helps to find peaks that are visually observable but are missed by the underlying peak-detection heuristics.
For KDE-based matches the adjust density parameter; see density.
integer
1
For clonal simple CNAs, if "closest" the closest peak will be used to match the expected peak. If "rightmost" peaks are matched prioritizing right to left peaks (the higher-VAF gets matched first); this strategy is more correct in principle but works only if there are no spurious peaks in the estimated density.
string
rightmost
Deprecated parameter.
string
TRUE
For subclonal simple CNAs, the starting state to determine linear versus branching evolutionary models.
string
1:1
For subclonal simple CNAs, the starting state to determine linear versus branching evolutionary models.
string
FALSE
Minimum number of mutations that are required to be mapped to a karyotype in order to compute CCF values (default 25).
integer
25
For the entropy-based method, percentage of mutations that can be not-assigned (NA) in a karyotype.
number
0.1
If the karyotype has more than cutoff_QC_PASS percentage of non-assigned mutations, then the overall set of CCFs is failed for the karyotype.
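The pass/fail rule can be sketched as follows. The representation of non-assigned mutations as None and the function name ccf_qc_status are assumptions made for illustration; only the comparison against cutoff_QC_PASS comes from the description above.

```python
def ccf_qc_status(ccf_values, cutoff_qc_pass=0.1):
    """Return "PASS" or "FAIL" for the CCF estimates of one karyotype.

    ccf_values holds one entry per mutation; None marks a mutation that
    could not be assigned (NA).
    """
    if not ccf_values:
        return "FAIL"  # assumption: nothing to evaluate counts as a failure
    na_fraction = sum(value is None for value in ccf_values) / len(ccf_values)
    return "FAIL" if na_fraction > cutoff_qc_pass else "PASS"
```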
Either "ENTROPY" (default) or "ROUGH", to reflect the two different algorithms to compute CCF.
string
ENTROPY
string
absolute
joinCNAqc
If TRUE the mutations flagged as FAILED by CNAqc are discarded while building the joinCNAqc segmentation, if FALSE they are kept in the new object.
string
FALSE
If TRUE the original CNAqc object is kept in the joinCNAqc object, otherwise it is lost.
string
TRUE
The probability density used to model the read count data. Choices are beta-binomial and binomial.
string
beta-binomial
binomial is a common choice for sequencing data. beta-binomial is useful when the data is over-dispersed which has been observed frequently in sequencing data.
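To make the over-dispersion point concrete, the sketch below compares the variance of the two densities for the same depth and mean. The mean/precision parameterisation of the beta-binomial is chosen here for illustration and is not necessarily the one used by the tool.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of k variant reads out of n under a binomial model."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, mean, precision):
    """Beta-binomial pmf with a = mean * precision, b = (1 - mean) * precision."""
    a = mean * precision
    b = (1 - mean) * precision
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def variance(pmf, n):
    """Variance of a distribution on 0..n given its pmf as a function of k."""
    mean = sum(k * pmf(k) for k in range(n + 1))
    return sum((k - mean) ** 2 * pmf(k) for k in range(n + 1))
```

For example, variance(lambda k: binomial_pmf(k, 100, 0.3), 100) can be compared with variance(lambda k: beta_binomial_pmf(k, 100, 0.3, 50.0), 100); the beta-binomial spreads the read counts more widely around the same mean.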
Number of random restarts of variational inference.
integer
100
More restarts will have a higher probability of finding the optimal variational approximation. This also increases running time.
Number of grid points used for approximating the posterior distribution.
integer
100
Higher values should be used for deeply sequenced data.
The number of clusters to use while fitting.
integer
20
This should be set to a value larger than the expected number of clusters. The software will then automatically determine how many to use. In general this value should increase as more samples are used.
A vector with the number of Beta components to use. All values of K must be strictly greater than 0.
string
1:5
Initial values for the parameters of the model. Can be either "random" or "peaks".
string
peaks
Number of fits that should be attempted for each configuration of the model tested.
integer
5
Boolean value indicating whether to use tail mutations for subclonal deconvolution.
string
c(TRUE,FALSE)
Tolerance for convergence estimation.
number
1e-10
Maximum number of steps for a fit.
integer
250
A string that determines the type of fit. Can be either "MLE", for the Maximum Likelihood Estimate of the Beta parameters, or "MM", for Moment Matching.
string
MM
Seed for the random number generator.
integer
12345
Score to minimize to select the best model; this has to be one of 'reICL', 'ICL', 'BIC', 'AIC' or 'NLL'.
string
reICL
Boolean value indicating whether to return the trace of the model fit.
string
FALSE
Boolean value indicating whether to run the fit in parallel.
string
TRUE
The minimum mixing proportion of a cluster to be returned as output.
number
0.02
The minimum number of mutations assigned to a cluster to be returned as output.
integer
10
string
FALSE
Overrides all the parameters with a predefined set of values, in order to implement different analyses.
string
NULL
The maximum number of clusters returned
integer
10
The number of fits to be computed.
integer
10
The concentration parameter of the Dirichlet mixture.
number
0.000001
The prior Beta hyperparameter a for each Binomial component.
integer
1
The prior Beta hyperparameter b for each Binomial component.
integer
1
The maximum number of fit iterations
integer
5000
The epsilon to measure convergence as ELBO absolute difference
number
1e-10
Initialization of the q-distribution to compute the approximation of the posterior distributions.
string
prior
This can be set in three different ways: equal to the prior (q_init = 'prior'), via kmeans clustering (q_init = 'kmeans'), or by capturing points which are private to each dimension (q_init = 'private').
Boolean value indicating whether to return the trace of the model fit.
string
FALSE
The minimum Binomial success probability when applying a heuristic procedure to filter clusters after Variational Inference.
number
0.05
The minimum size of the mixture component when applying a heuristic procedure to filter clusters after Variational Inference.
number
0.02
Boolean value indicating whether points assigned to a cluster that is filtered out are re-assigned from the density function.
string
FALSE
The minimum number of dimensions where we want to detect a Binomial component when applying a heuristic procedure to filter clusters after Variational Inference.
integer
1
If there are fewer than this number of trees available, all the structures are examined exhaustively. Otherwise, a Monte Carlo sampler is used.
integer
10000
If a Monte Carlo sampler is used, n.sampling distinct trees are sampled and scored.
integer
5000
When a number of trees are generated, scored and ranked, a maximum of store.max are returned to the user (these are selected following the ranking).
integer
100
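The interplay of the three tree parameters above can be sketched as below. This is a simplification for illustration only: it assumes the candidate trees are already enumerated in a list and that a higher score is better, and the names rank_trees, sampler_threshold, n_sampling and store_max are hypothetical stand-ins for the parameters described.

```python
import random


def rank_trees(trees, score, sampler_threshold=10000, n_sampling=5000,
               store_max=100, seed=12345):
    """Score candidate trees exhaustively or via random sampling, then rank.

    Below sampler_threshold every tree is scored; otherwise n_sampling
    distinct trees are drawn at random. At most store_max of the
    best-scoring trees are returned.
    """
    trees = list(trees)
    if len(trees) < sampler_threshold:
        candidates = trees
    else:
        rng = random.Random(seed)
        candidates = rng.sample(trees, min(n_sampling, len(trees)))
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:store_max]
```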
The number of signatures (min. value = 2) to be fit to the dataset, including the background signature.
string
2:10
A numeric vector of length 96 provided by the user. The parameter is ignored if beta is given instead. If NULL, it is estimated through NMF.
string
NULL
The initial value of the signature matrix β. If NULL, it is estimated with a few runs of NMF. It must include the background signature as its first row.
string
NULL
If TRUE normalize the count matrix x row-wise before processing it. Useful for algorithm stability, when considerably different total counts of mutations are observed among the patients.
string
TRUE
The number of iterations of every single run of NMF LASSO.
integer
30
Number of iterations to estimate the length(K) matrices beta (including the background signature) in case the argument beta is NULL. Ignored if beta is given.
integer
10
The number of sub-iterations involved in the sparsification phase, within a full NMF LASSO iteration.
integer
10000
The number of requested NMF worker subprocesses to spawn. If Inf, an adaptive maximum number is automatically chosen. If NA or NULL, the function is run as a single process.
string
all
The cross-validation test size, i.e., the percentage of entries set to zero during NMF and used for validation.
number
0.01
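As an illustration of what the test size means, the sketch below zeroes a fraction of the entries of a count matrix and returns their positions so that they can be used for validation. It is not the SparseSignatures implementation; mask_entries is a hypothetical helper and the matrix is a plain list of lists.

```python
import random


def mask_entries(matrix, test_size=0.01, seed=12345):
    """Set a random fraction of entries to zero and return their positions.

    Returns (masked_copy, held_out), where held_out is a list of (row, column)
    pairs whose original values can later be compared with the model's
    predictions.
    """
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    n_held_out = round(test_size * len(cells))
    held_out = rng.sample(cells, n_held_out)
    masked = [list(row) for row in matrix]  # leave the input untouched
    for i, j in held_out:
        masked[i][j] = 0
    return masked, held_out
```

For a matrix of 10 patients by 96 mutation categories, a test size of 0.01 holds out about 10 of the 960 entries.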
The number of repetitions of the cross-validation procedure.
integer
50
The number of randomized restarts of a single cross-validation repetition, in case of poor fits.
integer
5
The candidate values of the sparsity parameter for the signature matrix 'beta' whose goodness of fit is assessed by cross-validation.
string
c(0.01, 0.05, 0.1, 0.2)
The candidate values of the sparsity parameter for the exposure-matrix entries alpha whose goodness of fit is assessed by cross-validation.
integer
If TRUE, informative messages are printed on the R console over the execution.
string
TRUE
Seed for the random number generation. To be set for reproducibility.
integer
12345
The candidate values of the sparsity parameter for the exposure-matrix entries alpha whose goodness of fit is assessed by cross-validation.
string
c(0.00, 0.01, 0.05, 0.10)
Mode of publishing the SigProfiler genome.
string
move
Specify True if the reference genome should be downloaded.
boolean
true
Specify the path to the reference genome (if downloaded by the user), e.g. path/to/genome/tsb
string
Downsamples mutational matrices to the exome regions of the genome.
string
'matrix' is used for table format inputs using a tab separated file.
string
matrix
The minimum number of signatures to be extracted.
integer
1
The maximum number of signatures to be extracted.
integer
25
Mutation context name(s), separated by commas (,), that define the mutational contexts for signature extraction. In the default value, 96 represents the SBS96 context, DINUC represents the dinucleotide context, and ID represents the indel context.
string
96,DINUC,ID
The number of iterations to be performed to extract each number of signatures.
integer
100
If True, add Poisson noise to samples by resampling.
string
true
Method of normalizing the genome matrix before it is analyzed by NMF. Options are 'gmm', 'log2', 'custom' or 'none'.
string
gmm
The initialization algorithm for the W and H matrices of NMF. Options are 'random', 'nndsvd', 'nndsvda', 'nndsvdar' and 'nndsvd_min'.
string
random
Value defines the minimum number of iterations to be completed before NMF converges.
integer
10000
Value defines the maximum number of iterations to be completed before NMF converges.
integer
1000000
Value defines the number of iterations to be done between convergence checks.
integer
10000
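The three iteration parameters above interact roughly as in the sketch below. It is a generic loop written for illustration, not SigProfilerExtractor's code: the function run_until_converged, the step callback, and the convergence test (absolute change in loss between two consecutive checks, compared with a tolerance) are all assumptions.

```python
def run_until_converged(step, min_iterations=10000, max_iterations=1000000,
                        test_interval=10000, tolerance=1e-15):
    """Iterate until the loss stabilises or max_iterations is reached.

    step(iteration) performs one update and returns the current loss.
    Convergence is tested only every test_interval iterations and is only
    accepted once at least min_iterations have been completed.
    Returns (iterations_run, last_loss).
    """
    loss = None
    last_checked = None
    for iteration in range(1, max_iterations + 1):
        loss = step(iteration)
        if iteration % test_interval != 0:
            continue  # not a checkpoint, keep iterating
        if (iteration >= min_iterations and last_checked is not None
                and abs(last_checked - loss) <= tolerance):
            return iteration, loss
        last_checked = loss
    return max_iterations, loss
```

With this design a larger interval between checks means fewer comparisons but can overshoot the point of convergence by up to one interval.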
Ensures reproducible NMF replicate resamples. Provide the path to the Seeds.txt file (found in the results folder from a previous analysis) to reproduce results.
string
random
The cutoff threshold of the average stability (default: 0.8). Solutions with average stabilities below this threshold will not be considered.
number
0.8
The number of processors to be used to extract the signatures (default: all processors).
integer
-1
The cutoff threshold of the minimum stability (default: 0.2). Solutions with minimum stabilities below this threshold will not be considered.
number
0.2
The cutoff threshold of the combined stability (sum of average and minimum stability) (default: 1.0). Solutions with combined stabilities below this threshold will not be considered.
integer
1
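The three stability cutoffs can be read together as in the following sketch. It assumes a per-signature stability value is available for each candidate number of signatures; the function stable_solutions and its input layout are hypothetical and do not reflect the tool's internals.

```python
def stable_solutions(solutions, stability=0.8, min_stability=0.2,
                     combined_stability=1.0):
    """Return the numbers of signatures whose solutions pass all three cutoffs.

    solutions maps a number of signatures to the list of per-signature
    stability values of that solution.
    """
    kept = []
    for n_signatures, per_signature in solutions.items():
        average = sum(per_signature) / len(per_signature)
        lowest = min(per_signature)
        if (average >= stability
                and lowest >= min_stability
                and average + lowest >= combined_stability):
            kept.append(n_signatures)
    return sorted(kept)
```

With the default values, any solution that passes the average and minimum cutoffs also passes the combined one, so the combined cutoff only has an effect when it is raised above their sum.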
Generate de novo to COSMIC signature decomposition plots as part of the results (default: True). Set to False to skip generating these plots.
string
true
If True, SBS288 and SBS1536 de novo signatures will be mapped to SBS96 reference signatures (default: True). If False, those will be mapped to reference signatures of the same context.
string
true
Defines the version of the COSMIC reference signatures (default: 3.4). Takes a positive float among 1, 2, 3, 3.1, 3.2, 3.3, and 3.4.
number
3.4
Write to output Ws and Hs from all the NMF iterations.
string
true
Create the probability matrix.
string
true
Outputs chromosome-based matrices.
string
Downsamples mutational matrices to custom regions of the genome. Requires the full path to the BED file.
string
None
Integrates with SigProfilerPlotting to output all available visualizations for each matrix.
string
Outputs original mutations into a text file that contains the SigProfilerMatrixGenerator classification for each mutation.
string
Values should be single or double.
string
single
Adds an Xbp cushion to the exome/bed_file ranges for downsampling the mutations.
integer
100
Defines if the GPU resource will be used if available. If True, the GPU resources will be used in the computation. Note: All available CPU processors are used by default, which may cause a memory error. This error can be resolved by reducing the number of CPU processes through the cpu parameter.
string
Outputs the results of a transcriptional strand bias test for the respective matrices.
string
Will be effective only if the GPU is used. Defines the number of NMF replicates to be performed by each CPU during the parallel processing. Note: For batch_size values greater than 1, each NMF replicate will update until max_nmf_iterations is reached.
integer
1
Defines if solutions with a drop in stability with respect to the highest stable number of signatures will be considered.
string
Value defines the tolerance required for convergence.
number
1e-15
A number that represents the normal contamination level for which the sample is considered passed or failed.
integer
3
A range [x, y] so that only mutations with VAF in that range are actually used to determine the TIN/TIT levels of the input.
string
c(0, 0.7)
An upper bound on the VAF of a cluster in the tumour data. Clusters above this value will be considered miscalled clonal clusters (e.g., due to LOH etc.).
number
0.6
Consider only latent variables with responsibilities above this cutoff.
number
0.75
If there are more than N mutations in VAF range VAF_range_tumour, a random subset of size N is retained.
integer
20000
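The VAF range and the cap on the number of mutations can be combined as in the sketch below. It is an illustration only: the function select_mutations is hypothetical, the range bounds are treated as inclusive by assumption, and the fixed seed is there just to make the subsample reproducible.

```python
import random


def select_mutations(vafs, vaf_range_tumour=(0.0, 0.7), n_max=20000, seed=12345):
    """Keep VAFs inside the range, then subsample if more than n_max remain."""
    low, high = vaf_range_tumour
    in_range = [vaf for vaf in vafs if low <= vaf <= high]
    if len(in_range) > n_max:
        in_range = random.Random(seed).sample(in_range, n_max)
    return in_range
```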
If TRUE, it runs the analysis with reduced sampling power and accuracy. Use this to obtain a result for preliminary inspection of your data, and then run autofit with this parameter set to FALSE.
string
TRUE
Email address for completion summary.
boolean
Email address for completion summary, only when pipeline fails.
boolean
Do not use coloured log outputs.
boolean
Send plain-text email instead of HTML.
boolean
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional config description.
boolean
Institutional config contact information.
boolean
string
s3://ngi-igenomes/igenomes/
Institutional config URL link.
boolean
boolean
string
null/pipeline_info
boolean
boolean
string
string
https://raw.githubusercontent.com/nf-core/test-datasets/
string
boolean
true
string