nf-core/metaboigniter
Pre-processing of mass spectrometry-based metabolomics data with quantification and identification based on MS1 and MS2 data.
This page documents version 1.0.1. The latest stable release is 2.0.1.
General options that affect the whole pipeline
Output directory for results
string
./results
Email address for completion summary
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
An email address to send a summary email to when the pipeline is completed.
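As a minimal sketch, the pattern above can be checked against candidate addresses before launching a run. The function name `is_valid_email` and the example addresses are illustrative and not part of the pipeline.

```python
import re

# Pattern copied verbatim from the email parameter above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None


for candidate in (
    "user@example.org",
    "first.last@lab.example.com",
    "no-at-sign.example.org",
    "user@example.technology",  # final label is longer than 5 letters
):
    print(candidate, is_valid_email(candidate))
```

Note that the pattern limits the final domain label to 2-5 letters, so addresses with longer top-level domains are rejected.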
Parameters used to tune library creation (only set if you want to do library identification)
set whether you want to do quantification with OpenMS (openms) or XCMS (xcms) in negative ionization (for library)
string
Controls how to perform IPO
string
Possible values:
none: don't perform IPO
global: performs IPO on all or a selected number of samples
local: performs IPO on individual samples one at a time
Quantification methods for IPO
string
centWave
Only centWave is supported at this stage.
lowest level of noise
number
1000
highest level of noise
number
lowest level of signal to noise threshold
number
10
highest level of signal to noise threshold
number
10
Function for centering the mz
string
wMean
Integration method
number
1
logical, if TRUE a Gaussian is fitted
boolean
lower minimum width of peaks
number
12
higher minimum width of peaks
number
28
lower maximum width of peaks
number
35
higher maximum width of peaks
number
65
lower ppm mass deviation
number
17
higher ppm mass deviation
number
32
lower minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
-0.001
higher minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
0.01
maximum charge of molecules (only used in individual setting)
number
1
ppm mass deviation for adducts (only used in individual setting)
number
10
lower value of K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
higher value of K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
lower I in prefilter
number
100
higher I in prefilter
number
100
number of cores used in IPO
number
5
lower Penalty for Gap opening
number
higher Penalty for Gap opening
number
0.4
lower Penalty for Gap enlargement
number
2.1
higher Penalty for Gap enlargement
number
2.7
lower step size (in m/z) to use for profile generation from the raw data files
number
0.7
higher step size (in m/z) to use for profile generation from the raw data files
number
1
lower Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
higher Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
lower Local weighting applied to diagonal moves in alignment.
number
2
higher Local weighting applied to diagonal moves in alignment.
number
2
lower Local weighting applied to gap moves in alignment.
number
1
higher Local weighting applied to gap moves in alignment.
number
1
Local rather than global alignment
number
lower bandwidth (consider something like retention time differences)
number
22
higher bandwidth (consider something like retention time differences)
number
38
lower minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.3
higher minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.7
lower mz width (mz differences)
number
0.015
higher mz width (mz differences)
number
0.035
lower minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
higher minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
lower maximum number of groups to identify in a single m/z slice
number
50
higher maximum number of groups to identify in a single m/z slice
number
50
DistFunc function: cor (Pearson's R) or cor_opt (default, calculate only 10% diagonal band of distance matrix, better runtime), cov (covariance), prd (product), euc (Euclidean distance)
string
cor_opt
Only obiwarp is supported
string
obiwarp
mass trace deviation in ppm
number
10
lower width of peaks
number
5
highest width of peaks
number
30
level of noise
number
1000
minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
-0.001
signal to noise ratio cutoff, definition see below.
number
10
K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
I in prefilter
number
100
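The prefilter rule described above (keep a mass trace only if it contains at least k peaks with intensity >= I) can be sketched in a few lines. This is an illustration of the rule only, not the XCMS implementation; the function name and the example traces are made up.

```python
def passes_prefilter(intensities, k=3, i_min=100.0):
    """Keep a mass trace if at least k of its points have intensity >= i_min."""
    return sum(1 for value in intensities if value >= i_min) >= k


# Hypothetical mass traces (intensity values only).
traces = {
    "trace_a": [50, 120, 300, 150, 80],  # three points >= 100
    "trace_b": [90, 95, 101, 99],        # one point >= 100
}

kept = [name for name, values in traces.items() if passes_prefilter(values)]
print(kept)
```

With the defaults shown above (k = 3, I = 100), only the first trace is retained.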
Function to calculate the m/z center of the feature: 'wMean' intensity weighted mean of the feature m/z values, 'mean' mean of the feature m/z values, 'apex' use m/z value at peak apex, 'wMeanApex3' intensity weighted mean of the m/z value at peak apex and the m/z value left and right of it, 'meanApex3' mean of the m/z value at peak apex and the m/z value left and right of it.
string
wMean
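To make the five m/z centering options concrete, here is a small sketch that follows the definitions given above. It is not the XCMS code; the function `mz_center` and the example feature are assumptions for illustration.

```python
def mz_center(mz, intensity, method="wMean"):
    """Return the m/z center of one feature for the given method name."""
    apex = max(range(len(intensity)), key=lambda idx: intensity[idx])
    # Apex plus its left and right neighbours, clipped at the feature edges.
    lo, hi = max(apex - 1, 0), min(apex + 2, len(mz))
    if method == "wMean":
        return sum(m * i for m, i in zip(mz, intensity)) / sum(intensity)
    if method == "mean":
        return sum(mz) / len(mz)
    if method == "apex":
        return mz[apex]
    if method == "wMeanApex3":
        window = list(zip(mz[lo:hi], intensity[lo:hi]))
        return sum(m * i for m, i in window) / sum(i for _, i in window)
    if method == "meanApex3":
        return sum(mz[lo:hi]) / (hi - lo)
    raise ValueError(f"unknown method: {method}")


# Hypothetical feature: four data points with the apex at index 1.
mz = [200.000, 200.002, 200.004, 200.006]
intensity = [100.0, 400.0, 300.0, 200.0]

for name in ("wMean", "mean", "apex", "wMeanApex3", "meanApex3"):
    print(name, round(mz_center(mz, intensity, name), 4))
```

The weighted variants pull the center toward the most intense points, while the "Apex3" variants ignore everything except the apex and its two neighbours.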
Integration method. If '=1' peak limits are found through descent on the mexican hat filtered data, if '=2' the descent is done on the real data. Method 2 is very accurate but prone to noise, while method 1 is more robust to noise but less exact.
number
1
logical, if TRUE a Gaussian is fitted
boolean
A name for the class of sample
string
Sample
sigma value for grouping the peaks across chromatogram
number
8
full width at half maximum for finding overlapping peaks
number
0.6
which intensity value to use
string
maxo
ppm deviation between the theoretical adduct mass and the experimental one
number
10
this has to be negative (for testing only)!
string
negative
number of charges to consider (most often 1 is enough)
number
1
ppm deviation when mapping MS2 parent ion to a mass trace
number
10
rt difference (in seconds) for mapping MS2 parent ion to a mass trace (the mass trace is a range: start and end of the trace)
number
5
Parameters used to characterize the internal library
name of the column showing which raw files contain which metabolite in the library_description_neg csv file
string
rawFile
name of the column showing id of the metabolites in the library_description_neg csv file
string
HMDB.YMDB.ID
name of the compound column in the library_description_neg csv file
string
PRIMARY_NAME
name of the mz column in the library_description_neg csv file
string
mz
"f" or "c", showing whether the Feature range or centroid of the feature should be used for mapping
string
Number of cores for mapping the features
number
1
ppm error for mapping the library characterization masses to the experimental one
number
10
Parameters that control how the results are outputted
relative difference of mass of the ID hit compared to a mass trace (ppm)
number
10
retention time difference of ID to mass trace (seconds)
number
5
should we impute adducts and different charge states with the same ID
boolean
true
Class of the samples (used for statistics and coverage calculations)
string
Class
what class of samples do you want to keep (anything not matching this in the Class column will be removed)
string
Sample
whether you want to rename the files
boolean
true
which column of the phenotype file to use for renaming
string
rename
do you only want to see the identified mass traces or everything?
boolean
do you have technical replicates you want to average?
boolean
which column of the phenotype file shows the technical replicates
string
rep
should we log2 the output
boolean
true
any mass trace with more than this percentage of missing values will be removed
number
50
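A minimal sketch of the missing-value filter described above, assuming missing values are represented as `None` and that a trace is removed only when its missing percentage is strictly higher than the threshold. Both helper names are hypothetical.

```python
def missing_percentage(values):
    """Percentage of entries in a mass trace that are missing (None)."""
    return 100.0 * sum(value is None for value in values) / len(values)


def keep_trace(values, max_missing_pct=50.0):
    """Keep the trace unless more than max_missing_pct of its values are missing."""
    return missing_percentage(values) <= max_missing_pct


# Hypothetical intensities of two mass traces across four samples.
trace_a = [1200.0, None, 980.0, None]   # 50 % missing
trace_b = [None, None, 450.0, None]     # 75 % missing

print(keep_trace(trace_a), keep_trace(trace_b))
```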
do you want to normalize the data set? Set to 'NA' if you don't want normalization
string
1
Parameters specific to CFM-ID
path to a csv file containing your database
string
number of cores that cfm can use
number
2
name of the column in the database for id of the molecules
string
Identifier
name of the column in the database for SMILES of the molecules
string
SMILES
name of the column in the database for mass of the molecules
string
MonoisotopicMass
name of the column in the database for name of the molecules
string
Name
name of the column in the database for inchi of the molecules
string
InChI
Parameters only for MetFrag
path to a csv file containing your database
string
number of cores that metfrag can use
number
2
Parameters only for CSI:FINGERID
IMPORTANT: we don't support a database file for CSI:FingerID. You need to specify which database to use here; the rest of the parameters will be taken from the parameter file
string
hmdb
number of cores that csi can use
number
2
number of seconds that each csi ion can run (time limit)
number
600
Parameters that will be used in all the search engines
ppm deviation when mapping MS2 parent ion to a mass trace
number
10
rt difference (in seconds) for mapping MS2 parent ion to a mass trace (the mass trace is a range: start and end of the trace)
number
5
relative mass tolerance of the precursor (ppm)
number
10
relative mass tolerance of the fragment ions (ppm)
number
20
absolute mass tolerance of the fragment ions
number
0.05
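The relative (ppm) and absolute tolerances above can be illustrated with a short sketch. The helper names and the m/z values are hypothetical; the search engines themselves may apply the tolerances differently.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_ppm(observed_mz, reference_mz, tolerance_ppm):
    """True if the relative mass error is within tolerance_ppm."""
    return abs(ppm_error(observed_mz, reference_mz)) <= tolerance_ppm


def within_absolute(observed_mz, reference_mz, tolerance):
    """True if the absolute mass difference is within tolerance (m/z units)."""
    return abs(observed_mz - reference_mz) <= tolerance


reference = 500.0000  # hypothetical theoretical m/z
for observed in (500.0040, 500.0060):
    print(
        observed,
        round(ppm_error(observed, reference), 2),
        within_ppm(observed, reference, 10),    # precursor default above
        within_ppm(observed, reference, 20),    # fragment default above
        within_absolute(observed, reference, 0.05),
    )
```

At m/z 500, a 10 ppm window corresponds to 0.005 m/z units, so the absolute tolerance of 0.05 is the looser criterion in this mass range.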
type of database to use (see metaboIGNITER guide)
string
LocalCSV
ionization method. This has to be neg (only for testing at this stage)
string
neg
adduct rules (primary or extended)
string
ions with less than this number will be removed
number
2
Settings for CAMERA to detect adducts and isotopes
sigma value for grouping the peaks across chromatogram
number
8
full width at half maximum for finding overlapping peaks
number
0.6
which intensity value to use
string
maxo
ppm deviation between the theoretical adduct mass and the experimental one
number
10
this has to be negative (for testing only)!
string
negative
number of charges to consider (most often 1 is enough)
number
1
Parameters to use for performing QC, blank and dilution filtering
set to true if you want to remove signal from blank
boolean
method of summarization of signal in blank samples
string
max
Name of the class of the blank samples
string
Blank
Name of the class of the biological samples
string
Sample
set to T to compare blanks only to rest of the samples
string
Whether blank filtering should be done or not?
boolean
This series will be used for calculating the correlation. For example, if this parameter is set to 1,2,3 and the class of dilution trends is set to D1,D2,D3, the following pairs will be used for calculating the correlation: (D1,1),(D2,2),(D3,3)
string
0.5,1,2,4
The classes of the samples representing dilution. These have to be separated by commas!
string
D1,D2,D3,D4
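As a rough sketch of how the two settings above pair up, the snippet below zips the dilution classes with the series and computes a Pearson correlation per feature. The feature intensities are invented, and the use of one mean intensity per dilution class is an assumption; the pipeline's own calculation may differ.

```python
from math import sqrt


def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Defaults shown above, parsed from their comma-separated form.
series = [float(value) for value in "0.5,1,2,4".split(",")]
classes = "D1,D2,D3,D4".split(",")
pairs = list(zip(classes, series))
print(pairs)

# Hypothetical mean intensity of two features in each dilution class.
feature_a = {"D1": 50.0, "D2": 100.0, "D3": 200.0, "D4": 400.0}
feature_b = {"D1": 300.0, "D2": 120.0, "D3": 310.0, "D4": 90.0}

r_a = pearson_r(series, [feature_a[name] for name in classes])
r_b = pearson_r(series, [feature_b[name] for name in classes])
print(round(r_a, 3), round(r_b, 3))
```

A feature that follows the dilution trend (such as `feature_a`) yields a correlation close to 1, while a feature that does not is a candidate for removal.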
p-value of the correlation. Anything higher than this will be removed!
number
0.05
minimum expected correlation. Anything lower than this will be removed!
number
-1
If the tool should consider absolute correlation rather than the typical one from [-1 to 1] (F or T)
string
select whether to perform CV filtering or not
boolean
class of your QC samples
string
QC
Maximum coefficient of variation you expect. Anything higher than this will be removed!
number
0.3
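A minimal sketch of the CV filter, assuming the coefficient of variation is the sample standard deviation divided by the mean of the QC intensities. The feature names and values are invented for illustration.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Hypothetical intensities of two features across four QC injections.
qc_intensities = {
    "feature_1": [100.0, 105.0, 95.0, 102.0],
    "feature_2": [100.0, 40.0, 180.0, 90.0],
}

max_cv = 0.3  # default shown above
kept = [
    name
    for name, values in qc_intensities.items()
    if coefficient_of_variation(values) <= max_cv
]
print(kept)
```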
Parameters for quantification
set whether you want to do quantification with OpenMS (openms) or XCMS (xcms) in negative ionization
string
Controls how to perform IPO. Possible values: "none": don't perform IPO; "global": performs IPO on all or a selected number of samples; "global_quant": performs IPO only for quantification (not retention time correction and grouping); "local": performs IPO on individual samples one at a time; "local_quant": performs IPO on individual samples only for quantification; "local_RT": performs IPO only for retention time correction and grouping.
string
Performs IPO on all the samples irrespective of the class they have
boolean
If ipo_allSamples_neg is false, the phenotype file must be provided to select samples. This parameter selects the column of the phenotype file.
string
Class
Selects the files only with this value in the columnToSelect column
string
QC
Quantification methods for IPO. Only centWave is supported at this stage.
string
centWave
lowest level of noise
number
highest level of noise
number
lowest level of signal to noise threshold
number
10
highest level of signal to noise threshold
number
10
Function for centering the mz
string
wMean
Integration method. If '=1' peak limits are found through descent on the mexican hat filtered data, if '=2' the descent is done on the real data. Method 2 is very accurate but prone to noise, while method 1 is more robust to noise but less exact.
number
1
logical, if TRUE a Gaussian is fitted
boolean
lower minimum width of peaks
number
12
higher minimum width of peaks
number
28
lower maximum width of peaks
number
35
higher maximum width of peaks
number
65
lower ppm mass deviation
number
17
higher ppm mass deviation
number
32
lower minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
-0.001
higher minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
0.01
maximum charge of molecules (only used in individual setting)
number
1
ppm mass deviation for adducts (only used in individual setting)
number
10
lower value of K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
higher value of K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
lower I in prefilter
number
100
higher I in prefilter
number
100
number of cores used in IPO
number
5
lower Penalty for Gap opening
number
higher Penalty for Gap opening
number
0.4
lower Penalty for Gap enlargement
number
2.1
higher Penalty for Gap enlargement
number
2.7
lower step size (in m/z) to use for profile generation from the raw data files
number
0.7
higher step size (in m/z) to use for profile generation from the raw data files
number
1
lower Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
higher Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
lower Local weighting applied to diagonal moves in alignment
number
2
higher Local weighting applied to diagonal moves in alignment
number
2
lower Local weighting applied to gap moves in alignment
number
1
higher Local weighting applied to gap moves in alignment
number
1
Local rather than global alignment
number
lower bandwidth (consider something like retention time differences)
number
22
higher bandwidth (consider something like retention time differences)
number
38
lower minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.3
higher minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.7
lower mz width (mz differences)
number
0.015
higher mz width (mz differences)
number
0.035
lower minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
higher minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
lower maximum number of groups to identify in a single m/z slice
number
50
higher maximum number of groups to identify in a single m/z slice
number
50
DistFunc function: cor (Pearson's R) or cor_opt (default, calculate only 10% diagonal band of distance matrix, better runtime), cov (covariance), prd (product), euc (Euclidean distance)
string
cor_opt
Only obiwarp is supported
string
obiwarp
mass trace deviation in ppm
number
10
lower width of peaks
number
5
highest width of peaks
number
30
level of noise
number
1000
minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
-0.001
signal to noise ratio cutoff, definition see below.
number
10
K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
I in prefilter
number
100
Function to calculate the m/z center of the feature: 'wMean' intensity weighted mean of the feature m/z values, 'mean' mean of the feature m/z values, 'apex' use m/z value at peak apex, 'wMeanApex3' intensity weighted mean of the m/z value at peak apex and the m/z value left and right of it, 'meanApex3' mean of the m/z value at peak apex and the m/z value left and right of it.
string
wMean
Integration method. If '=1' peak limits are found through descent on the mexican hat filtered data, if '=2' the descent is done on the real data. Method 2 is very accurate but prone to noise, while method 1 is more robust to noise but less exact.
number
1
logical, if TRUE a Gaussian is fitted
boolean
name of the column in the phenotype_design_neg showing class information of the samples
string
Class
A name for the class of sample
string
Sample
step size (in m/z) to use for profile generation from the raw data files
number
1
the index of the sample all others will be aligned to. If center==NULL, the sample with the most peaks is chosen as default.
string
NULL
Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
DistFunc function: cor (Pearson's R) or cor_opt (default, calculate only 10% diagonal band of distance matrix, better runtime), cov (covariance), prd (product), euc (Euclidean distance)
string
Penalty for Gap opening
string
NULL
Penalty for Gap enlargement
string
NULL
Local weighting applied to diagonal moves in alignment
number
2
Local weighting applied to gap moves in alignment
number
1
Local rather than global alignment
number
bandwidth (consider something like retention time differences)
number
15
mz width (mz differences)
number
0.005
minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.5
minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
maximum number of groups to identify in a single m/z slice
number
50
Parameters used to tune library creation (only set if you want to do library identification)
set whether you want to do quantification with OpenMS (openms) or XCMS (xcms) in positive ionization (for library)
string
Controls how to perform IPO. Possible values: "none": don't perform IPO; "global": performs IPO on all or a selected number of samples; "local": performs IPO on individual samples one at a time.
string
Quantification methods for IPO. Only centWave is supported at this stage.
string
centWave
lowest level of noise
number
highest level of noise
number
lowest level of signal to noise threshold
number
10
highest level of signal to noise threshold
number
10
Function for centering the mz
string
wMean
Integration method. If '=1' peak limits are found through descent on the mexican hat filtered data, if '=2' the descent is done on the real data. Method 2 is very accurate but prone to noise, while method 1 is more robust to noise but less exact.
number
1
logical, if TRUE a Gaussian is fitted
boolean
lower minimum width of peaks
number
12
higher minimum width of peaks
number
28
lower maximum width of peaks
number
35
higher maximum width of peaks
number
65
lower ppm mass deviation
number
17
higher ppm mass deviation
number
32
lower minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
-0.001
higher minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
0.01
maximum charge of molecules (only used in individual setting)
number
1
ppm mass deviation for adducts (only used in individual setting)
number
10
lower value of K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
higher value of K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
lower I in prefilter
number
100
higher I in prefilter
number
100
number of cores used in IPO
number
5
lower Penalty for Gap opening
number
higher Penalty for Gap opening
number
0.4
lower Penalty for Gap enlargement
number
2.1
higher Penalty for Gap enlargement
number
2.7
lower step size (in m/z) to use for profile generation from the raw data files
number
0.7
higher step size (in m/z) to use for profile generation from the raw data files
number
1
lower Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
higher Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
lower Local weighting applied to diagonal moves in alignment.
number
2
higher Local weighting applied to diagonal moves in alignment.
number
2
lower Local weighting applied to gap moves in alignment.
number
1
higher Local weighting applied to gap moves in alignment.
number
1
Local rather than global alignment
number
lower bandwidth (consider something like retention time differences)
number
22
higher bandwidth (consider something like retention time differences)
number
38
lower minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.3
higher minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.7
lower mz width (mz differences)
number
0.015
higher mz width (mz differences)
number
0.035
lower minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
higher minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
lower maximum number of groups to identify in a single m/z slice
number
50
higher maximum number of groups to identify in a single m/z slice
number
50
DistFunc function: cor (Pearson's R) or cor_opt (default, calculate only 10% diagonal band of distance matrix, better runtime), cov (covariance), prd (product), euc (Euclidean distance)
string
Only obiwarp is supported
string
obiwarp
mass trace deviation in ppm
number
10
lower width of peaks
number
5
highest width of peaks
number
30
level of noise
number
1000
minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
-0.001
signal to noise ratio cutoff, definition see below.
number
10
K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
I in prefilter
number
100
Function to calculate the m/z center of the feature: 'wMean' intensity weighted mean of the feature m/z values, 'mean' mean of the feature m/z values, 'apex' use m/z value at peak apex, 'wMeanApex3' intensity weighted mean of the m/z value at peak apex and the m/z value left and right of it, 'meanApex3' mean of the m/z value at peak apex and the m/z value left and right of it.
string
wMean
Integration method. If '=1' peak limits are found through descent on the mexican hat filtered data, if '=2' the descent is done on the real data. Method 2 is very accurate but prone to noise, while method 1 is more robust to noise but less exact.
number
1
logical, if TRUE a Gaussian is fitted
boolean
A name for the class of sample
string
Sample
sigma value for grouping the peaks across chromatogram
number
8
full width at half maximum for finding overlapping peaks
number
0.6
which intensity value to use
string
maxo
ppm deviation between the theoretical adduct mass and the experimental one
number
10
this has to be positive (for testing only)!
string
positive
number of charges to consider (most often 1 is enough)
number
1
ppm deviation when mapping MS2 parent ion to a mass trace
number
10
rt difference (in seconds) for mapping MS2 parent ion to a mass trace (the mass trace is a range: start and end of the trace)
number
5
Parameters used to characterize the internal library
name of the column showing which raw files contain which metabolite in the library_description_pos csv file
string
rawFile
name of the column showing id of the metabolites in the library_description_pos csv file
string
HMDB.YMDB.ID
name of the compound column in the library_description_pos csv file
string
PRIMARY_NAME
name of the mz column in the library_description_pos csv file
string
mz
"f" or "c", showing whether the Feature range or centroid of the feature should be used for mapping
string
Number of cores for mapping the features
number
1
ppm error for mapping the library characterization masses to the experimental one
number
10
Parameters that control how the results are outputted
relative difference of mass of the ID hit compared to a mass trace (ppm)
number
10
retention time difference of ID to mass trace (seconds)
number
5
should we impute adducts and different charge states with the same ID
boolean
true
Class of the samples (used for statistics and coverage calculations)
string
Class
what class of samples do you want to keep (anything not matching this in the Class column will be removed)
string
Sample
whether you want to rename the files
boolean
true
which column of the phenotype file to use for renaming
string
rename
do you only want to see the identified mass traces or everything?
boolean
do you have technical replicates you want to average?
boolean
which column of the phenotype file shows the technical replicates
string
rep
should we log2 the output
boolean
true
any mass trace with more than this percentage of missing values will be removed
number
50
do you want to normalize the data set? Set to 'NA' if you don't want normalization
string
1
Parameters specific to CFM-ID
path to a csv file containing your database
string
number of cores that cfm can use
number
2
name of the column in the database for id of the molecules
string
Identifier
name of the column in the database for SMILES of the molecules
string
SMILES
name of the column in the database for mass of the molecules
string
MonoisotopicMass
name of the column in the database for name of the molecules
string
Name
name of the column in the database for inchi of the molecules
string
InChI
Parameters only for MetFrag
path to a csv file containing your database
string
number of cores that metfrag can use
number
2
Parameters only for CSI:FINGERID
IMPORTANT: we don't support a database file for CSI:FingerID. You need to specify which database to use here; the rest of the parameters will be taken from the parameter file
string
hmdb
number of cores that csi can use
number
2
number of seconds that each csi ion can run (time limit)
number
600
Parameters that will be used in all the search engines
ppm deviation when mapping MS2 parent ion to a mass trace
number
10
rt difference (in seconds) for mapping MS2 parent ion to a mass trace (the mass trace is a range: start and end of the trace)
number
5
relative mass tolerance of the precursor (ppm)
number
10
relative mass tolerance of the fragment ions (ppm)
number
20
absolute mass tolerance of the fragment ions
number
0.05
type of database to use (see metaboIGNITER guide)
string
LocalCSV
ionization method. This has to be pos (only for testing at this stage)
string
pos
adduct rules (primary or extended)
string
ions with less than this number will be removed
number
2
Settings for CAMERA to detect adducts and isotopes
sigma value for grouping the peaks across chromatogram
number
8
full width at half maximum for finding overlapping peaks
number
0.6
which intensity value to use
string
maxo
ppm deviation between the theoretical adduct mass and the experimental one
number
10
this has to be positive (for testing only)!
string
positive
number of charges to consider (most often 1 is enough)
number
1
Parameters to use for performing QC, blank and dilution filtering
set to true if you want to remove signal from blank
boolean
method of summarization of signal in blank samples. Must be one of 'max', 'mean' or 'median'. For example, if 'max' is selected, a signal will be removed if its maximum abundance in the blank samples is higher than its maximum abundance in the biological samples.
string
max
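The blank-filtering rule can be sketched as follows. The 'max' case follows the description above; applying the same comparison for 'mean' and 'median' is an assumption, and the function name and intensities are made up.

```python
from statistics import mean, median

# Summaries named after the allowed parameter values.
SUMMARIES = {"max": max, "mean": mean, "median": median}


def is_blank_signal(blank_values, sample_values, method="max"):
    """True if the summarized blank abundance exceeds the summarized sample abundance."""
    summarize = SUMMARIES[method]
    return summarize(blank_values) > summarize(sample_values)


# Hypothetical abundances of one feature.
blank = [500.0, 40.0]
samples = [300.0, 320.0, 310.0]

for method in ("max", "mean", "median"):
    print(method, is_blank_signal(blank, samples, method))
```

In this example a single high blank injection is enough to remove the feature under 'max', while 'mean' and 'median' would keep it, which shows how the choice of summary changes the strictness of the filter.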
Name of the class of the blank samples. This must show the class of blank samples exactly as you refer to them in your phenotype file
string
Blank
Name of the class of the biological samples
string
Sample
set to T to compare the blanks only to rest of the samples. If F, the blank signals will be compared with the samples with class sample_blankfilter_pos_xcms
string
T
Select whether you want to do dilution filtering
boolean
This series will be used for calculating the correlation. For example, if this parameter is set to 1,2,3 and the class of dilution trends is set to D1,D2,D3, the following pairs will be used for calculating the correlation: (D1,1),(D2,2),(D3,3)
string
0.5,1,2,4
The classes of the samples representing dilution. These have to be separated by commas! The samples are correlated in the exact order of the sequence provided here
string
D1,D2,D3,D4
p-value of the correlation. Anything higher than this will be removed!
number
0.05
minimum expected correlation. Anything lower than this will be removed!
number
-1
If the tool should consider absolute correlation rather than the typical one from [-1 to 1] (F or T)
string
select whether to perform CV filtering or not
boolean
class of your QC samples
string
QC
Maximum coefficient of variation you expect. Anything higher than this will be removed!
number
0.3
Parameters for quantification
set whether you want to do quantification with OpenMS (openms) or XCMS (xcms) in positive ionization
string
Controls how to perform IPO. Possible values: "none": don't perform IPO; "global": performs IPO on all or a selected number of samples; "global_quant": performs IPO only for quantification (not retention time correction and grouping); "local": performs IPO on individual samples one at a time; "local_quant": performs IPO on individual samples only for quantification; "local_RT": performs IPO only for retention time correction and grouping.
string
Performs IPO on all the samples irrespective of the class they have
boolean
If ipo_allSamples_pos is false, the phenotype file must be provided to select samples. This parameter selects the column of the phenotype file.
string
Class
Selects the files only with this value in the columnToSelect column
string
QC
Quantification methods for IPO. Only centWave is supported at this stage.
string
centWave
lowest level of noise
number
highest level of noise
number
lowest level of signal to noise threshold
number
10
highest level of signal to noise threshold
number
10
Function for centering the mz
string
wMean
Integration method. If '=1' peak limits are found through descent on the mexican hat filtered data, if '=2' the descent is done on the real data. Method 2 is very accurate but prone to noise, while method 1 is more robust to noise but less exact.
number
1
logical, if TRUE a Gaussian is fitted
boolean
lower minimum width of peaks
number
12
higher minimum width of peaks
number
28
lower maximum width of peaks
number
35
higher maximum width of peaks
number
65
lower ppm mass deviation
number
17
higher ppm mass deviation
number
32
lower minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
-0.001
higher minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
0.01
maximum charge of molecules (only used in individual setting)
number
1
ppm mass deviation for adducts (only used in individual setting)
number
10
lower value of K in 'prefilter_pos=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >_pos= 'I'.
number
3
higher value of K in 'prefilter_pos=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
lower I in prefilter
number
100
higher I in prefilter
number
100
number of cores used in IPO
number
5
lower Penalty for Gap opening
number
higher Penalty for Gap opening
number
0.4
lower Penalty for Gap enlargement
number
2.1
higher Penalty for Gap enlargement
number
2.7
lower step size (in m/z) to use for profile generation from the raw data files
number
0.7
higher step size (in m/z) to use for profile generation from the raw data files
number
1
lower Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
higher Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
lower Local weighting applied to diagonal moves in alignment
number
2
higher Local weighting applied to diagonal moves in alignment
number
2
lower Local weighting applied to gap moves in alignment
number
1
higher Local weighting applied to gap moves in alignment
number
1
Local rather than global alignment
number
lower bandwidth (consider something like retention time differences)
number
22
higher bandwidth (consider something like retention time differences)
number
38
lower minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.3
higher minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.7
lower mz width (mz differences)
number
0.015
higher mz width (mz differences)
number
0.035
lower minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
higher minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
lower maximum number of groups to identify in a single m/z slice
number
50
higher maximum number of groups to identify in a single m/z slice
number
50
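Each IPO setting above is given as a lower/higher pair. A small sanity check can catch swapped bounds before a run, on the assumption that a pair is only meaningful when the lower bound does not exceed the higher one. The `check_range` helper is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: succeed when lower <= higher.
# awk handles decimals and negative values, which the integer test `[ -le ]` cannot.
check_range() {
  awk -v lo="$1" -v hi="$2" 'BEGIN { exit !(lo + 0 <= hi + 0) }'
}

# Default pairs taken from the list above.
check_range 12 28       && echo "minimum peak width range ok"
check_range -0.001 0.01 && echo "m/z difference range ok"
check_range 3 3         && echo "prefilter k range ok"
```

A pair with equal bounds, such as 3 and 3, leaves a single candidate value.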
DistFunc function: cor (Pearson's R) or cor_opt (default, calculate only 10% diagonal band of distance matrix, better runtime), cov (covariance), prd (product), euc (Euclidean distance)
string
cor_opt
Only obiwarp is supported
string
obiwarp
mass trace deviation in ppm
number
10
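To get a feel for what a ppm value means in absolute terms, it can be converted using the general definition of parts per million: tolerance = m/z × ppm / 1,000,000. The sketch below only does that arithmetic; the `ppm_to_mz` helper is hypothetical, and how the peak picker applies the tolerance internally is not covered here.

```shell
# Hypothetical helper: absolute m/z tolerance for a given m/z and ppm value.
# tolerance = mz * ppm / 1e6
ppm_to_mz() {
  awk -v mz="$1" -v ppm="$2" 'BEGIN { printf "%.4f\n", mz * ppm / 1000000 }'
}

ppm_to_mz 500 10   # prints 0.0050
```

At the default of 10 ppm, an ion at m/z 500 therefore has a tolerance of about 0.005 m/z units.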
lower width of peaks
number
5
highest width of peaks
number
30
level of noise
number
1000
minimum difference in m/z for peaks with overlapping retention times, can be negative to allow overlap
number
-0.001
signal-to-noise ratio cutoff
number
10
K in 'prefilter=c(k,I)'. Prefilter step for the first phase. Mass traces are only retained if they contain at least 'k' peaks with intensity >= 'I'.
number
3
I in prefilter
number
100
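The prefilter rule described above (keep a mass trace only if it contains at least k peaks with intensity >= I) can be sketched in a few lines. This is an illustration of the rule as stated, not the actual peak-picking code; the `passes_prefilter` helper and the intensity values are made up.

```shell
# Hypothetical helper: decide whether a mass trace passes prefilter=c(k, I).
# Usage: passes_prefilter K I intensity...
passes_prefilter() {
  k=$1; min_int=$2; shift 2
  # Count the peaks whose intensity is at least I.
  n=$(printf '%s\n' "$@" | awk -v i="$min_int" '$1 + 0 >= i + 0 { c++ } END { print c + 0 }')
  [ "$n" -ge "$k" ]
}

# Made-up intensities; k = 3 and I = 100 are the defaults listed above.
passes_prefilter 3 100 80 120 150 300 && echo "trace kept"
passes_prefilter 3 100 80 120 90 300  || echo "trace dropped"
```

The first trace has three peaks at or above 100 and is kept. The second has only two and is dropped.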
Function to calculate the m/z center of the feature: 'wMean' intensity weighted mean of the feature m/z values, 'mean' mean of the feature m/z values, 'apex' use m/z value at peak apex, 'wMeanApex3' intensity weighted mean of the m/z value at peak apex and the m/z value left and right of it, 'meanApex3' mean of the m/z value at peak apex and the m/z value left and right of it.
string
wMean
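The difference between 'mean' and 'wMean' is easiest to see on a tiny example. The sketch below computes both for a list of m/z and intensity pairs. The `mz_centers` helper and the data points are assumptions for illustration only; the other options ('apex', 'wMeanApex3', 'meanApex3') are not shown.

```shell
# Hypothetical helper: read "mz intensity" pairs on stdin and print the
# plain mean and the intensity-weighted mean of the m/z values.
mz_centers() {
  awk '{ n++; sum += $1; wsum += $1 * $2; w += $2 }
       END { if (n && w) printf "mean=%.4f wMean=%.4f\n", sum / n, wsum / w }'
}

# Made-up centroids of one feature.
printf '%s\n' '100.0 1' '100.2 3' | mz_centers   # prints mean=100.1000 wMean=100.1500
```

Because the second point carries three times the intensity, the weighted mean is pulled towards it.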
Integration method. If '=1' peak limits are found through descent on the mexican hat filtered data, if '=2' the descent is done on the real data. Method 2 is very accurate but prone to noise, while method 1 is more robust to noise but less exact.
number
1
logical, if TRUE a Gaussian is fitted
boolean
name of the column in the phenotype_design_pos showing class information of the samples
string
Class
A name for the class of sample
string
Sample
step size (in m/z) to use for profile generation from the raw data files
number
1
the index of the sample all others will be aligned to. If center==NULL, the sample with the most peaks is chosen as default.
string
NULL
Responsiveness of warping. 0 will give a linear warp based on start and end points. 100 will use all bijective anchors
number
1
DistFunc function: cor (Pearson's R) or cor_opt (default, calculate only 10% diagonal band of distance matrix, better runtime), cov (covariance), prd (product), euc (Euclidean distance)
string
Penalty for Gap opening
string
NULL
Penalty for Gap enlargement
string
NULL
Local weighting applied to diagonal moves in alignment
number
2
Local weighting applied to gap moves in alignment
number
1
Local rather than global alignment
number
bandwidth (consider something like retention time differences)
number
15
mz width (mz differences)
number
0.005
minimum fraction of samples necessary in at least one of the sample groups for it to be a valid group
number
0.5
minimum number of samples necessary in at least one of the sample groups for it to be a valid group
number
1
maximum number of groups to identify in a single m/z slice
number
50
Input files, including mzML files, and settings for performing identification using an internal library
If you have already characterized your negative library, set this to true and specify the path for library_charactrization_file_neg
boolean
Path to the file from the characterized library (negative)
string
Path to a folder containing library mzML files used for doing adduct calculation (MS1 data in negative ionization method)
string
Path to a folder containing mzML files used for doing identification (MS2 data in negative ionization method)
string
Path to a csv file containing description of the library for negative (see the help)
string
Input files, including mzML files, and settings for performing identification using an internal library
If you have already characterized your positive library, set this to true and specify the path for library_charactrization_file_pos
boolean
Path to the file from the characterized library (positive)
string
Path to a folder containing library mzML files used for doing adduct calculation (MS1 data in positive ionization method)
string
Path to a folder containing mzML files used for doing identification (MS2 data in positive ionization method)
string
Path to a csv file containing description of the library for positive (see the help)
string
Input files, including mzML files, for performing identification
Path to a folder containing mzML files used for doing identification (MS2 data in positive ionization method)
string
Path to a folder containing mzML files used for doing identification (MS2 data in negative ionization method)
string
Used to control the functionality of the workflow, e.g. identification, quantification, etc.
Set to true to publish the output of all intermediate stages
boolean
Set to false if you don't want to do identification. You do not need to set MS2-related parameters if you set this to false.
boolean
Should Metfrag be used for doing identification?
boolean
Should CSI:FingerID be used for doing identification?
boolean
Should CFM-ID be used for doing identification?
boolean
Should an internal library be used for doing identification?
boolean
Set to 'pos' (only positive), 'neg' (only negative), or 'both' (both positive and negative).
string
Set to true if your data is in profile mode (only for quantification!)
boolean
Used for peak picking and feature detection
Path to the ini file for PeakPickerHiRes in positive mode
string
$baseDir/assets/openms/openms_peak_picker_ini_pos.ini
Path to the ini file for PeakPickerHiRes in negative mode
string
$baseDir/assets/openms/openms_peak_picker_ini_neg.ini
Path to the ini file for OpenMS FeatureFinderMetabo in positive mode
string
$baseDir/assets/openms/openms_feature_finder_metabo_ini_pos.ini
Path to the ini file for OpenMS FeatureFinderMetabo in negative mode
string
$baseDir/assets/openms/openms_feature_finder_metabo_ini_neg.ini
Path to the ini file for PeakPickerHiRes in positive mode (for library)
string
$baseDir/assets/openms/openms_peak_picker_lib_ini_pos.ini
Path to the ini file for PeakPickerHiRes in negative mode (for library)
string
$baseDir/assets/openms/openms_peak_picker_lib_ini_neg.ini
Path to the ini file for OpenMS FeatureFinderMetabo in positive mode (for library)
string
$baseDir/assets/openms/openms_feature_finder_metabo_lib_ini_pos.ini
Path to the ini file for OpenMS FeatureFinderMetabo in negative mode (for library)
string
$baseDir/assets/openms/openms_feature_finder_metabo_lib_ini_neg.ini
Input files, including mzML files, for performing quantification
Path to a folder containing mzML files used for doing quantification (MS1 data in positive ionization method)
string
Path to a folder containing mzML files used for doing quantification (MS1 data in negative ionization method)
string
Path to a csv file containing the experimental design (MS1 data in positive ionization method)
string
Path to a csv file containing the experimental design (MS1 data in negative ionization method)
string
Less common options for the pipeline, typically set in a config file.
Display help text.
boolean
Method used to save pipeline results to output directory.
string
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Boolean whether to validate parameters against the schema at runtime
boolean
true
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
This works exactly as with --email, except emails are only sent if the workflow is not successful.
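An address can be checked against the pattern above before launching a run. The sketch below rewrites the pattern in POSIX extended regular expression syntax so that it works with `grep -E`; the `is_valid_email` helper and the example addresses are assumptions made for illustration.

```shell
# Hypothetical helper: test an address against the pattern above,
# rewritten in POSIX ERE syntax (no backslash escapes inside brackets).
is_valid_email() {
  printf '%s\n' "$1" | grep -Eq '^[a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,5}$'
}

is_valid_email 'user@example.org' && echo "accepted"
is_valid_email 'user@example'     || echo "rejected"
```

Note that the pattern limits the final domain label to between two and five letters, so an address ending in a longer top-level domain would be rejected.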
Send plain-text email instead of HTML.
boolean
Set to receive plain-text e-mails instead of HTML formatted.
Do not use coloured log outputs.
boolean
Set to disable colourful command line output and live life in monochrome.
Directory to keep pipeline Nextflow logs and reports.
string
${params.outdir}/pipeline_info
Show all params when using --help
boolean
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
number
16
Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1
Maximum amount of memory that can be requested for any single job.
string
128.GB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'
Maximum amount of time that can be requested for any single job.
string
240.h
^(\d+\.?\s*(s|m|h|day)\s*)+$
Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
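The memory and time limits must match the patterns listed above. As a quick pre-flight check, the patterns can be rewritten in POSIX extended syntax (`\d` becomes `[0-9]`, `\s` becomes `[[:space:]]`) and tested with `grep -E`. The helper names below are hypothetical.

```shell
# Hypothetical helpers: validate --max_memory and --max_time style values.
is_valid_memory() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?\.?[[:space:]]*(K|M|G|T)?B$'
}
is_valid_time() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]+\.?[[:space:]]*(s|m|h|day)[[:space:]]*)+$'
}

is_valid_memory '128.GB' && echo "128.GB ok"
is_valid_time   '240.h'  && echo "240.h ok"
is_valid_memory '8 gigs' || echo "'8 gigs' rejected"
```

Both defaults (`128.GB` and `240.h`) pass, while free-form values such as `8 gigs` do not.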
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Provide git commit id for custom Institutional configs hosted at nf-core/configs. This was implemented for reproducibility purposes. Default: master.
Download and use config file with following git commit id
--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell nextflow where to find them with the custom_config_base option. For example:
Download and unzip the config files
cd /path/to/my/configs
wget https://github.com/nf-core/configs/archive/master.zip
unzip master.zip
Run the pipeline
cd /path/to/my/data
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/
Note that the nf-core/tools helper package has a download command to download all required pipeline files + singularity containers + institutional configs in one go for you, to make this process easier.
Institutional configs hostname.
string
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string