nf-core/proteinfamilies
Generation and update of protein families
Define where the pipeline should find input data and save output data.
Path to a comma-separated file (.csv) containing information about the samples in the experiment.
string
^\S+\.csv$
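The pattern above can be tried out before launching a run. This is a minimal sketch using Python's `re` module; the function name `is_valid_samplesheet_path` and the example paths are hypothetical, and the pipeline itself validates parameters through its schema rather than through this code.

```python
import re

# Pattern copied verbatim from the schema entry above.
INPUT_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path contains no whitespace and ends in '.csv'."""
    return INPUT_PATTERN.match(path) is not None


print(is_valid_samplesheet_path("data/samplesheet.csv"))  # True
print(is_valid_samplesheet_path("my samples.csv"))        # False (contains a space)
print(is_valid_samplesheet_path("samplesheet.tsv"))       # False (wrong extension)
```

Note that the pattern rejects any path containing whitespace, so a samplesheet stored under a directory with spaces in its name would need to be moved or renamed.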
The output directory where the results will be saved. Use absolute paths when writing to storage on cloud infrastructure.
string
Email address for completion summary. Example: name.surname@example.com
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
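The email pattern is stricter than it may look: it allows only letters, digits, underscores, hyphens and dots on either side of the `@`, and a final domain label of 2 to 5 letters. A small sketch, with the hypothetical helper name `is_valid_email`, shows which addresses pass:

```python
import re

# Pattern copied verbatim from the schema entry above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema's email pattern."""
    return EMAIL_PATTERN.match(address) is not None


print(is_valid_email("name.surname@example.com"))    # True
print(is_valid_email("user+tag@example.com"))        # False ('+' is not allowed)
print(is_valid_email("name@example.technology"))     # False (last label longer than 5 letters)
```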
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Less common options for the pipeline, typically set in a config file.
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
Email address for completion summary, only when pipeline fails. Example: name.surname@example.com
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails. Example: 25.MB
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
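To see which size strings the pattern accepts, the sketch below validates a value and splits it into a number and a unit. The helper name `split_size` is hypothetical, and the splitting logic is only an illustration, not how the pipeline interprets the value.

```python
import re

# Pattern copied verbatim from the schema entry above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def split_size(text: str):
    """Return (number, unit) if text matches the pattern, otherwise None."""
    if SIZE_PATTERN.match(text) is None:
        return None
    # The unit is the trailing 'B' with an optional K/M/G/T prefix.
    unit = re.search(r"(K|M|G|T)?B$", text).group(0)
    # Everything before the unit is the number, possibly followed by
    # whitespace and the optional trailing dot (as in '25.MB').
    number = text[: -len(unit)].strip().rstrip(".")
    return float(number), unit


print(split_size("25.MB"))   # (25.0, 'MB')
print(split_size("1.5 GB"))  # (1.5, 'GB')
print(split_size("25 mb"))   # None (unit letters must be upper case)
```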
Do not use coloured log outputs.
boolean
Incoming hook URL for a messaging service.
string
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Whether to validate parameters against the schema at runtime.
boolean
true
Base URL or local path to location of pipeline test dataset files
string
https://raw.githubusercontent.com/nf-core/test-datasets/proteinfamilies/
Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
string
Use these parameters to control the flow of the clustering subworkflow execution.
Save the db output folder of mmseqs createdb
boolean
Choose the clustering algorithm: either the simple ‘cluster’ for medium-sized inputs, or ‘linclust’ for less sensitive clustering of larger datasets.
string
mmseqs parameter for minimum sequence identity
number
0.5
mmseqs parameter for minimum sequence coverage ratio
number
0.9
mmseqs parameter for coverage mode: 0 for both, 1 for target and 2 for query sequence
integer
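The three coverage modes described above differ only in which sequence must meet the minimum coverage ratio. The following is a simplified sketch of that decision, not MMseqs2's own implementation: it assumes a single aligned length shared by both sequences, and the function name `passes_coverage` and the default `cov_mode=0` are illustrative assumptions.

```python
def passes_coverage(aligned_len: int, query_len: int, target_len: int,
                    min_cov: float = 0.9, cov_mode: int = 0) -> bool:
    """Apply the coverage rule for modes 0 (both), 1 (target) and 2 (query)."""
    query_cov = aligned_len / query_len
    target_cov = aligned_len / target_len
    if cov_mode == 0:
        # Both sequences must be covered.
        return query_cov >= min_cov and target_cov >= min_cov
    if cov_mode == 1:
        # Only the target must be covered.
        return target_cov >= min_cov
    if cov_mode == 2:
        # Only the query must be covered.
        return query_cov >= min_cov
    raise ValueError(f"unsupported coverage mode in this sketch: {cov_mode}")


# A 90-residue alignment between a 100-residue query and a 200-residue target:
print(passes_coverage(90, 100, 200, cov_mode=0))  # False (target only 45% covered)
print(passes_coverage(90, 100, 200, cov_mode=2))  # True  (query 90% covered)
```

With mode 0, short fragments cannot be clustered with much longer sequences, which tends to keep family members at a similar length.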
Save the clustering output folder of mmseqs cluster or linclust
boolean
Minimum clustering chunk size required to create a seed Multiple Sequence Alignment.
integer
25
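The size threshold can be pictured as a filter over cluster membership. The sketch below assumes the clustering result is available as (representative, member) pairs, and that a cluster is kept when its size is greater than or equal to the threshold; both the function name `filter_clusters` and the inclusive comparison are assumptions made for illustration.

```python
from collections import defaultdict


def filter_clusters(pairs, size_threshold: int = 25):
    """Group (representative, member) pairs and keep sufficiently large clusters."""
    clusters = defaultdict(list)
    for representative, member in pairs:
        clusters[representative].append(member)
    # Assumption: the threshold is inclusive (size >= threshold).
    return {
        rep: members
        for rep, members in clusters.items()
        if len(members) >= size_threshold
    }


pairs = [
    ("seqA", "seqA"), ("seqA", "seqB"), ("seqA", "seqC"),
    ("seqD", "seqD"),
]
# With a toy threshold of 2, the singleton cluster 'seqD' is dropped.
print(filter_clusters(pairs, size_threshold=2))
# {'seqA': ['seqA', 'seqB', 'seqC']}
```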
Save membership-filtered initial mmseqs clusters in fasta format
boolean
Use these parameters to control the Multiple Sequence Alignment subworkflow execution.
Choose the alignment tool. FAMSA is recommended as the best trade-off between time, memory and accuracy.
string
Whether to trim gaps in the Multiple Sequence Alignment (MSA).
boolean
true
Choose the output format of the clipped alignment.
string
clipkit
Choose if ClipKIT should only clip gaps at the ends of the MSAs.
boolean
true
Multiple Sequence Alignment (MSA) positions with gappiness greater than this threshold will be trimmed.
number
0.5
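To make the gappiness threshold and the ends-only option concrete, here is a minimal sketch of column trimming. It is not ClipKIT's implementation: it takes gappiness to be the fraction of `-` characters in a column, and the function name `trim_gappy_columns` and the toy alignment are hypothetical.

```python
def trim_gappy_columns(alignment, threshold: float = 0.5, ends_only: bool = True):
    """Drop columns whose gap fraction is greater than the threshold.

    With ends_only=True, only gappy columns at the two ends are removed;
    gappy columns inside the alignment are kept.
    """
    n_seqs = len(alignment)
    n_cols = len(alignment[0])
    gappy = [
        sum(seq[i] == "-" for seq in alignment) / n_seqs > threshold
        for i in range(n_cols)
    ]
    if ends_only:
        left = 0
        while left < n_cols and gappy[left]:
            left += 1
        right = n_cols
        while right > left and gappy[right - 1]:
            right -= 1
        keep = list(range(left, right))
    else:
        keep = [i for i in range(n_cols) if not gappy[i]]
    return ["".join(seq[i] for i in keep) for seq in alignment]


msa = ["--AC-GT-", "-TAC-GTA", "--ACTGT-"]
print(trim_gappy_columns(msa, ends_only=True))   # ['AC-GT', 'AC-GT', 'ACTGT']
print(trim_gappy_columns(msa, ends_only=False))  # ['ACGT', 'ACGT', 'ACGT']
```

Trimming only the ends preserves internal insertions, which matters when the trimmed alignment is later used to build a profile of the whole family.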
Set to true to recruit additional sequences from the input FASTA file using the family Hidden Markov Models (HMMs) to refine the alignments
boolean
true
Whether to generate the target results file of hmmsearch.
boolean
Whether to generate the domain results file of hmmsearch.
boolean
true
hmmsearch e-value cutoff threshold for reported results
number
0.001
Save the output of hmmsearch (.domtbl.gz and .tbl.gz)
boolean
Minimum ratio of hmmsearch hit envelope (env) length to query length required to keep a hit.
number
0.9
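The e-value cutoff and the length filter can be read together as a per-hit test. The sketch below assumes 1-based inclusive envelope coordinates, as reported in hmmsearch domain tables, and assumes both comparisons are inclusive; the function name `keep_hit` and its argument names are hypothetical, and the pipeline may apply the two filters at different steps.

```python
def keep_hit(evalue: float, env_from: int, env_to: int, query_len: int,
             evalue_cutoff: float = 0.001, min_length_ratio: float = 0.9) -> bool:
    """Keep a hit if it is significant and its envelope spans enough of the query."""
    # Envelope coordinates are treated as 1-based and inclusive.
    env_len = env_to - env_from + 1
    return evalue <= evalue_cutoff and env_len / query_len >= min_length_ratio


print(keep_hit(1e-10, 5, 98, 100))   # True  (envelope covers 94% of the query)
print(keep_hit(1e-10, 20, 98, 100))  # False (envelope covers only 79%)
print(keep_hit(0.01, 1, 100, 100))   # False (e-value above the cutoff)
```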
Save family fasta files after recruiting sequences with hmmsearch
boolean
Use these parameters to control the redundancy removal subworkflow execution.
Removal of between-family redundancy via hmmsearch.
boolean
true
Minimum ratio of hmmsearch hit envelope (env) length to query length, used for redundant family removal.
number
0.9
Save only the fasta files of non-redundant families (might still contain redundant sequences)
boolean
Removal of inside-family redundancy of sequences via mmseqs clustering.
boolean
true
mmseqs parameter for minimum sequence identity
number
0.9
mmseqs parameter for minimum sequence coverage ratio
number
0.9
mmseqs parameter for coverage mode: 0 for both, 1 for target and 2 for query sequence
integer
Save the final family fasta files with sequence redundancy removed
boolean