nf-core/phageannotator
Pipeline for identifying, annotating, and quantifying phage sequences in (meta)genomic sequences.
Define where the pipeline should find input data and save output data.
Path to comma-separated file containing information about the samples in the experiment.
string
^\S+\.csv$
You will need to create a design file with information about the samples in your experiment before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row. See usage docs.
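A minimal sketch of the checks this description implies, assuming only what is stated here: the path must match the pattern `^\S+\.csv$`, and the file must have a header row and 3 columns. The function name `check_samplesheet` and the header names in the usage line are hypothetical, and the requirement of at least one sample row is an assumption. The actual column names are defined in the usage docs, not here.

```python
import csv
import io
import re

# Pattern quoted in the schema for the samplesheet path.
INPUT_PATTERN = re.compile(r"^\S+\.csv$")


def check_samplesheet(path_name: str, text: str) -> list[str]:
    """Return a list of problems; an empty list means the sheet looks plausible."""
    problems = []
    if not INPUT_PATTERN.match(path_name):
        problems.append("path must contain no whitespace and end in .csv")

    rows = list(csv.reader(io.StringIO(text)))
    # Assumption: a usable sheet has the header plus at least one sample row.
    if len(rows) < 2:
        problems.append("expected a header row plus at least one sample row")

    # Every row, header included, should have exactly 3 columns.
    for number, row in enumerate(rows, start=1):
        if len(row) != 3:
            problems.append(f"line {number}: expected 3 columns, found {len(row)}")
    return problems
```

For example, `check_samplesheet("samples.csv", "col_a,col_b,col_c\ns1,x,y\n")` returns an empty list, while a path containing a space is reported as a problem.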
The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.
string
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
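To see which addresses the pattern above accepts, it can be tried directly in Python. This is only a sketch of the pattern's behaviour, and the helper name `is_accepted_email` is hypothetical.

```python
import re

# Email pattern quoted in the schema.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_accepted_email(address: str) -> bool:
    """True if the address matches the schema pattern."""
    return EMAIL_PATTERN.match(address) is not None
```

Note that the pattern limits the final domain label to 2 to 5 letters and does not allow `+` in the local part, so some valid addresses are rejected.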
MultiQC report title. Printed as page header, used for filename if not otherwise specified.
string
Filter assemblies at the beginning of the workflow
Minimum assembly length
integer
1000
Run ViromeQC to estimate viral enrichment
boolean
Identify reference viruses contained in reads
Run MASH screen to identify external viruses contained in reads
boolean
Path to FASTA file containing reference virus sequences
string
Path to mash sketch file for reference virus sequences
string
Save reference virus sketch, if it was created.
boolean
Minimum mash screen score to consider a genome contained
number
0.95
Hashes present in multiple references are assigned only to top sequence
boolean
Classify viral sequences using geNomad
Skip running geNomad to classify viral/non-viral sequences
boolean
Path to directory containing geNomad's database
string
Save geNomad's database, if it was downloaded.
boolean
Minimum virus score for a sequence to be considered viral
number
0.7
Maximum FDR for a sequence to be considered viral (will include --enable-score-calibration)
number
0.1
Number of splits for running geNomad (more splits lowers memory requirements)
integer
5
Extend viral contigs
Run COBRA to extend viral contigs
boolean
The assembler that was used to assemble viral contigs
string
Minimum kmer value used during assembly
string
Maximum kmer value used during assembly
string
Assess virus quality and filter
Skip running CheckV to assess virus quality and filter sequences
boolean
Path to directory containing CheckV database
string
Save CheckV's database, if it was downloaded
boolean
Minimum virus length to pass filtering
integer
3000
Minimum CheckV completeness to pass filtering
integer
50
Remove viruses labeled as provirus by geNomad or CheckV
boolean
Remove viruses with CheckV warnings
boolean
Cluster virus genomes based on nucleotide/protein similarity
Skip ANI-based virus clustering
boolean
Minimum percent identity for BLAST hits
integer
90
Maximum number of BLAST hits to record for each sequence
integer
25000
Minimum average nucleotide identity (ANI) for sequences to be clustered together
integer
95
Minimum query coverage for sequences to be clustered together
integer
Minimum test coverage for sequences to be clustered together
integer
85
Align reads to virus database
Skip read alignment to viral sequences
boolean
Minimum length of reads aligned to references
integer
Minimum percent identity of aligned reads
integer
Minimum percent of read aligned to references
integer
Abundance calculation metrics
string
mean
Assign taxonomy to virus sequences
boolean
Predict host genus for phage sequences
Run iPHoP to predict phage hosts
boolean
Path to local iPHoP database
string
Save downloaded iPHoP database
boolean
Minimum confidence score to provide host prediction
integer
90
Predict the lifestyle of viral sequences
Run BACPHLIP to predict virus lifestyle
boolean
Functionally annotate viral genomes using a variety of approaches
Run pharokka to predict and annotate phage ORFs
boolean
Path to predownloaded pharokka db
string
Analyze virus diversity at the strain level
Bypass microdiversity analysis with inStrain
boolean
Minimum identity for read alignment to be considered
number
Minimum MAPQ for a read to be considered
integer
Minimum coverage for a variant to be considered
integer
Minimum allele frequency for an SNP to be considered
number
Maximum FDR for a SNP to be considered
integer
Minimum number of reads mapping to a genome to consider profiling
number
Minimum identity for genomes to be considered in the same strain
number
Minimum percent of genomes compared for comparison to be considered
number
Minimum breadth of coverage for a genome to be considered present
number
Arguments for running pipeline tests with custom arguments/databases.
boolean
number
boolean
Download test database rather than full database?
boolean
boolean
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.
Institutional config name.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
integer
16
Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1
Maximum amount of memory that can be requested for any single job.
string
128.GB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'
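A quick way to check a memory string against the pattern above before launching a run. This is a sketch; the helper name `is_valid_memory` is hypothetical.

```python
import re

# Memory pattern quoted in the schema.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_memory(value: str) -> bool:
    """True if the value matches the schema's memory pattern."""
    return MEMORY_PATTERN.match(value) is not None
```

The pattern is case-sensitive and requires the trailing `B`, so `128.GB` and `8 GB` match while `8G` and `8 gb` do not.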
Maximum amount of time that can be requested for any single job.
string
240.h
^(\d+\.?\s*(s|m|h|d|day)\s*)+$
Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
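The time pattern above allows several number-unit parts in one string. The sketch below validates a value against it and totals the parts in seconds. It assumes that `m` means minutes and that `d` and `day` both mean 24 hours. The helper name `to_seconds` is hypothetical.

```python
import re

# Time pattern quoted in the schema.
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")
# One number-unit part; "day" is listed before "d" so the longer unit wins.
PART = re.compile(r"(\d+)\.?\s*(day|d|h|m|s)")
# Assumed unit sizes in seconds.
SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "day": 86400}


def to_seconds(value: str) -> int:
    """Validate a time string and return its total length in seconds."""
    if not TIME_PATTERN.match(value):
        raise ValueError(f"not a valid time string: {value!r}")
    return sum(int(number) * SECONDS[unit] for number, unit in PART.findall(value))
```

With these assumptions, the default `240.h` works out to 864000 seconds, i.e. 10 days.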
Less common options for the pipeline, typically set in a config file.
Display help text.
boolean
Display version and exit.
boolean
Method used to save pipeline results to output directory.
string
The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
Do not use coloured log outputs.
boolean
Incoming hook URL for messaging service
string
Incoming hook URL for messaging service. Currently, MS Teams and Slack are supported.
Custom config file to supply to MultiQC.
string
Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file
string
Custom MultiQC yaml file containing HTML including a methods description.
string
Whether to validate parameters against the schema at runtime
boolean
true
Use logo in initialise subworkflow
boolean
true
Show all params when using --help
boolean
By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.
Validation of parameters fails when an unrecognised parameter is found.
boolean
By default, an unrecognised parameter only produces a warning.
Validation of parameters in lenient mode.
boolean
Allows string values that are parseable as numbers or booleans. For further information see JSONSchema docs.
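As an illustration of what lenient validation means in practice, the sketch below converts a string to the type a parameter expects. It is not the validator's actual implementation: the function name `coerce_lenient` is hypothetical, and accepting only `true`/`false` (case-insensitive) as boolean strings is an assumption.

```python
def coerce_lenient(value: str, expected_type: str):
    """Convert a string to the expected schema type, or raise ValueError."""
    if expected_type == "boolean":
        lowered = value.strip().lower()
        # Assumption: only "true" and "false" count as boolean strings.
        if lowered in ("true", "false"):
            return lowered == "true"
        raise ValueError(f"not a boolean: {value!r}")
    if expected_type == "integer":
        return int(value)  # raises ValueError if not parseable
    if expected_type == "number":
        return float(value)  # raises ValueError if not parseable
    if expected_type == "string":
        return value
    raise ValueError(f"unknown type: {expected_type!r}")
```

Under strict validation, a string such as `"16"` supplied for an integer parameter would fail; in this lenient sketch it is accepted and read as the integer 16.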