phageannotator: Parameters

Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
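
The pattern above accepts any path that contains no whitespace and ends in .csv. A minimal sketch of that check in Python follows; the helper name is_valid_samplesheet_path is hypothetical and only illustrates the regular expression, not how the pipeline itself performs validation.

```python
import re

# Pattern copied verbatim from the parameter definition above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Hypothetical helper: True if the path has no whitespace and ends in .csv."""
    return SAMPLESHEET_PATTERN.search(path) is not None


if __name__ == "__main__":
    # Example paths are made up for illustration.
    for candidate in ["data/samplesheet.csv", "my samples.csv", "samplesheet.tsv"]:
        print(candidate, is_valid_samplesheet_path(candidate))
```

Note that a path containing a space is rejected even if the file exists, so renaming such files before launching is likely the simplest fix.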

The output directory where the results will be saved. Use absolute paths when writing to storage on cloud infrastructure.

required

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
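
The email pattern is stricter than general email syntax: it allows only letters, digits, underscores, hyphens and dots on either side of the @, and a final label of 2 to 5 letters. The sketch below, with a hypothetical helper name and made-up addresses, shows which kinds of address would be accepted or rejected by the regular expression alone.

```python
import re

# Pattern copied verbatim from the parameter definition above.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def matches_email_pattern(address: str) -> bool:
    """Hypothetical helper: True if the address satisfies the schema pattern."""
    return EMAIL_PATTERN.search(address) is not None


if __name__ == "__main__":
    examples = [
        "user@example.org",       # plain address
        "user+tag@example.org",   # '+' is not in the allowed character set
        "user@example",           # no dot-separated final label
        "user@example.museum",    # final label longer than 5 letters
    ]
    for address in examples:
        print(address, matches_email_pattern(address))
```

Addresses that use plus-tags or long top-level domains are valid in general but do not match this pattern, so an alternative address may be needed.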

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Filter assemblies at the beginning of the workflow

Minimum assembly length

type: integer

default: 1000

Run ViromeQC to estimate viral enrichment

type: boolean

Identify reference viruses contained in reads

Run MASH screen to identify external viruses contained in reads

type: boolean

Path to FASTA file containing reference virus sequences

type: string

Path to mash sketch file for reference virus sequences

type: string

Save reference virus sketch, if it was created.

type: boolean

Minimum mash screen score to consider a genome contained

type: number

default: 0.95

Assign hashes present in multiple references only to the top-scoring reference sequence

type: boolean

Classify viral sequences using geNomad

Skip running geNomad to classify viral/non-viral sequences

type: boolean

Path to directory containing geNomad’s database

type: string

Save geNomad’s database, if it was downloaded.

type: boolean

Minimum virus score for a sequence to be considered viral

type: number

default: 0.7

Maximum FDR for a sequence to be considered viral (will include --enable-score-calibration)

type: number

default: 0.1

Number of splits for running geNomad (more splits lower memory requirements)

type: integer

default: 5

Extend viral contigs

Run COBRA to extend viral contigs

type: boolean

The assembler that was used to assemble viral contigs

type: string

Minimum kmer value used during assembly

type: string

Maximum kmer value used during assembly

type: string

Assess virus quality and filter

Skip running CheckV to assess virus quality and filter sequences

type: boolean

Path to directory containing CheckV database

type: string

Save CheckV’s database, if it was downloaded

type: boolean

Minimum virus length to pass filtering

type: integer

default: 3000

Minimum CheckV completeness to pass filtering

type: integer

default: 50

Remove viruses labeled as provirus by geNomad or CheckV

type: boolean

Remove viruses with CheckV warnings

type: boolean
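
The filtering options in this group can be read as a set of conditions that a sequence must satisfy together. The sketch below illustrates one way they might combine, assuming that thresholds are inclusive, that every enabled criterion must hold, and that the two removal switches are off unless set. The class, field and argument names are hypothetical and are not the pipeline's own parameter or column names.

```python
from dataclasses import dataclass


@dataclass
class VirusRecord:
    # Hypothetical fields for illustration only.
    length: int           # sequence length in bp
    completeness: float   # CheckV completeness estimate, in percent
    is_provirus: bool     # labeled as provirus by geNomad or CheckV
    has_warnings: bool    # CheckV reported warnings


def passes_quality_filter(
    rec: VirusRecord,
    min_length: int = 3000,          # default listed above
    min_completeness: float = 50,    # default listed above
    remove_proviruses: bool = False, # assumed off by default
    remove_warnings: bool = False,   # assumed off by default
) -> bool:
    """Return True if the record meets every enabled criterion."""
    if rec.length < min_length:
        return False
    if rec.completeness < min_completeness:
        return False
    if remove_proviruses and rec.is_provirus:
        return False
    if remove_warnings and rec.has_warnings:
        return False
    return True


if __name__ == "__main__":
    rec = VirusRecord(length=4500, completeness=62.0, is_provirus=True, has_warnings=False)
    print(passes_quality_filter(rec))
    print(passes_quality_filter(rec, remove_proviruses=True))
```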

Cluster virus genomes based on nucleotide/protein similarity

Skip ANI-based virus clustering

type: boolean

Minimum percent identity for BLAST hits

type: integer

default: 90

Maximum number of BLAST hits to record for each sequence

type: integer

default: 25000

Minimum average nucleotide identity (ANI) for sequences to be clustered together

type: integer

default: 95

Minimum query coverage for sequences to be clustered together

type: integer

Minimum target coverage for sequences to be clustered together

type: integer

default: 85

Align reads to virus database

Skip read alignment to viral sequences

type: boolean

Minimum length of reads aligned to references

type: integer

Minimum percent identity of aligned reads

type: integer

Minimum percent of read aligned to references

type: integer

Abundance calculation metrics

type: string

default: mean

Assign taxonomy to virus sequences

type: boolean

Predict host genus for phage sequences

Run iPHoP to predict phage hosts

type: boolean

Path to local iPHoP database

type: string

Save downloaded iPHoP database

type: boolean

Minimum confidence score to provide host prediction

type: integer

default: 90

Predict the lifestyle of viral sequences

Run BACPHLIP to predict virus lifestyle

type: boolean

Functionally annotate viral genomes using a variety of approaches

Run pharokka to predict and annotate phage ORFs

type: boolean

Path to pre-downloaded pharokka database

type: string

Analyze virus diversity at the strain level

Bypass microdiversity analysis with inStrain

type: boolean

Minimum identity for read alignment to be considered

type: number

Minimum MAPQ for a read to be considered

type: integer

Minimum coverage for a variant to be considered

type: integer

Minimum allele frequency for an SNP to be considered

type: number

Maximum FDR for an SNP to be considered

type: integer

Minimum number of reads mapping to a genome to consider profiling

type: number

Minimum identity for genomes to be considered in the same strain

type: number

Minimum percent of genomes compared for comparison to be considered

type: number

Minimum breadth of coverage for a genome to be considered present

type: number

Arguments for running pipeline tests with custom arguments/databases.

hidden

type: boolean

hidden

type: number

hidden

type: boolean

Download test database rather than full database?

hidden

type: boolean

hidden

type: boolean

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 128.GB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
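
The memory pattern expects a number followed by an optional dot, optional whitespace, an optional uppercase K, M, G or T prefix, and a mandatory uppercase B. The following sketch, using a hypothetical helper name, checks a few made-up values against the regular expression to show which spellings it accepts.

```python
import re

# Pattern copied verbatim from the parameter definition above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def matches_memory_pattern(value: str) -> bool:
    """Hypothetical helper: True if the value satisfies the schema pattern."""
    return MEMORY_PATTERN.search(value) is not None


if __name__ == "__main__":
    # "128.GB" is the default shown above; the other values are illustrative.
    for value in ["128.GB", "8 GB", "1.5GB", "128G", "128.gb"]:
        print(value, matches_memory_pattern(value))
```

Values without the trailing B or written in lowercase do not match, which is a likely cause of validation errors when a limit is copied from another tool's configuration.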

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h

pattern: ^(\d+\.?\s*(s|m|h|d|day)\s*)+$
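
The time pattern accepts one or more groups, each made of a whole number, an optional dot, and a unit of s, m, h, d or day, optionally separated by whitespace. The sketch below, with a hypothetical helper name and made-up values, shows how the regular expression treats a few spellings.

```python
import re

# Pattern copied verbatim from the parameter definition above.
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|d|day)\s*)+$")


def matches_time_pattern(value: str) -> bool:
    """Hypothetical helper: True if the value satisfies the schema pattern."""
    return TIME_PATTERN.search(value) is not None


if __name__ == "__main__":
    # "240.h" is the default shown above; the other values are illustrative.
    for value in ["240.h", "1d 6h", "2day", "240", "1.5h"]:
        print(value, matches_time_pattern(value))
```

A bare number without a unit and fractional values such as 1.5h are rejected by this pattern; a fractional duration can instead be written as a combination of whole-number groups, for example 1h 30m.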

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden

type: boolean

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Use logo in initialise subworkflow

type: boolean

default: true

Show all params when using --help

hidden

type: boolean

Validation of parameters fails when an unrecognised parameter is found.

hidden

type: boolean

Validation of parameters in lenient mode.

hidden

type: boolean
