nf-core/dualrnaseq
Analysis of Dual RNA-seq data - an experimental method for interrogating host-pathogen interactions through simultaneous RNA-seq.
22.10.6
Primary parameters for runtime
Workflow name.
string
A custom name for the pipeline run. Unlike the core Nextflow -name option (which takes a single hyphen), this parameter can be reused multiple times, for example when using -resume. It is passed through to steps such as MultiQC and used for report filenames and titles.
Input files can be read as either .fastq or .fastq.gz. They should be named descriptively, without spaces or special characters (such as : and @), with the corresponding replicate (if any) appended at the end. The best practice for this pipeline is to use underscores to separate different experimental conditions.
string
data/*{1,2}.fastq.gz
Use this to specify the location of your input FastQ files. For example:
--input 'path/to/data/sample_*_{1,2}.fastq'
Please note the following requirements:
- The path must be enclosed in quotes
- The path must have at least one * wildcard character
- When using the pipeline with paired-end data, the path must use {1,2} notation to specify read pairs
- If left unspecified, a default pattern is used: data/*{1,2}.fastq.gz
Note: by default, the pipeline expects paired-end data. If you have single-end data, you need to specify --single_end
on the command line when launched. For example: --single_end --input '*.fastq'
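To illustrate what the {1,2} notation matches (Nextflow resolves the real glob itself, which is why the quotes are required; bash brace expansion behaves analogously), here is a throwaway demo with empty scratch files:

```shell
# Throwaway demo of the paired-end pattern; creates empty files only.
# Requires bash for brace expansion — in the real pipeline, Nextflow
# interprets the quoted pattern itself.
mkdir -p demo_reads
touch demo_reads/sampleA_1.fastq.gz demo_reads/sampleA_2.fastq.gz
ls demo_reads/*{1,2}.fastq.gz   # lists both mates of the pair
```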
Specifies that the input is single-end reads.
boolean
By default, the pipeline expects paired-end data. If you have single-end data, you need to specify --single_end
on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for --input
. For example:
--single_end --input '*.fastq'
It is not possible to run a mixture of single-end and paired-end files in one run.
The output directory where the results will be saved.
string
./results
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
integer
16
Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1
Maximum amount of memory that can be requested for any single job.
string
128.GB
Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'
Maximum amount of time that can be requested for any single job.
string
240.h
Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
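The three caps can be combined on one command line; CPUs take a plain integer, while memory and time take quoted "number.unit" strings. A sketch, with the values simply restating the defaults above (the command is assembled as a string here rather than executed):

```shell
# Sketch only: resource caps as they would appear on a command line.
# Values shown are the pipeline defaults restated.
CAPS="--max_cpus 16 --max_memory '128.GB' --max_time '240.h'"
echo "nextflow run nf-core/dualrnaseq <other options> $CAPS"
```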
If used, the paths to these files should be enclosed in quotes.
Host fasta file
string
Pathogen fasta file
string
Host GFF file
string
Host GFF file for tRNAs (optional)
string
Pathogen GFF
string
Host transcriptome file
string
Pathogen transcriptome file
string
If supplying a custom host transcriptome file
boolean
If supplying a custom pathogen transcriptome file
boolean
Name of host genome in the genomes.conf file
string
GRCh38
Name of pathogen genome in the genomes.conf file
string
SL1344
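A minimal launch sketch using the pre-configured genome names above. The flag names --genome_host and --genome_pathogen are assumed from the parameter descriptions (they are not spelled out here), so check `nextflow run nf-core/dualrnaseq --help` for the exact spelling; the command is assembled as a string so nothing is actually executed:

```shell
# Sketch: minimal run with pre-configured genomes. Flag names
# --genome_host / --genome_pathogen are assumptions; file paths are examples.
RUN="nextflow run nf-core/dualrnaseq \
  --input 'data/*_{1,2}.fastq.gz' \
  --genome_host GRCh38 --genome_pathogen SL1344 \
  --outdir ./results"
echo "$RUN"
```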
boolean
By default, the pipeline runs the FastQC tool for quality control of raw sequencing reads
An option to skip FastQC. (Default: False.) This is set to False within the configuration files, and only needs to be passed on the command line to become True.
boolean
Define a set of additional FastQC parameters you wish to use, except the --quiet, --threads and --noextract flags, which are already specified in the dualrnaseq pipeline
string
Adapter and read trimming is performed by either Cutadapt or BBDuk with the following related options
To run Cutadapt
boolean
Adapter for single-end reads, as well as the first read of paired-end data
string
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
For paired-end data, the adapter sequence for the second reads can be defined here
string
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
Cutadapt can also remove low-quality read ends. By default, the 3' end of each read is trimmed using a cutoff of 10. If you specify two comma-separated cutoffs, the first value represents the 5' cutoff, and the second the 3' cutoff
integer
10
Additional parameters if needed
string
Define a set of additional Cutadapt parameters you wish to use, except -m and -j which are already specified in the dualrnaseq pipeline.
Adapter and read trimming is performed by either Cutadapt or BBDuk with the following related options
To run BBDuk
boolean
Reads shorter than this after trimming will be discarded
integer
18
To trim read ends to remove bases with quality below trimq
string
r
Cutoff to trim regions with average quality BELOW given value
integer
10
To trim reads to remove bases matching reference kmers. Available options: f (don't trim), r (trim to the right - 3' adapters), l (trim to the left - 5' adapters)
string
r
Kmer length used for finding contaminants (adapters). Contaminants shorter than k will not be found. k must be at least 1
integer
17
Look for shorter kmers at read tips down to this length when k-trimming or masking. 0 means disabled. Enabling this will disable maskmiddle
integer
11
Maximum Hamming distance for ref kmers (subs only)
integer
1
Fasta file with adapter sequences (Default: $baseDir/data/adapters.fa)
string
data/adapters.fa
Set of additional BBDuk parameters
string
Define a set of additional BBDuk parameters you wish to use, except -Xmx1g
which is already specified in the dualrnaseq pipeline.
These parameters are available for Salmon in both Selective Alignment and alignment-based mode
Options for setting the library type. A = automatic detection
string
By default, this is set to 0.0, to ensure that only mappings or alignments that are compatible with the specified library type are considered by Salmon
integer
Option to extract all of the unique and ambiguous reads after quantification
boolean
This is useful for analysing reads which multimap across or within genomes. This option merges the quant.sf file with the aux_info/ambig_info.tsv file, combining columns which show how the underlying reads were processed and assigned. If a read maps uniquely to a feature, the read is added to the UniqueCount column. If the read maps to more than one location, it is counted against each of the features and shown in the AmbigCount column. The underlying statistical model of Salmon is able to assign many of these multimapping reads to a specific feature, and these appear in the NumReads column. The output file is located under the aux_info folder.
Works for both Selective alignment and alignment-based modes (Default: False).
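The column-combining step described above can be mimicked on toy files. Both files below are fabricated minimal stand-ins with made-up numbers (the real quant.sf and ambig_info.tsv carry more columns); this only sketches how the per-transcript rows line up:

```shell
# Toy sketch of combining quant.sf with aux_info/ambig_info.tsv.
# Fabricated two-column stand-ins; real Salmon output has more columns.
tmp=$(mktemp -d)
printf 'Name\tNumReads\ngene1\t10\n'     > "$tmp/quant.sf"
printf 'UniqueCount\tAmbigCount\n8\t2\n' > "$tmp/ambig_info.tsv"
paste "$tmp/quant.sf" "$tmp/ambig_info.tsv"
```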
The pipeline uses gene features from the 3rd column of the host annotation file (GFF3) to extract the coordinates of transcripts to be quantified. By default, the pipeline uses exon features from the --gff_host file and tRNA features from the --gff_host_tRNA file
string
['exon', 'tRNA']
The pipeline uses gene features from the 3rd column of the pathogen annotation file (GFF3) to extract the coordinates of transcripts to be quantified. By default, the pipeline uses the features gene, sRNA, tRNA and rRNA from the --gff_pathogen file.
string
['gene', 'sRNA', 'tRNA', 'rRNA']
This flag defines the gene attribute from the 9th column of the host annotation (GFF3) file, from which the transcript names are extracted. By default, the pipeline extracts transcript_id from the --gff_host file
string
transcript_id
This flag defines the gene attribute from the 9th column of the pathogen annotation (GFF3) file, from which transcripts, genes or CDS regions are extracted. By default, the pipeline extracts locus_tag from the --gff_pathogen file
string
locus_tag
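To make "gene attribute from the 9th column" concrete: the attribute is a key=value entry in a GFF3 line's semicolon-separated final column. A sketch with a fabricated line (the locus_tag value is invented):

```shell
# Sketch: extracting the locus_tag attribute from column 9 of a GFF3 line.
# The input line is a fabricated example; real annotation files differ.
printf 'chr\t.\tgene\t1\t100\t.\t+\t.\tID=g1;locus_tag=STM0001\n' \
  | awk -F'\t' '{print $9}' \
  | tr ';' '\n' \
  | grep '^locus_tag=' \
  | cut -d= -f2
```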
Parameters listed below are available only for Salmon with Selective Alignment.
Run Salmon selective alignment
boolean
To define the k-mer length (-k parameter in Salmon)
integer
21
By default, the pipeline saves the names of unmapped reads
boolean
By default, the pipeline allows soft-clipping of reads
boolean
"Soft-clipping allows reads that overhang the beginning or ends of the transcript. In this case, the overhanging section of the read will simply be unaligned, and will not contribute or detract from the alignment score". If set to False, end-to-end alignment of the entire read is forced, so that any overhangs may affect the alignment score
To save the equivalence classes and their counts
boolean
If set to True, the pipeline will create a mapping.sam file containing mapping information
boolean
By default, Salmon removes/collapses identical transcripts during the indexing stage
boolean
The list of both retained and removed transcripts will be saved in the duplicate_clusters.tsv file of the transcripts_index folder. If you want to obtain quantification results for all duplicates, please specify this option (--keepDuplicates)
Set of additional parameters for creating an index with Salmon Selective Alignment
string
Set of additional parameters for mapping with Salmon Selective Alignment
string
Options for Alignment-based mode
To run Salmon alignment-based mode
boolean
Define a set of additional Salmon quant parameters you wish to use in Salmon alignment-based mode.
string
These parameters are available for STAR in both quantification modes, using HTSeq and Salmon in alignment-based mode
To run STAR
boolean
By default, the pipeline saves unmapped reads within the main BAM file. If you want to switch off this option, set the --outSAMunmapped flag to None
string
Within
For paired-end reads, the KeepPairs parameter will record the unmapped mates for each alignment and keep them adjacent to their mapped reads (only affects multi-mapping reads).
To specify the attributes of the output BAM file. The default value is Standard, but there is a range of options if needed
string
Standard
By default, the pipeline uses the Standard option to keep NH HI AS nM SAM attributes
To specify the maximum number of loci a read is allowed to map to
integer
999
By default, the pipeline keeps reads containing junctions that passed filtering into the file SJ.out.tab. This option reduces the number of "spurious" junctions
string
BySJout
The number of minimum overhang for unannotated junctions can be changed here
integer
8
The number of minimum overhang for annotated junctions can be changed here
integer
1
To define a threshold for the number of mismatches to be allowed. By default, the pipeline uses a large number 999 to switch this filter off.
integer
999
Here, you can define a threshold for a ratio of mismatches to read length. The alignment will be considered if the ratio is less than or equal to this value
integer
1
By default, the nf-core dualrnaseq pipeline uses 20 as a minimum intron length. If the genomic gap is smaller than this value, it is considered as a deletion
integer
20
The maximum intron length is set to 1,000,000
integer
1000000
The maximum genomic distance between mates is 1,000,000
integer
1000000
Option to limit RAM when sorting BAM file. If 0, will be set to the genome index size, which can be quite large when running on a desktop or laptop
integer
0
The maximum number of loci anchors that are allowed to map. By default, the pipeline uses a large number 999 to switch this filter off.
integer
999
Option to specify the length of the donor/acceptor sequence on each side of the junctions used in constructing the splice junctions database. By default the option is set to 100. However, we recommend setting a value depending on the read length: read/mate length - 1.
integer
100
The nf-core/dualrnaseq pipeline runs STAR to generate transcriptomic alignments. By default, it allows for insertions, deletions and soft-clips (Singleend option). To prohibit this behaviour, please specify IndelSoftclipSingleend
string
Singleend
Define additional parameters for creating an index with STAR in Salmon alignment-based mode
string
Define additional parameters for alignment with STAR in Salmon alignment-based mode
string
Parameters available for STAR - HTSeq
Used to generate signal outputs, such as "wiggle" and "bedGraph".
string
None
Options are Stranded or Unstranded when defining the strandedness of wiggle/bedGraph output
string
Stranded
Set of additional parameters for creating an index with STAR
string
Set of additional parameters for alignment with STAR
string
General parameters
Used to run HTSeq-count and extract uniquely mapped reads from both the host and pathogen
boolean
A parameter for the library type. Options include "yes", "no" or "reverse"
string
yes
Option to define the number of maximum reads allowed to stay in memory until the mates are found. Has an effect for paired-end reads
integer
30000000
To specify a threshold for a minimal MAPQ alignment quality
integer
10
Set of additional parameters for HTSeq
string
Host - gene feature to quantify
string
['exon', 'tRNA']
Host - GFF attribute
string
gene_id
Pathogen - gene feature to quantify (will likely need to be modified)
string
['gene', 'sRNA', 'tRNA', 'rRNA']
Pathogen - GFF attribute (Will likely need to be modified)
string
locus_tag
Option to generate mapping statistics, creating plots and summaries
boolean
This will create the following:
- Counts of the total number of reads before and after trimming
- Scatterplots comparing all replicates (separate for host and pathogen reads)
- Plots of the % of mapped/quantified reads
- Plots of RNA-class statistics (as many types can be identified, the --RNA_classes_to_replace_host parameter below can help to summarise these)
Tab-delimited file containing headers which group similar types of RNA classes together. This helps to keep the RNA-class names simplified for plotting purposes
string
{base_dir}/data/RNA_classes_to_replace.csv
Initially, the user can run the pipeline without the 'others' class (remove the 'others' column) to identify the concentration of all RNA types (including, e.g., scRNAs). Depending on the requirements, the user can decide which types should be included/excluded or grouped together.
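As a sketch of what such a grouping file might look like (the layout below is a guess from the description above, and every class and header name is illustrative only; check the default file at {base_dir}/data/RNA_classes_to_replace.csv for the actual format), each header names a group and the entries beneath it are the raw RNA-class names collapsed into that group:

```
ncRNA	protein_coding	others
miRNA	protein_coding	misc_RNA
snoRNA		scRNA
snRNA
```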
Less common options for the pipeline, typically set in a config file.
Method used to save pipeline results to output directory (please don't change).
string
The Nextflow publishDir
option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
This works exactly as with --email
, except emails are only sent if the workflow is not successful.
Send plain-text email instead of HTML.
boolean
Set to receive plain-text e-mails instead of HTML formatted.
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
If a file generated by the pipeline exceeds this threshold, it will not be attached.
Do not use coloured log outputs.
boolean
Set to disable colourful command line output and live life in monochrome.
Custom config file to supply to MultiQC.
string
Directory to keep pipeline Nextflow logs and reports.
string
${params.outdir}/pipeline_info
Display help text.
boolean
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config
) then you don't need to specify this on the command line for every run.
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Provide the git commit id for custom Institutional configs hosted at nf-core/configs. This was implemented for reproducibility purposes. Default: master.
## Download and use config file with following git commit id
--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with the --custom_config_base option. For example:
## Download and unzip the config files
cd /path/to/my/configs
wget https://github.com/nf-core/configs/archive/master.zip
unzip master.zip
## Run the pipeline
cd /path/to/my/data
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/
Note that the nf-core/tools helper package has a download command to download all required pipeline files + singularity containers + institutional configs in one go, to make this process easier.
Institutional configs hostname.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string