nf-core/dualrnaseq
Analysis of Dual RNA-seq data - an experimental method for interrogating host-pathogen interactions through simultaneous RNA-seq.
22.10.6.
Primary parameters for runtime
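The input file naming rules given in this section can be checked before a run. The following is a minimal sketch, not part of the pipeline: the function name is hypothetical, and it assumes that "special characters" means anything other than letters, digits, underscores, hyphens and dots.

```python
import re

# Assumption: only letters, digits, underscores, hyphens and dots are allowed,
# and the name must end in .fastq or .fastq.gz.
FASTQ_NAME = re.compile(r"^[A-Za-z0-9_.-]+\.fastq(\.gz)?$")


def is_valid_fastq_name(filename: str) -> bool:
    """Return True if the file name follows the naming advice for input reads."""
    return FASTQ_NAME.fullmatch(filename) is not None


# Hypothetical file names used only for illustration.
for name in [
    "host_infected_2h_rep1_1.fastq.gz",  # underscores separate conditions
    "sample 1.fastq",                    # contains a space
    "run:01@laneA.fastq.gz",             # contains : and @
    "control_rep2.bam",                  # wrong extension
]:
    print(name, is_valid_fastq_name(name))
```

Only the first name passes; the others break one of the rules (space, special characters, or extension).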
Workflow name.
string
Input files can be read as either .fastq or .fastq.gz. They should be named descriptively, without spaces or special characters (such as : and @), with the corresponding replicate (if any) appended at the end. Best practice for this pipeline is to use underscores to separate different experimental conditions.
string (default: data/*{1,2}.fastq.gz)
Specifies that the input is single-end reads.
boolean
The output directory where the results will be saved.
string (default: ./results)

Set the top limit for requested resources for any single job.
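The idea of a top limit can be sketched as a simple clamp: whatever a job asks for, it never receives more than the configured maximum. This is an illustration only, not the pipeline's own code. The function name and the job request are hypothetical, and the limits are the defaults listed in this section (16 CPUs, 128 GB, 240 hours).

```python
def cap_request(requested, maximum):
    """Return the requested amount, limited to the configured maximum."""
    return min(requested, maximum)


# Defaults listed in this section, expressed as plain numbers.
MAX_CPUS = 16
MAX_MEMORY_GB = 128
MAX_TIME_H = 240

# Hypothetical job request that exceeds two of the three limits.
job = {"cpus": 32, "memory_gb": 64, "time_h": 300}

capped = {
    "cpus": cap_request(job["cpus"], MAX_CPUS),
    "memory_gb": cap_request(job["memory_gb"], MAX_MEMORY_GB),
    "time_h": cap_request(job["time_h"], MAX_TIME_H),
}
print(capped)  # {'cpus': 16, 'memory_gb': 64, 'time_h': 240}
```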
Maximum number of CPUs that can be requested for any single job.
integer (default: 16)
Maximum amount of memory that can be requested for any single job.
string (default: 128.GB)
Maximum amount of time that can be requested for any single job.
string (default: 240.h)

If used, the path to the files should be enclosed by quotes: "../.."
Host fasta file
string
Pathogen fasta file
string
Host GFF file
string
Host GFF file for tRNAs (optional)
string
Pathogen GFF
string
Host transcriptome file
string
Pathogen transcriptome file
string
If supplying custom transcriptome files
boolean
If supplying custom transcriptome files
boolean
Name of host genome in the genomes.conf file
string (default: GRCh38)
Name of pathogen genome in the genomes.conf file
string (default: SL1344)
boolean

By default, the pipeline uses the FastQC tool for quality control of raw sequencing reads
An option to not run FastQC. (Default: False) This is set to False within the configuration files, but only needs to be passed on the command line to become True.
boolean
Define a set of additional FastQC parameters you wish to use, except the --quiet, --threads and --noextract flags, which are already specified in the dualrnaseq pipeline
string

Adapter and read trimming is performed by either Cutadapt or BBDuk with the following related options
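To make the quality cutoff option in this section more concrete, here is a minimal sketch of 3' quality trimming as we understand it from the Cutadapt documentation: the cutoff is subtracted from each quality, partial sums are accumulated from the 3' end, and the read is cut where that sum is lowest. The function name is hypothetical, and this is not Cutadapt's actual source code.

```python
def quality_trim_index(qualities, cutoff=10):
    """Return how many bases to keep from the 5' end after 3' quality trimming.

    Works on (cutoff - quality), so the best cut point is where the running
    sum from the 3' end is largest. Scanning stops once the sum turns negative.
    """
    running = 0
    best = 0
    keep = len(qualities)  # keep everything unless a better cut point is found
    for i in reversed(range(len(qualities))):
        running += cutoff - qualities[i]
        if running < 0:
            break
        if running > best:
            best = running
            keep = i
    return keep


# Worked example with a cutoff of 10 (the default listed in this section).
quals = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
print(quality_trim_index(quals, cutoff=10))  # 4: only the first four bases are kept
```

Note that the base with quality 11 is still removed, because it sits inside a stretch of low-quality bases.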
To run Cutadapt
boolean
Adapter sequence for single-end reads and for the first reads of paired-end data
string (default: AGATCGGAAGAGCACACGTCTGAACTCCAGTCA)
For paired-end data, the adapter sequence for the second reads can be defined here
string (default: AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT)
Cutadapt can also remove low-quality read ends. By default, the 3’ end of each read is trimmed using a cutoff of 10. If you specify two comma-separated cutoffs, the first value represents the 5’ cutoff, and the second one the 3’ cutoff
integer (default: 10)
Additional parameters if needed
string

Adapter and read trimming is performed by either Cutadapt or BBDuk with the following related options
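The k-mer options in this section (k-mer length, Hamming distance, trim direction) can be pictured with a simplified sketch of right-trimming: scan the read for the first k-mer that lies within the allowed Hamming distance of any adapter k-mer, then remove that k-mer and everything to its right. The function names are hypothetical, and this is an illustration of the idea rather than BBDuk's implementation. A short k is used so the example stays readable.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def ktrim_right(read, adapter, k=17, hdist=1):
    """Trim the read at the first k-mer matching an adapter k-mer within hdist."""
    ref_kmers = {adapter[i:i + k] for i in range(len(adapter) - k + 1)}
    for pos in range(len(read) - k + 1):
        kmer = read[pos:pos + k]
        if any(hamming(kmer, ref) <= hdist for ref in ref_kmers):
            return read[:pos]
    return read  # no adapter k-mer found


adapter = "AGATCGGAAG"               # first 10 bases of the default adapter
read = "ACGTACGTAC" + "AGTTCGGAAG"  # insert followed by adapter with one substitution

print(ktrim_right(read, adapter, k=5, hdist=1))  # ACGTACGTAC
print(ktrim_right(read, adapter, k=5, hdist=0))  # ACGTACGTACAGT
```

With one mismatch allowed, the adapter is found at its true start. With exact matching only, the first hit is three bases later, so part of the adapter is left on the read.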
To run BBDuk
boolean
Reads shorter than this after trimming will be discarded
integer (default: 18)
To trim read ends to remove bases with quality below trimq
string (default: r)
Cutoff to trim regions with average quality below the given value
integer (default: 10)
To trim reads to remove bases matching reference kmers. Available options: f (don’t trim), r (trim to the right - 3’ adapters), l (trim to the left - 5’ adapters)
string (default: r)
Kmer length used for finding contaminants (adapters). Contaminants shorter than k will not be found. k must be at least 1
integer (default: 17)
Look for shorter kmers at read tips down to this length when k-trimming or masking. 0 means disabled. Enabling this will disable maskmiddle
integer (default: 11)
Maximum Hamming distance for ref kmers (subs only)
integer (default: 1)
Fasta file with adapter sequences (Default: $baseDir/data/adapters.fa)
string (default: data/adapters.fa)
Set of additional BBDuk parameters
string

These parameters are available for Salmon in both Selective Alignment and alignment-based mode
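Several options in this section select GFF3 records by feature type (3rd column) and then read an identifier from the attributes (9th column). The sketch below shows that selection on a tiny, made-up annotation. The function name and the example records are hypothetical, and the pipeline's own extraction scripts may differ.

```python
def extract_ids(gff_lines, features, attribute):
    """Collect the given attribute from GFF3 records whose feature type matches."""
    ids = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/pragma lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in features:
            continue  # column 3 (index 2) holds the feature type
        # Column 9 (index 8) holds semicolon-separated key=value pairs.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        if attribute in attrs:
            ids.append(attrs[attribute])
    return ids


# Made-up host annotation records for illustration.
host_gff = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t900\t.\t+\t.\tID=gene1;gene_id=G1",
    "chr1\tsrc\texon\t100\t300\t.\t+\t.\tID=exon1;transcript_id=T1",
    "chr1\tsrc\ttRNA\t950\t1020\t.\t-\t.\tID=trna1;transcript_id=T2",
]

print(extract_ids(host_gff, {"exon", "tRNA"}, "transcript_id"))  # ['T1', 'T2']
```

For a pathogen annotation, the same call would use the pathogen feature list and the locus_tag attribute instead.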
Options for setting the library type. A = automatic detection
string
By default, this is set to 0.0, to ensure that only mappings or alignments that are compatible with the specified library type are considered by Salmon
integer
Option to extract all of the unique and ambiguous reads after quantification
boolean
The pipeline uses gene features from the 3rd column of the host annotation file (gff3) to extract the coordinates of transcripts to be quantified. By default, the pipeline uses exon from the --gff_host file and tRNA from the --gff_host_tRNA file
string (default: ['exon', 'tRNA'])
The pipeline uses gene features from the 3rd column of the pathogen annotation file (gff3) to extract the coordinates of transcripts to be quantified. By default, the pipeline uses the features gene, sRNA, tRNA and rRNA from the --gff_pathogen file.
string (default: ['gene', 'sRNA', 'tRNA', 'rRNA'])
This flag defines the gene attribute from the 9th column of the host annotation (gff3) file, from which the transcript names are extracted. By default, the pipeline extracts transcript_id from the --gff_host file
string (default: transcript_id)
This flag defines the gene attribute from the 9th column of the pathogen annotation (gff3) file, from which transcript, gene or CDS regions are extracted. By default, the pipeline extracts locus_tag from the --gff_pathogen file
string (default: locus_tag)

Parameters listed below are available only for Salmon with Selective Alignment.
Run Salmon selective alignment
boolean
To define the k-mer length (-k parameter in Salmon)
integer (default: 21)
By default the pipeline saves names of unmapped reads
boolean
By default, the pipeline allows soft-clipping of reads
boolean
To save the equivalence classes and their counts
boolean
If set to True, the pipeline will create a mapping.sam file containing mapping information
boolean
By default salmon removes/collapses identical transcripts during the indexing stage
boolean
Set of additional parameters for creating an index with Salmon Selective Alignment
string
Set of additional parameters for mapping with Salmon Selective Alignment
string

Options for Alignment-based mode
To run Salmon alignment-based mode
boolean
Define a set of additional salmon quant parameters you wish to use in salmon alignment-based mode.
string

These parameters are available for STAR in both quantification modes, using HTSeq and Salmon in alignment-based mode
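Two of the STAR options in this section reduce to simple arithmetic: the mismatch filters (an absolute count and a ratio of mismatches to read length) and the recommended splice-junction overhang of read length minus one. The sketch below restates those rules. The function names are hypothetical and the code only mirrors the descriptions in this section, not STAR itself.

```python
def passes_mismatch_filters(n_mismatches, read_length,
                            max_mismatches=999, max_ratio=1.0):
    """Apply both mismatch thresholds described in this section.

    The defaults (999 and 1) effectively switch the filters off.
    """
    within_count = n_mismatches <= max_mismatches
    within_ratio = n_mismatches / read_length <= max_ratio
    return within_count and within_ratio


def recommended_overhang(read_length):
    """Recommended donor/acceptor sequence length: read (or mate) length - 1."""
    return read_length - 1


print(passes_mismatch_filters(5, 100))                  # True with the defaults
print(passes_mismatch_filters(5, 100, max_ratio=0.04))  # False: 0.05 > 0.04
print(recommended_overhang(75))                         # 74
```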
To run STAR
boolean
By default, the pipeline saves unmapped reads within the main BAM file. If you want to switch off this option, set the --outSAMunmapped flag to None
string (default: Within)
To specify the attributes of the output BAM file. The default value is Standard, but there are a range of options if needed
string (default: Standard)
To specify the maximum number of loci a read is allowed to map to
integer (default: 999)
By default, the pipeline keeps reads containing junctions that passed filtering into the file SJ.out.tab. This option reduces the number of "spurious" junctions
string (default: BySJout)
The minimum overhang for unannotated junctions can be changed here
integer (default: 8)
The minimum overhang for annotated junctions can be changed here
integer (default: 1)
To define a threshold for the number of mismatches to be allowed. By default, the pipeline uses a large number (999) to switch this filter off.
integer (default: 999)
Here, you can define a threshold for the ratio of mismatches to read length. The alignment will be considered if the ratio is less than or equal to this value
integer (default: 1)
By default, the nf-core dualrnaseq pipeline uses 20 as the minimum intron length. If the genomic gap is smaller than this value, it is considered a deletion
integer (default: 20)
The maximum intron length is set to 1,000,000
integer (default: 1000000)
The maximum genomic distance between mates is 1,000,000
integer (default: 1000000)
Option to limit RAM when sorting the BAM file. If 0, it will be set to the genome index size, which can be quite large when running on a desktop or laptop
integer (default: 0)
The maximum number of loci anchors that are allowed to map. By default, the pipeline uses a large number (999) to switch this filter off.
integer (default: 999)
Option to specify the length of the donor/acceptor sequence on each side of the junctions used in constructing the splice junctions database. By default the option is set to 100. However, we recommend setting a value depending on the read length: read/mate length - 1.
integer (default: 100)
The nf-core/dualrnaseq pipeline runs STAR to generate transcriptomic alignments. By default, it allows for insertions, deletions and soft-clips (Singleend option). To prohibit this behaviour, please specify IndelSoftclipSingleend
string (default: Singleend)
Define additional parameters for creating an index with STAR in salmon alignment-based mode
string
Define additional parameters for alignment with STAR in salmon alignment-based mode
string

Parameters available for STAR - HTSeq
Used to generate signal outputs, such as “wiggle” and “bedGraph”.
string (default: None)
Options are Stranded or Unstranded when defining the strandedness of wiggle/bedGraph output
string (default: Stranded)
Set of additional parameters for creating an index with STAR
string
Set of additional parameters for alignment with STAR
string

General parameters
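One of the options in this section is a minimum MAPQ threshold. As a rough illustration of what such a threshold does, the sketch below keeps only SAM records whose mapping quality (5th column) is at least the minimum. The function name and the SAM records are made up for this example, and this is not HTSeq's own code.

```python
def filter_by_mapq(sam_lines, min_mapq=10):
    """Return the read names of alignments with MAPQ >= min_mapq."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        fields = line.split("\t")
        if int(fields[4]) >= min_mapq:  # column 5 (index 4) is MAPQ
            kept.append(fields[0])
    return kept


# Made-up SAM records: name, flag, reference, position, MAPQ, CIGAR, ...
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t200\t3\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read3\t16\tchr2\t50\t10\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]

print(filter_by_mapq(sam, min_mapq=10))  # ['read1', 'read3']
```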
Used to run HTSeq-count and extract uniquely mapped reads from both the host and pathogen
boolean
A parameter for the library type. Options include “yes”, “no” or “reverse”
string (default: yes)
Option to define the maximum number of reads allowed to stay in memory until the mates are found. Only has an effect for paired-end reads
integer (default: 30000000)
To specify a threshold for a minimal MAPQ alignment quality
integer (default: 10)
Set of additional parameters for HTSeq
string
Host - gene feature to quantify
string (default: ['exon', 'tRNA'])
Host - GFF attribute
string (default: gene_id)
Pathogen - gene feature to quantify (will likely need to be modified)
string (default: ['gene', 'sRNA', 'tRNA', 'rRNA'])
Pathogen - GFF attribute (will likely need to be modified)
string (default: locus_tag)
Option to generate mapping statistics, creating plots and summaries
boolean
Tab-delimited file containing headers that group similar types of RNA classes together. This helps to keep the RNA-class names simplified for plotting purposes
string (default: {base_dir}/data/RNA_classes_to_replace.csv)

Less common options for the pipeline, typically set in a config file.
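The email options in this section are validated against a regular expression. The sketch below applies that same pattern in Python to a few made-up addresses, to show what it accepts and rejects. The addresses are hypothetical, and this only illustrates the pattern, not how the pipeline performs validation.

```python
import re

# Pattern copied from the email options in this section.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_accepted(address: str) -> bool:
    """Return True if the address matches the documented pattern."""
    return EMAIL_PATTERN.match(address) is not None


# Made-up addresses for illustration.
for address in [
    "user.name@example.org",   # accepted
    "user@example",            # rejected: no dot followed by a 2-5 letter suffix
    "user+tag@example.com",    # rejected: '+' is not in the allowed character set
    "user@lab.institute",      # rejected: suffix is longer than 5 letters
]:
    print(address, is_accepted(address))
```

The last two cases show that the pattern is stricter than general email syntax, which may matter for addresses with plus signs or long top-level domains.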
Method used to save pipeline results to output directory (please don’t change).
string
Email address for completion summary, only when pipeline fails.
string (pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$)
Send plain-text email instead of HTML.
boolean
File size limit when attaching MultiQC reports to summary emails.
string (default: 25.MB)
Do not use coloured log outputs.
boolean
Custom config file to supply to MultiQC.
string
Directory to keep pipeline Nextflow logs and reports.
string (default: ${params.outdir}/pipeline_info)
Display help text.
boolean
Email address for completion summary.
string (pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$)

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string (default: master)
Base directory for Institutional configs.
string (default: https://raw.githubusercontent.com/nf-core/configs/master)
Institutional configs hostname.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string