nf-core/dualrnaseq
Analysis of Dual RNA-seq data - an experimental method for interrogating host-pathogen interactions through simultaneous RNA-seq.
22.10.6
Primary parameters for runtime
Workflow name.
string
A custom name for the pipeline run. Unlike the core Nextflow -name option (which takes a single hyphen), this parameter can be reused multiple times, for example when using -resume. It is passed through to steps such as MultiQC and used for report filenames and titles.
Input files can be read as either .fastq or .fastq.gz. They should be named descriptively, without spaces or special characters (such as : and @), with the corresponding replicate (if any) appended at the end. The best practice for this pipeline is to use underscores to separate different experimental conditions.
string
data/*{1,2}.fastq.gz
Use this to specify the location of your input FastQ files. For example:
--input 'path/to/data/sample_*_{1,2}.fastq'
Please note the following requirements:
- The path must be enclosed in quotes
- The path must have at least one * wildcard character
- When using the pipeline with paired-end data, the path must use {1,2} notation to specify read pairs
- If left unspecified, a default pattern is used: data/*{1,2}.fastq.gz
Note: by default, the pipeline expects paired-end data. If you have single-end data, you need to specify --single_end
on the command line when launched. For example: --single_end --input '*.fastq'
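To illustrate what the {1,2} notation matches (Nextflow resolves the real glob itself, which is why the quotes are required; bash brace expansion behaves analogously), here is a throwaway demo with empty scratch files:

```shell
# Throwaway demo of the paired-end pattern; creates empty files only.
# Requires bash for brace expansion — in the real pipeline, Nextflow
# interprets the quoted pattern itself.
mkdir -p demo_reads
touch demo_reads/sampleA_1.fastq.gz demo_reads/sampleA_2.fastq.gz
ls demo_reads/*{1,2}.fastq.gz   # lists both mates of the pair
```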
Specifies that the input is single-end reads.
boolean
By default, the pipeline expects paired-end data. If you have single-end data, you need to specify --single_end
on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for --input
. For example:
--single_end --input '*.fastq'
It is not possible to run a mixture of single-end and paired-end files in one run.
The output directory where the results will be saved.
string
./results
Set the top limit for requested resources for any single job.
Maximum number of CPUs that can be requested for any single job.
integer
16
Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1
Maximum amount of memory that can be requested for any single job.
string
128.GB
Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'
Maximum amount of time that can be requested for any single job.
string
240.h
Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
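The three caps can be combined on one command line; CPUs take a plain integer, while memory and time take quoted "number.unit" strings. A sketch, with the values simply restating the defaults above (the command is assembled as a string here rather than executed):

```shell
# Sketch only: resource caps as they would appear on a command line.
# Values shown are the pipeline defaults restated.
CAPS="--max_cpus 16 --max_memory '128.GB' --max_time '240.h'"
echo "nextflow run nf-core/dualrnaseq <other options> $CAPS"
```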
If used, the paths to these files should be enclosed in quotes.
Host fasta file
string
Pathogen fasta file
string
Host GFF file
string
Host GFF file for tRNAs (optional)
string
Pathogen GFF
string
Host transcriptome file
string
Pathogen transcriptome file
string
If supplying a custom host transcriptome file
boolean
If supplying a custom pathogen transcriptome file
boolean
Name of host genome in the genomes.conf file
string
GRCh38
Name of pathogen genome in the genomes.conf file
string
SL1344
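A minimal launch sketch using the pre-configured genome names above. The flag names --genome_host and --genome_pathogen are assumed from the parameter descriptions (they are not spelled out here), so check `nextflow run nf-core/dualrnaseq --help` for the exact spelling; the command is assembled as a string so nothing is actually executed:

```shell
# Sketch: minimal run with pre-configured genomes. Flag names
# --genome_host / --genome_pathogen are assumptions; file paths are examples.
RUN="nextflow run nf-core/dualrnaseq \
  --input 'data/*_{1,2}.fastq.gz' \
  --genome_host GRCh38 --genome_pathogen SL1344 \
  --outdir ./results"
echo "$RUN"
```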
boolean
By default, the pipeline runs the FastQC tool for quality control of raw sequencing reads
An option to skip FastQC. (Default: False.) This is set to False within the configuration files, and only needs to be passed on the command line to become True.
boolean
Define a set of additional FastQC parameters you wish to use, except the --quiet, --threads and --noextract flags, which are already specified in the dualrnaseq pipeline
string
Adapter and read trimming is performed by either Cutadapt or BBDuk with the following related options
To run Cutadapt
boolean
Adapter for single-end reads, as well as the first read of paired-end data
string
AGATCGGAAGAGCACACGTCTGAACTCCAGTCA
For paired-end data, the adapter sequence for the second reads can be defined here
string
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
Cutadapt can also remove low-quality read ends. By default, the 3' end of each read is trimmed using a cutoff of 10. If you specify two comma-separated cutoffs, the first value represents the 5' cutoff, and the second the 3' cutoff
integer
10
Additional parameters if needed
string
Define a set of additional Cutadapt parameters you wish to use, except -m and -j which are already specified in the dualrnaseq pipeline.
Adapter and read trimming is performed by either Cutadapt or BBDuk with the following related options
To run BBDuk
boolean
Reads shorter than this after trimming will be discarded
integer
18
To trim read ends to remove bases with quality below trimq
string
r
Cutoff to trim regions with average quality BELOW given value
integer
10
To trim reads to remove bases matching reference kmers. Available options: f (don't trim), r (trim to the right - 3' adapters), l (trim to the left - 5' adapters)
string
r
Kmer length used for finding contaminants (adapters). Contaminants shorter than k will not be found. k must be at least 1
integer
17
Look for shorter kmers at read tips down to this length when k-trimming or masking. 0 means disabled. Enabling this will disable maskmiddle
integer
11
Maximum Hamming distance for ref kmers (subs only)
integer
1
Fasta file with adapter sequences (Default: $baseDir/data/adapters.fa)
string
data/adapters.fa
Set of additional BBDuk parameters
string
Define a set of additional BBDuk parameters you wish to use, except -Xmx1g
which is already specified in the dualrnaseq pipeline.
These parameters are available for Salmon in both Selective Alignment and alignment-based mode
Options for setting the library type. A = automatic detection
string
By default, this is set to 0.0, to ensure that only mappings or alignments that are compatible with the specified library type are considered by Salmon
integer
Option to extract all of the unique and ambiguous reads after quantification
boolean
This is useful for analysing reads which multimap across or within genomes. This option merges the quant.sf file with the aux_info/ambig_info.tsv file, combining columns which show how the underlying reads were processed and assigned. If a read maps uniquely to a feature, the read is added to the UniqueCount column. If the read maps to more than one location, it is counted against each of the features and shown in the AmbigCount column. The underlying statistical model of Salmon is able to assign many of these multimapping reads to a specific feature, and these appear in the NumReads column. The output file is located under the aux_info folder.
Works for both Selective alignment and alignment-based modes (Default: False).
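The column-combining step described above can be mimicked on toy files. Both files below are fabricated minimal stand-ins with made-up numbers (the real quant.sf and ambig_info.tsv carry more columns); this only sketches how the per-transcript rows line up:

```shell
# Toy sketch of combining quant.sf with aux_info/ambig_info.tsv.
# Fabricated two-column stand-ins; real Salmon output has more columns.
tmp=$(mktemp -d)
printf 'Name\tNumReads\ngene1\t10\n'     > "$tmp/quant.sf"
printf 'UniqueCount\tAmbigCount\n8\t2\n' > "$tmp/ambig_info.tsv"
paste "$tmp/quant.sf" "$tmp/ambig_info.tsv"
```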
The pipeline uses gene features from the 3rd column of the host annotation file (GFF3) to extract the coordinates of transcripts to be quantified. By default, the pipeline uses exon features from the --gff_host file and tRNA features from the --gff_host_tRNA file
string
['exon', 'tRNA']
The pipeline uses gene features from the 3rd column of the pathogen annotation file (GFF3) to extract the coordinates of transcripts to be quantified. By default, the pipeline uses the features gene, sRNA, tRNA and rRNA from the --gff_pathogen file.
string
['gene', 'sRNA', 'tRNA', 'rRNA']
This flag defines the gene attribute from the 9th column of the host annotation (GFF3) file, from which the transcript names are extracted. By default, the pipeline extracts transcript_id from the --gff_host file
string
transcript_id
This flag defines the gene attribute from the 9th column of the pathogen annotation (GFF3) file, from which transcripts, genes or CDS regions are extracted. By default, the pipeline extracts locus_tag from the --gff_pathogen file
string
locus_tag
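To make "gene attribute from the 9th column" concrete: the attribute is a key=value entry in a GFF3 line's semicolon-separated final column. A sketch with a fabricated line (the locus_tag value is invented):

```shell
# Sketch: extracting the locus_tag attribute from column 9 of a GFF3 line.
# The input line is a fabricated example; real annotation files differ.
printf 'chr\t.\tgene\t1\t100\t.\t+\t.\tID=g1;locus_tag=STM0001\n' \
  | awk -F'\t' '{print $9}' \
  | tr ';' '\n' \
  | grep '^locus_tag=' \
  | cut -d= -f2
```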
Parameters listed below are available only for Salmon with Selective Alignment.
Run Salmon selective alignment
boolean
To define the k-mer length (-k parameter in Salmon)
integer
21
By default, the pipeline saves the names of unmapped reads
boolean
By default, the pipeline allows soft-clipping of reads
boolean
"Soft-clipping allows reads that overhang the beginning or ends of the transcript. In this case, the overhanging section of the read will simply be unaligned, and will not contribute or detract from the alignment score". If set to False, end-to-end alignment of the entire read is forced, so that any overhangs may affect the alignment score
To save the equivalence classes and their counts
boolean
If set to True, the pipeline will create a mapping.sam file containing mapping information
boolean
By default, Salmon removes/collapses identical transcripts during the indexing stage
boolean
The list of both retained and removed transcripts will be saved in the duplicate_clusters.tsv file of the transcripts_index folder. If you want to obtain quantification results for all duplicates, please specify this option (--keepDuplicates)
Set of additional parameters for creating an index with Salmon Selective Alignment
string
Set of additional parameters for mapping with Salmon Selective Alignment
string
Options for Alignment-based mode
To run Salmon alignment-based mode
boolean
Define a set of additional Salmon quant parameters you wish to use in Salmon alignment-based mode.
string
These parameters are available for STAR in both quantification modes, using HTSeq and Salmon in alignment-based mode
To run STAR
boolean
By default, the pipeline saves unmapped reads within the main BAM file. If you want to switch off this option, set the --outSAMunmapped flag to None
string
Within
For paired-end reads, the KeepPairs parameter will record the unmapped mates for each alignment and keep them adjacent to their mapped reads (only affects multi-mapping reads).
To specify the attributes of the output BAM file. The default value is Standard, but there is a range of options if needed
string
Standard
By default, the pipeline uses the Standard option to keep NH HI AS nM SAM attributes
To specify the maximum number of loci a read is allowed to map to
integer
999
By default, the pipeline keeps reads containing junctions that passed filtering into the file SJ.out.tab. This option reduces the number of "spurious" junctions
string
BySJout
The number of minimum overhang for unannotated junctions can be changed here
integer
8
The number of minimum overhang for annotated junctions can be changed here
integer
1
To define a threshold for the number of mismatches to be allowed. By default, the pipeline uses a large number 999 to switch this filter off.
integer
999
Here, you can define a threshold for a ratio of mismatches to read length. The alignment will be considered if the ratio is less than or equal to this value
integer
1
By default, the nf-core dualrnaseq pipeline uses 20 as a minimum intron length. If the genomic gap is smaller than this value, it is considered as a deletion
integer
20
The maximum intron length is set to 1,000,000
integer
1000000
The maximum genomic distance between mates is 1,000,000
integer
1000000
Option to limit RAM when sorting BAM file. If 0, will be set to the genome index size, which can be quite large when running on a desktop or laptop
integer
0
The maximum number of loci anchors that are allowed to map. By default, the pipeline uses a large number 999 to switch this filter off.
integer
999
Option to specify the length of the donor/acceptor sequence on each side of the junctions used in constructing the splice junctions database. By default the option is set to 100. However, we recommend setting a value depending on the read length: read/mate length - 1.
integer
100
The nf-core/dualrnaseq pipeline runs STAR to generate transcriptomic alignments. By default, it allows for insertions, deletions and soft-clips (Singleend option). To prohibit this behaviour, please specify IndelSoftclipSingleend
string
Singleend
Define additional parameters for creating an index with STAR in Salmon alignment-based mode
string
Define additional parameters for alignment with STAR in Salmon alignment-based mode
string
Parameters available for STAR - HTSeq
Used to generate signal outputs, such as "wiggle" and "bedGraph".
string
None
Options are Stranded or Unstranded when defining the strandedness of wiggle/bedGraph output
string
Stranded
Set of additional parameters for creating an index with STAR
string
Set of additional parameters for alignment with STAR
string
General parameters
Used to run HTSeq-count and extract uniquely mapped reads from both the host and pathogen
boolean
A parameter for the library type. Options include "yes", "no" or "reverse"
string
yes
Option to define the number of maximum reads allowed to stay in memory until the mates are found. Has an effect for paired-end reads
integer
30000000
To specify a threshold for a minimal MAPQ alignment quality
integer
10
Set of additional parameters for HTSeq
string
Host - gene feature to quantify
string
['exon', 'tRNA']
Host - GFF attribute
string
gene_id
Pathogen - gene feature to quantify (will likely need to be modified)
string
['gene', 'sRNA', 'tRNA', 'rRNA']
Pathogen - GFF attribute (Will likely need to be modified)
string
locus_tag
Option to generate mapping statistics, creating plots and summaries
boolean
This will create the following:
- Counts of the total number of reads before and after trimming
- Scatterplots comparing all replicates (separate for host and pathogen reads)
- Plots of the % of mapped/quantified reads
- Plots of RNA-class statistics (as many types can be identified, the --RNA_classes_to_replace_host parameter below can help to summarise these)
Tab-delimited file containing headers which group similar types of RNA classes together. This helps to keep the RNA-class names simplified for plotting purposes
string
{base_dir}/data/RNA_classes_to_replace.csv
Initially, the user can run the pipeline without the 'others' class (remove the 'others' column) to identify the concentration of all RNA types (including, e.g., scRNAs). Depending on the requirements, the user can decide which types should be included/excluded or grouped together.
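As a sketch of what such a grouping file might look like (the layout below is a guess from the description above, and every class and header name is illustrative only; check the default file at {base_dir}/data/RNA_classes_to_replace.csv for the actual format), each header names a group and the entries beneath it are the raw RNA-class names collapsed into that group:

```
ncRNA	protein_coding	others
miRNA	protein_coding	misc_RNA
snoRNA		scRNA
snRNA
```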
Less common options for the pipeline, typically set in a config file.
Method used to save pipeline results to output directory (please don't change).
string
The Nextflow publishDir
option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.
Email address for completion summary, only when pipeline fails.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
This works exactly as with --email
, except emails are only sent if the workflow is not successful.
Send plain-text email instead of HTML.
boolean
Set to receive plain-text e-mails instead of HTML formatted.
File size limit when attaching MultiQC reports to summary emails.
string
25.MB
If a file generated by the pipeline exceeds this threshold, it will not be attached.
Do not use coloured log outputs.
boolean
Set to disable colourful command line output and live life in monochrome.
Custom config file to supply to MultiQC.
string
Directory to keep pipeline Nextflow logs and reports.
string
${params.outdir}/pipeline_info
Display help text.
boolean
Email address for completion summary.
string
^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config
) then you don't need to specify this on the command line for every run.
Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for Institutional configs.
string
master
Provide the git commit id for custom Institutional configs hosted at nf-core/configs. This was implemented for reproducibility purposes. Default: master.
## Download and use config file with following git commit id
--custom_config_version d52db660777c4bf36546ddb188ec530c3ada1b96
Base directory for Institutional configs.
string
https://raw.githubusercontent.com/nf-core/configs/master
If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with the --custom_config_base option. For example:
## Download and unzip the config files
cd /path/to/my/configs
wget https://github.com/nf-core/configs/archive/master.zip
unzip master.zip
## Run the pipeline
cd /path/to/my/data
nextflow run /path/to/pipeline/ --custom_config_base /path/to/my/configs/configs-master/
Note that the nf-core/tools helper package has a download command to download all required pipeline files + singularity containers + institutional configs in one go, to make this process easier.
Institutional configs hostname.
string
Institutional config description.
string
Institutional config contact information.
string
Institutional config URL link.
string