Define where the pipeline should find input data and save output data.

Specify the subworkflow to be executed.

hidden
type: string

Path to a tsv file providing paths to the fastq files for each sample and the necessary metadata for the analysis.

required
type: string

The input file contains the sample metadata and the paths to the R1 and R2 fastq files and, if available, the index read file (I1). The file should include the following columns, separated with tabs, with exactly these header names:

ID Source Treatment Extraction_time Population R1 R2 I1  
QMKMK072AD Patient_2 Drug_treatment baseline p sample_S8_L001_R1_001.fastq.gz sample_S8_L001_R2_001.fastq.gz sample_S8_L001_I1_001.fastq.gz  

This metadata is then automatically annotated, in columns with the same headers, in the tables output by the pipeline. The columns are:

  • ID: sample ID.
  • Source: patient or organism code.
  • Treatment: treatment condition applied to the sample.
  • Extraction_time: time of cell extraction for the sample.
  • Population: B-cell population (e.g. naive, double-negative, memory, plasmablast).
  • R1: path to fastq file with first mates of paired-end sequencing.
  • R2: path to fastq file with second mates of paired-end sequencing.
  • I1: path to fastq file with the Illumina index and UMI (unique molecular identifier) barcode (optional column).

Specify the path for your input file like this:

--input 'path/to/metadata/metadata_sheet.tsv'  
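
As a minimal sketch of the layout described above, the following Python snippet checks that a sample sheet has the required header names and no empty required fields before a run is launched. The helper name `check_sample_sheet` is hypothetical and not part of the pipeline, and the check does not verify that the fastq paths exist.

```python
import csv
import io

# Header names taken from the column list above; I1 is the optional column.
REQUIRED_COLUMNS = ["ID", "Source", "Treatment", "Extraction_time", "Population", "R1", "R2"]


def check_sample_sheet(handle):
    """Return the rows of a tab-separated sample sheet, or raise ValueError.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    # Strip stray whitespace around header names before comparing them.
    reader.fieldnames = [name.strip() for name in (reader.fieldnames or [])]
    missing = [col for col in REQUIRED_COLUMNS if col not in reader.fieldnames]
    if missing:
        raise ValueError("Missing required columns: " + ", ".join(missing))
    rows = []
    for line_number, row in enumerate(reader, start=2):
        empty = [col for col in REQUIRED_COLUMNS if not (row.get(col) or "").strip()]
        if empty:
            raise ValueError(f"Line {line_number}: empty value for " + ", ".join(empty))
        rows.append(row)
    return rows


# Example row copied from the sample sheet shown above.
example = (
    "ID\tSource\tTreatment\tExtraction_time\tPopulation\tR1\tR2\tI1\n"
    "QMKMK072AD\tPatient_2\tDrug_treatment\tbaseline\tp\t"
    "sample_S8_L001_R1_001.fastq.gz\tsample_S8_L001_R2_001.fastq.gz\t"
    "sample_S8_L001_I1_001.fastq.gz\n"
)
rows = check_sample_sheet(io.StringIO(example))
```

For a file on disk, the same function could be called with `open(path, newline="")` as the handle.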

The output directory where the results will be saved. Use absolute paths when writing to storage on cloud infrastructure.

required
type: string

Email address for completion summary.

type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (~/.nextflow/config) then you don't need to specify this on the command line for every run.
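
The pattern above can be tried out before a run. The sketch below applies it with Python's `re` module; the helper name `is_valid_email` is hypothetical. Note that the pattern, as written, rejects top-level domains longer than five letters and local parts containing characters such as `+`.

```python
import re

# Pattern copied verbatim from the parameter definition above.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_valid_email(address):
    """Return True if `address` matches the schema pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None


accepted = is_valid_email("jane.doe@example.org")
rejected_long_tld = is_valid_email("jane.doe@lab.university")
rejected_plus = is_valid_email("jane+runs@example.org")
```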

Experimental protocol used to generate the data

Protocol used for the V(D)J amplicon sequencing library generation.

type: string

Available protocols are:

  • specific_pcr_umi: RT-PCR using transcript-specific primers containing UMIs.
  • specific_pcr: RT-PCR using transcript-specific primers.
  • dt_5p_race_umi: 5’-RACE PCR using oligo-dT primers and template switch primers containing UMI.
  • dt_5p_race: 5’-RACE PCR (i.e. RT is followed by a template switch (TS) step) using oligo-dT primers.

Path to fasta file containing the linker sequence, if no V-region primers were used but a linker sequence is present (e.g. 5' RACE SMARTer TAKARA protocol).

type: string

Define the paths to the igblast and IMGT databases if you have them cached.

Path to the cached igblast database.

type: string

If it is not provided, the database will be newly downloaded.

Path to the cached IMGT database.

type: string

If it is not provided, the database will be newly downloaded.

Save databases so you can use the cache in future runs.

type: boolean

Define the primer region start and how to deal with the primer alignment.

Path to a fasta file containing the V-region primer sequences.

type: string

Path to a fasta file containing the C-region primer sequences.

type: string

Start position of V region primers (without counting the UMI barcode).

type: integer

Start position of C region primers (without counting the UMI barcode).

type: integer

Indicate if C region primers are in the R1 or R2 reads.

type: string

Specify to match the tail-end of the sequence against the reverse complement of the primers. This also reverses the behavior of the --start argument, such that the start position is relative to the tail-end of the sequence (default: false).

type: boolean
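
To illustrate what tail-end matching against the reverse complement means, here is a minimal sketch. It uses exact string matching only, whereas the actual primer identification step tolerates mismatches up to a configurable error rate. The function names are hypothetical, and counting the start position as the number of bases skipped from the tail end is an assumption.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def primer_at_tail(read, primer, start=0):
    """Check whether the reverse complement of `primer` sits at the tail end of `read`.

    `start` is counted from the tail end (assumption: number of bases skipped).
    Exact matching only; hypothetical helper for illustration.
    """
    target = reverse_complement(primer)
    end = len(read) - start
    if end < len(target):
        return False
    return read[end - len(target):end] == target


# The primer AACG has the reverse complement CGTT, which ends this read.
found = primer_at_tail("TTTTTTCGTT", "AACG")
```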

Define how UMI barcodes should be treated.

Indicate if UMI indices are recorded in a separate index file.

type: boolean

Set to true if UMI barcodes are to be read from a separate Illumina index fastq file. If Illumina indices and UMI barcodes are already integrated into the R1 reads, leave the default --index_file false.

The pipeline requires UMI barcodes for identifying unique transcripts. These barcodes are typically read from an index file, but can sometimes be provided merged with the start of the R1 or R2 reads. If they are provided in an additional index file, set the --index_file parameter. If they are merged with the R1 or R2 reads, set the --umi_position parameter.

Indicate if UMI indices are recorded in the R1 (default) or R2 fastq file.

type: string

The pipeline requires UMI barcodes for identifying unique transcripts. These barcodes are typically read from an index file, but can sometimes be provided merged with the start of the R1 or R2 reads. If they are provided in an additional index file, set the --index_file parameter. If they are merged with the R1 or R2 reads, set the --umi_position parameter to R1 or R2, respectively.

UMI barcode length in nucleotides. Set to 0 if no UMIs are present.

type: integer
default: -1

UMI barcode start position in the index read.

type: integer
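
The UMI length and start position can be pictured as a simple slice of the read that carries the barcode. The sketch below is an illustration only: the function name `extract_umi` is hypothetical, the start position is assumed to be 0-based, and a negative length (such as the default -1) is assumed to mean that the length has not been set.

```python
def extract_umi(read, umi_length, umi_start=0):
    """Split a read into (umi, remainder).

    Hypothetical helper: `umi_start` is assumed to be a 0-based offset.
    """
    if umi_length == 0:
        # No UMIs present: the read is returned unchanged.
        return "", read
    if umi_length < 0:
        # Assumption: a negative value means the length was never set.
        raise ValueError("UMI length must be 0 or a positive number of nucleotides")
    if umi_start < 0 or umi_start + umi_length > len(read):
        raise ValueError("UMI does not fit inside the read")
    umi = read[umi_start:umi_start + umi_length]
    remainder = read[:umi_start] + read[umi_start + umi_length:]
    return umi, remainder


# UMI merged with the start of a read: the first 8 bases are the barcode.
umi, remainder = extract_umi("ACGTACGTTTGGCC", 8)
```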

Options for the presto tools

Quality threshold for Presto FilterSeq sequence filtering.

type: integer
default: 20
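
As a rough illustration of a quality threshold of 20, the sketch below computes the mean Phred score of a fastq quality string and compares it with the threshold. It assumes Phred+33 encoding and an inclusive comparison, and it is not the Presto implementation. Both function names are hypothetical.

```python
def mean_phred_quality(quality_string, offset=33):
    """Mean Phred score of a fastq quality string (Phred+33 encoding assumed)."""
    if not quality_string:
        return 0.0
    return sum(ord(char) - offset for char in quality_string) / len(quality_string)


def passes_quality_filter(quality_string, threshold=20):
    """Keep a read when its mean quality reaches the threshold (inclusive by assumption)."""
    return mean_phred_quality(quality_string) >= threshold


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
high_quality = passes_quality_filter("IIII")
low_quality = passes_quality_filter("####")
```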

Maximum primer scoring error in the Presto MaskPrimer step for the C and/or V region primers identification.

type: number
default: 0.2

Maximum error for building the primer consensus in the Presto BuildConsensus step.

type: number
default: 0.6

Masking mode for the Presto MaskPrimer step. Available: cut, mask, trim, tag.

type: string

The primer masking modes will perform the following actions:

  • cut: remove both the primer region and the preceding sequence.
  • mask: replace the primer region with Ns and remove the preceding sequence.
  • trim: remove the region preceding the primer, but leave the primer region intact.
  • tag: leave the input sequence unmodified.
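
The four masking modes listed above can be sketched on a toy sequence as follows. This is a simplified illustration that locates the primer by exact match at its first occurrence. The real MaskPrimer step scores alignments with a maximum error and also annotates the reads, which is not modelled here. The function name `apply_mask_mode` is hypothetical.

```python
def apply_mask_mode(sequence, primer, mode):
    """Apply one of the masking modes (cut, mask, trim, tag) to a sequence.

    Returns None when the primer is not found. Hypothetical helper that
    uses exact matching instead of error-tolerant alignment.
    """
    start = sequence.find(primer)
    if start == -1:
        return None
    end = start + len(primer)
    if mode == "cut":
        # Remove both the primer region and the preceding sequence.
        return sequence[end:]
    if mode == "mask":
        # Replace the primer region with Ns and remove the preceding sequence.
        return "N" * len(primer) + sequence[end:]
    if mode == "trim":
        # Remove the region preceding the primer, keep the primer itself.
        return sequence[start:]
    if mode == "tag":
        # Leave the input sequence unmodified.
        return sequence
    raise ValueError(f"Unknown masking mode: {mode}")


read = "GGACGTTTCA"  # two leading bases, primer ACGT, then the insert TTCA
results = {mode: apply_mask_mode(read, "ACGT", mode) for mode in ("cut", "mask", "trim", "tag")}
```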

Maximum error for building the sequence consensus in the Presto BuildConsensus step.

type: number
default: 0.1

Maximum gap for building the sequence consensus in the Presto BuildConsensus step.

type: number
default: 0.5

Cluster sequences by similarity regardless of any annotation with Presto ClusterSets and annotate the cluster ID additionally to the UMI barcode.

type: boolean
default: true

Define how the B-cell clonal trees should be calculated.

Set to true to manually adjust the clustering threshold for cell clones.

type: boolean

Set the --set_cluster_threshold parameter to define the Hamming distance threshold for clustering manually. Then specify the value with the --cluster_threshold parameter.

By default, the pipeline defines clones within each sample as sequences that share the same V gene assignment, C gene assignment, J gene assignment, and junction length. Additionally, the similarity of the junction region sequences is assessed by Hamming distance. A distance threshold for determining whether two sequences come from the same clone is determined automatically by the shazam process. Alternatively, a Hamming distance threshold can be set manually with the --set_cluster_threshold and --cluster_threshold parameters.

Set the clustering threshold Hamming distance value.

type: number
default: 0.14

To have any effect, the --set_cluster_threshold parameter needs to be set to true.

By default, the pipeline defines clones within each sample as sequences that share the same V gene assignment, C gene assignment, J gene assignment, and junction length. Additionally, the similarity of the junction region sequences is assessed by Hamming distance. A distance threshold for determining whether two sequences come from the same clone is determined automatically by the shazam process. Alternatively, a Hamming distance threshold can be set manually with the --set_cluster_threshold and --cluster_threshold parameters.
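
The clone definition above can be sketched as a pairwise check. The snippet assumes that the threshold is applied to the Hamming distance divided by the junction length and that the comparison is inclusive. Neither detail is stated above. The dictionary keys and function names are hypothetical.

```python
def normalized_hamming(a, b):
    """Fraction of positions that differ between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)


def same_clone(seq_a, seq_b, threshold=0.14):
    """Pairwise clone check (hypothetical helper; keys are illustrative).

    Two sequences are candidates when the V, C and J gene assignments and the
    junction length agree; the junction distance then decides.
    """
    for key in ("v_gene", "c_gene", "j_gene"):
        if seq_a[key] != seq_b[key]:
            return False
    if len(seq_a["junction"]) != len(seq_b["junction"]):
        return False
    return normalized_hamming(seq_a["junction"], seq_b["junction"]) <= threshold


# Made-up records: the junctions are 20 nt long and differ at 2 positions.
first = {"v_gene": "IGHV1-2", "c_gene": "IGHM", "j_gene": "IGHJ4", "junction": "TGTGCGAGAGATCTGGACTA"}
second = {"v_gene": "IGHV1-2", "c_gene": "IGHM", "j_gene": "IGHJ4", "junction": "ACTGCGAGAGATCTGGACTA"}
related = same_clone(first, second)
```

With a junction of 20 nucleotides, a threshold of 0.14 under these assumptions tolerates at most two mismatches, since three mismatches give 0.15.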

Set the method for finding the clustering threshold.

type: string
default: density

This method will be used to find the Hamming nearest neighbor distances threshold for determining if a sequence belongs to the same B/T-cell clone or not. Available methods are "gmm" for a maximum-likelihood Gamma or Gaussian mixture fitting, and "density" for fitting a binned approximation to the ordinary kernel density estimate to the nearest neighbor distances.
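
The idea behind the "density" method can be illustrated with a much simpler stand-in. The sketch evaluates a Gaussian kernel density estimate of the nearest neighbor distances on a grid and takes the first valley after the first peak as the threshold. This is not the shazam implementation. The fixed bandwidth, the grid size, and both function names are assumptions made for illustration.

```python
import math


def kde_on_grid(values, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `values` at each grid point."""
    scale = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        scale * sum(math.exp(-0.5 * ((point - value) / bandwidth) ** 2) for value in values)
        for point in grid
    ]


def density_threshold(distances, bandwidth=0.03, steps=200):
    """Return the first valley after the first peak of the density, or None.

    Simplified stand-in with a fixed, hand-picked bandwidth (assumption).
    """
    low, high = min(distances), max(distances)
    if high <= low:
        return None
    grid = [low + (high - low) * i / steps for i in range(steps + 1)]
    density = kde_on_grid(distances, grid, bandwidth)
    seen_peak = False
    for i in range(1, steps):
        if density[i] >= density[i - 1] and density[i] > density[i + 1]:
            seen_peak = True
        elif seen_peak and density[i] <= density[i - 1] and density[i] < density[i + 1]:
            return grid[i]
    return None


# Made-up distances: a within-clone mode near 0.04 and a between-clone mode near 0.32.
distances = [0.02, 0.03, 0.03, 0.04, 0.04, 0.05, 0.30, 0.31, 0.32, 0.33, 0.34, 0.35]
threshold = density_threshold(distances)
```

For bimodal input like this, the returned value falls in the gap between the two modes.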

Define downstream analysis options.

Skip repertoire analysis and report generation.

type: boolean

Skip clonal lineage analysis and lineage tree plotting.

type: boolean

Skip the MultiQC report.

type: boolean

Options for software packaging

Enable conda to run the pipeline in a conda environment.

type: boolean

Options for the reference genome indices used to align reads.

Directory / URL base for iGenomes references.

hidden
type: string
default: s3://ngi-igenomes/igenomes

Do not load the iGenomes reference config.

hidden
type: boolean
default: true

Do not load igenomes.config when running the pipeline. You may choose this option if you observe clashes between custom parameters and those supplied in igenomes.config.

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden
type: string
default: master

Base directory for Institutional configs.

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Institutional config name.

hidden
type: string

Institutional config description.

hidden
type: string

Institutional config contact information.

hidden
type: string

Directory to keep pipeline Nextflow logs and reports.

hidden
type: string
default: ${params.outdir}/pipeline_info

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden
type: integer
default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden
type: string
default: 128.GB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'

Maximum amount of time that can be requested for any single job.

hidden
type: string
default: 240.h
pattern: ^(\d+\.?\s*(s|m|h|day)\s*)+$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
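
The memory and time patterns above can be checked ahead of a run. The sketch below applies both with Python's `re` module; the helper names are hypothetical. Note that, as written, the memory pattern does not accept binary unit names such as GiB.

```python
import re

# Patterns copied verbatim from the parameter definitions above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")


def is_valid_memory(value):
    """Return True if `value` matches the memory pattern (hypothetical helper)."""
    return MEMORY_PATTERN.match(value) is not None


def is_valid_time(value):
    """Return True if `value` matches the time pattern (hypothetical helper)."""
    return TIME_PATTERN.match(value) is not None


memory_checks = {value: is_valid_memory(value) for value in ("128.GB", "8 GB", "8GiB")}
time_checks = {value: is_valid_time(value) for value in ("240.h", "1day 12h", "2 hours")}
```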

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden
type: boolean

Method used to save pipeline results to output directory.

hidden
type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden
type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Send plain-text email instead of HTML.

hidden
type: boolean

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

File size limit when attaching MultiQC reports to summary emails.

hidden
type: string
default: 25.MB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.

hidden
type: boolean

Custom config file to supply to MultiQC.

hidden
type: string

Whether to validate parameters against the schema at runtime.

hidden
type: boolean
default: true

Show all params when using --help

hidden
type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

Arguments for this subworkflow

Name of the field used to collapse duplicated sequences

type: string
default: filename,cell_id

Name of the field used to group data files to identify clones

type: string
default: subject_id

Whether to reassign genes if the input file is an AIRR formatted tabulated file

type: boolean
default: true

Subset to productive sequences

type: boolean
default: true

Whether to apply the chimera removal filter

type: boolean
default: true

Use auto to automatically set a threshold to identify clonally related sequences. Set

type: string,number
default: auto

Path to MiAIRR-BioSample mapping

type: string
default: bcellmagic/assets/reveal/mapping_MiAIRR_BioSample_v1.3.1.tsv

Whether input samples include single cell sequencing samples

type: string
default: single_cell