nf-core/cageseq

CAGE-sequencing analysis pipeline with trimming, alignment and counting of CAGE tags.

cage, cage-seq, cageseq-data, gene-expression, rna

This is the development version of the pipeline.

This pipeline uses DSL1. It will not work with Nextflow versions after 22.10.6.

Launch development version https://github.com/nf-core/cageseq

Define where the pipeline should find input data and save output data.

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
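To get a feel for what this pattern accepts, the following Python sketch applies the same regular expression to a few made-up addresses. The helper name `is_valid_email` and the sample addresses are hypothetical and only serve as illustration; the pipeline itself checks the value through its parameter schema.

```python
import re

# Pattern copied verbatim from the schema entry above.
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the schema pattern (hypothetical helper)."""
    return EMAIL_PATTERN.match(address) is not None


print(is_valid_email("jane.doe@example.org"))  # matches the pattern
print(is_valid_email("jane.doe@example"))      # no top-level domain, rejected
print(is_valid_email("a+b@example.org"))       # "+" is not in the allowed character set
```

Note that the pattern limits the final domain label to between two and five letters, so addresses with longer top-level domains are rejected as well.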

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
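The size pattern can be tried out in the same way. The sketch below is a minimal illustration in Python; `is_valid_size` is a hypothetical helper, and the test strings are invented examples rather than values taken from the pipeline.

```python
import re

# Pattern copied verbatim from the schema entry above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value matches the schema pattern (hypothetical helper)."""
    return SIZE_PATTERN.match(value) is not None


for candidate in ["25.MB", "25 MB", "1.5GB", "25mb", "25"]:
    print(candidate, is_valid_size(candidate))
```

The unit letters are case-sensitive and the trailing `B` is mandatory, so `25mb` and a bare `25` do not match.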

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string
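For readers who want to produce a suffix in the same layout outside of Nextflow, this is a small Python sketch. The pattern `yyyy-MM-dd_HH-mm-ss` is written in Java date-format notation; the `strftime` string below is its Python equivalent, and `default_trace_suffix` is a hypothetical helper name.

```python
from datetime import datetime

# Python strftime equivalent of the Java-style pattern yyyy-MM-dd_HH-mm-ss.
TRACE_TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"


def default_trace_suffix(moment: datetime) -> str:
    """Format a timestamp the way the default suffix is described (hypothetical helper)."""
    return moment.strftime(TRACE_TIMESTAMP_FORMAT)


print(default_trace_suffix(datetime(2024, 3, 5, 14, 7, 9)))  # 2024-03-05_14-07-09
```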

Run the whole pipeline

type: boolean

default: true

Run only the mapping part, stopping once the bigWig or BAM files have been produced

type: boolean

Run only the CAGEr processing part, starting from bigWig or BAM files

type: boolean

Genome annotation file in GTF format

required

type: string

Path to the input file. Mutually exclusive with infolder

type: string

Path to the folder with fastq files. Mutually exclusive with input

type: string
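The mutual exclusivity of the two options could be checked before launching a run with something like the Python sketch below. The function `check_input_source` is hypothetical and not part of the pipeline; it only rejects the case where both options are set, because this page does not state whether one of them is mandatory.

```python
from typing import Optional


def check_input_source(input_path: Optional[str], infolder: Optional[str]) -> Optional[str]:
    """Return the name of the option in use, or None if neither is set.

    Hypothetical pre-flight check: raises if both options are given.
    """
    if input_path and infolder:
        raise ValueError("input and infolder are mutually exclusive; set only one of them")
    if input_path:
        return "input"
    if infolder:
        return "infolder"
    return None


print(check_input_source("samplesheet.csv", None))  # input
print(check_input_source(None, "fastq_dir/"))       # infolder
```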

Whether to save merged fasta files

type: boolean

default: true

Number of underscore separated fields denoting sample name when infolder is used

type: integer
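One way to picture how a field count maps onto a sample name is the Python sketch below. It assumes that fields are counted from the start of the file name and that FASTQ extensions are removed first; both are assumptions of this example, not behaviour confirmed by this page. The helper `sample_name` and the file name are made up.

```python
from pathlib import Path


def sample_name(fastq_path: str, n_fields: int) -> str:
    """Join the first n_fields underscore-separated fields of the file name.

    Hypothetical helper: counting from the left is an assumption of this sketch.
    """
    file_name = Path(fastq_path).name
    # Strip common FASTQ extensions before splitting (also an assumption).
    for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if file_name.endswith(suffix):
            file_name = file_name[: -len(suffix)]
            break
    return "_".join(file_name.split("_")[:n_fields])


print(sample_name("data/liver_rep1_L001_R1.fastq.gz", 2))  # liver_rep1
```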

Name of the reference genome. It is used as meta information

type: string

FASTA file containing a reference genome.

type: string

Specifies a directory with a genome index

type: string

Sequencing platform used. Required for mapping with STAR

type: string

Name of the sequencing center. Required for mapping with STAR

type: boolean

Whether only uniquely mapped reads should be considered for downstream analysis.

type: boolean

default: true

Whether to keep only those reads that start with G base

type: boolean

Additional parameters that can be passed to TrimGalore!

type: string

Makes the pipeline skip the G-trimming step in preprocessing

type: boolean

Switches the aligner from STAR to bowtie2

type: boolean

Switches on PCR duplicate removal

type: boolean

Sets an optical duplicate distance, used together with dedup

type: integer

The input CSV samplesheet including the name of the samples, their pairedness status, and the location of bigwig or bam files. Required when cageronly is true.

type: string

Format of the mapping data file passed to the TSS analysis part when STAR is used (either ‘bam’ or ‘bigwig’).

type: string

default: bigwig

Seed file for BSgenome forging

type: string

Directory containing either a set of FASTA files, one per reference chromosome, or a 2bit file for the whole reference genome. Used for BSgenome forging

type: string

BSgenome R package to use (if not forged)

type: string

Threshold above which raw and normalized CTSS are considered for the correlation plot

type: integer

default: 1

Defines the lower threshold for fitting the power-law distribution

type: integer

default: 5

Defines the upper threshold for fitting the power-law distribution

type: integer

default: 10000

Method used for normalizing the samples: powerLaw, simpleTpm, and none are supported

type: string

default: powerLaw
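Of the three methods, simpleTpm is the easiest to illustrate. The Python sketch below assumes it means scaling raw tag counts to tags per million, as the name suggests; it is not the pipeline's or CAGEr's code, and `simple_tpm` and the counts are made up. Power-law normalization additionally involves fitting a distribution and is not shown here.

```python
from typing import List


def simple_tpm(counts: List[int]) -> List[float]:
    """Scale raw tag counts of one sample to tags per million (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        # Avoid division by zero for an empty sample.
        return [0.0 for _ in counts]
    return [count * 1_000_000 / total for count in counts]


print(simple_tpm([10, 30, 60]))  # [100000.0, 300000.0, 600000.0]
```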

User specified alpha, the -1 * fitted slope in the log-log representation of the power-law distribution. If none, the average across samples is calculated and used.

type: string

Total number of CAGE tags in the reference power-law distribution

type: integer

default: 1000000

Parameters for filtering low expressed CTSS before clustering. ctss_thr specifies the lower threshold above which CTSS are considered, and sample_num_thr specifies the number of samples where this threshold should be passed.

type: integer

default: 1

type: integer

default: 1
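The interplay of ctss_thr and sample_num_thr can be sketched in Python as follows. The example assumes that both comparisons are inclusive (a value equal to the threshold passes), which this page does not specify; the function `keep_ctss`, the CTSS identifiers, and the expression values are invented for illustration.

```python
from typing import Dict, List


def keep_ctss(values: List[float], ctss_thr: float = 1, sample_num_thr: int = 1) -> bool:
    """Keep a CTSS if enough samples reach the expression threshold.

    Hypothetical helper; inclusive comparisons are an assumption of this sketch.
    """
    samples_passing = sum(1 for value in values if value >= ctss_thr)
    return samples_passing >= sample_num_thr


# One row per CTSS, one expression value per sample (made-up numbers).
ctss_table: Dict[str, List[float]] = {
    "chr1:1000:+": [0.2, 0.4, 0.0],
    "chr1:1042:+": [3.1, 0.5, 0.0],
    "chr1:2210:-": [1.8, 2.4, 5.0],
}

kept = [name for name, values in ctss_table.items()
        if keep_ctss(values, ctss_thr=1, sample_num_thr=2)]
print(kept)  # ['chr1:2210:-']
```

With the defaults of 1 and 1, the second CTSS would be kept as well, since a single sample reaching the threshold is then sufficient.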

Maximum distance for distance-based clustering (distclu)

type: integer

default: 20
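The idea behind distance-based clustering can be shown with a short Python sketch: neighbouring CTSS are joined into one cluster as long as the gap between them does not exceed the maximum distance. This is a simplified illustration for positions on a single chromosome and strand, not the CAGEr implementation; `distance_cluster` and the positions are made up, and treating a gap equal to the maximum as joinable is an assumption.

```python
from typing import List


def distance_cluster(positions: List[int], max_dist: int = 20) -> List[List[int]]:
    """Group sorted positions so that neighbours at most max_dist apart share a cluster."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_dist:
            # Close enough to the previous CTSS: extend the current cluster.
            clusters[-1].append(pos)
        else:
            # Gap too large (or first position): start a new cluster.
            clusters.append([pos])
    return clusters


print(distance_cluster([100, 105, 130, 200, 215]))
# [[100, 105], [130], [200, 215]]
```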

The tpm threshold above which even a single CTSS is kept during clustering

type: integer

default: 5

Defines the lower quantile boundary of the interquartile range

type: number

default: 0.1

Defines the upper quantile boundary of the interquartile range

type: number

default: 0.9
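To illustrate how the two quantile boundaries delimit a range inside a tag cluster, the Python sketch below walks along the cluster and reports the first position at which the cumulative signal reaches each fraction of the total. It is a simplified reading of the two parameters, not the pipeline's implementation; `quantile_position` and the numbers are invented.

```python
from typing import List


def quantile_position(positions: List[int], signal: List[float], q: float) -> int:
    """Return the first position where the cumulative signal reaches fraction q of the total."""
    total = sum(signal)
    running = 0.0
    for pos, value in zip(positions, signal):
        running += value
        if running >= q * total:
            return pos
    return positions[-1]


# A made-up tag cluster: genomic positions and the signal at each of them.
positions = [100, 101, 102, 103, 104, 105]
signal = [1, 2, 10, 4, 2, 1]  # cumulative: 1, 3, 13, 17, 19, 20

lower = quantile_position(positions, signal, 0.1)
upper = quantile_position(positions, signal, 0.9)
print(lower, upper)  # 101 104
```

Positions outside the two boundaries carry only the sparse tails of the signal, which is why the range between them is a more robust description of cluster width than the full span.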

Threshold above which tag clusters are considered for the interquartile width distribution plot

type: integer

default: 3

Upstream distance to include in the TSS region for ChIPseeker annotation

type: integer

default: -3000

Downstream distance to include in the TSS region for ChIPseeker annotation

type: integer

default: 3000
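The upstream and downstream distances together define a window around each TSS. The Python sketch below shows one way to test whether a position falls inside that window, measuring the offset in the direction of transcription so that negative values are upstream. It is an illustration of the window only and does not reproduce ChIPseeker's annotation logic; `in_tss_region` and the coordinates are hypothetical.

```python
def in_tss_region(position: int, tss: int, strand: str,
                  upstream: int = -3000, downstream: int = 3000) -> bool:
    """Return True if position lies within [upstream, downstream] relative to the TSS."""
    # On the minus strand, larger coordinates are upstream, so the sign is flipped.
    offset = position - tss if strand == "+" else tss - position
    return upstream <= offset <= downstream


print(in_tss_region(7500, 10000, "+"))   # 2500 bp upstream on the plus strand: True
print(in_tss_region(6000, 10000, "+"))   # 4000 bp upstream: False
print(in_tss_region(12500, 10000, "-"))  # 2500 bp upstream on the minus strand: True
```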

The number of bases to include upstream of the TSS for TSS logos

type: integer

default: 35

Used for defining the consensus clusters. consensus_thr specifies the TPM threshold above which tag clusters are considered for consensus clusters, and consensus_dist defines the maximum distance between the interquartile ranges of tag clusters to be joined together into consensus clusters.

type: integer

default: 2

type: integer

default: 100

Defines the balance threshold above which bidirectionality is considered balanced and enhancers are called

type: number

default: 0.95

Used for selecting only supported enhancers. unexpressed is a non-inclusive lower TPM boundary for expression when calculating support of enhancers. minSamples is a non-inclusive lower boundary for the number of samples where the clusters should show bidirectionality.

type: integer
