nf-core/pgdb

The ProteoGenomics database generation workflow creates different protein databases for ProteoGenomics data analysis.

cosmic, gnomad, protein-databases, proteogenomics, proteomics, pypgatk

This is the development version of the pipeline.

This pipeline uses DSL1. It will not work with Nextflow versions after 22.10.6.

Launch development version https://github.com/nf-core/pgdb

Add ENSEMBL canonical proteomes

Add the reference proteome to the file

type: boolean

default: true

Path to configuration file for ENSEMBL download parameters

type: string

Path to configuration file for parameters used when generating protein databases from ENSEMBL sequences

type: string

URL for downloading GENCODE datafiles

type: string

default: ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19

Taxonomic term for the species to download from ENSEMBL

type: string

default: homo_sapiens

Non-canonical protein generation parameters

Generate protein database from non-coding RNA

type: boolean

Generate protein database from pseudogenes

type: boolean

Generate alternative ORFs from canonical proteins

type: boolean

Download ENSEMBL variants and generate protein database

type: boolean

Proteins generated using an input VCF

Enable translation of a given VCF file

type: boolean

Path to the VCF file to be translated. Variant proteins are generated by modifying the sequences of affected transcripts.

type: string

Allele frequency identifier string in the VCF INFO column. If no AF information is given, set it to empty.

type: string

cBioPortal variant parameters

Download cBioPortal studies and generate protein database

type: boolean

Specify a tissue type to limit the cBioPortal mutations to a particular cancer type

type: string

default: all

Specify a column from the clinical sample file to be used for filtering records

type: string

default: CANCER_TYPE

Download mutations from a specific study in cBioPortal. The default is all, which downloads mutations from all studies.

type: string

cBioPortal configuration file

type: string

COSMIC variant proteins parameters

Download COSMIC mutation files and generate protein database

type: boolean

Download COSMIC cell line files and generate protein database

type: string

User name (or email) for COSMIC account

type: string

Password for COSMIC account

type: string

Path to configuration file for parameters in generating

type: string

Specify a tissue type to limit the COSMIC mutations to a particular cancer type

type: string

default: all

Specify a sample name to limit the COSMIC cell line mutations to a particular cell line

type: string

default: all

Add gnomAD variants to the database

type: boolean

gnomAD URL

type: string

default: gs://gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.vcf.bgz

Generate decoy proteins and attach them to the final protein database

Append the decoy proteins to the database

type: boolean

String to be used as prefix for the generated decoy sequences

type: string

default: Decoy_

Method used to generate the decoy database

type: string

Enzyme used to generate the decoy

type: string

default: Trypsin

Configuration file to perform the decoy generation

type: string
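As an illustration of how the decoy prefix is applied, the sketch below builds a decoy entry by reversing the target sequence and prepending the prefix to its identifier. Sequence reversal is an assumption chosen for the example; this page does not list the methods the pipeline supports, and `make_decoy` is a hypothetical helper, not part of the pipeline.

```python
def make_decoy(identifier, sequence, prefix="Decoy_"):
    """Hypothetical helper: return a (decoy_id, decoy_sequence) pair.

    Assumption: the decoy is the reversed target sequence. The real
    pipeline may use a different method.
    """
    return prefix + identifier, sequence[::-1]


decoy_id, decoy_seq = make_decoy("ENSP00000000001", "MKTAYIAK")
print(f">{decoy_id}\n{decoy_seq}")
```

The prefix is what lets downstream search tools tell decoy hits from target hits, so it should be a string that never occurs at the start of a real protein identifier.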

Clean and process the resulting database

Clean the database of stop codons and short protein sequences

type: boolean

Minimum number of amino acids for a protein to be included in the database

type: integer

default: 6

If a stop codon is found, create two proteins from it

type: boolean
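The cleaning options above can be pictured with a minimal sketch. It assumes stop codons appear as `*` in the protein sequence and that a sequence containing one is dropped unless splitting is enabled. Both points are assumptions, as are the names `clean_protein`, `min_length`, and `split_on_stop`; the pipeline's own implementation may behave differently.

```python
STOP = "*"  # assumed stop-codon symbol in translated sequences


def clean_protein(sequence, min_length=6, split_on_stop=False):
    """Hypothetical helper: return the list of sequences kept after cleaning."""
    if STOP in sequence:
        if not split_on_stop:
            # Assumption: sequences with an internal stop are discarded.
            return []
        # Splitting yields one fragment per stretch between stop codons.
        fragments = sequence.split(STOP)
    else:
        fragments = [sequence]
    # Drop fragments shorter than the minimum length (default 6, as above).
    return [fragment for fragment in fragments if len(fragment) >= min_length]


print(clean_protein("MKTAYIAK"))
print(clean_protein("MKTAYIAK*GAVLIPFW", split_on_stop=True))
```

Applying the length filter after splitting means that very short fragments produced by a stop codon are removed along with short input proteins.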

Define where the pipeline should find input data and save output data.

The output directory where the results will be saved.

type: string

default: ./results

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
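To check an address before launching a run, the pattern above can be tried directly. This is a small sketch using Python's `re` module; `is_valid_email` is a hypothetical helper name, and the pattern is copied verbatim from this page.

```python
import re

# Pattern copied verbatim from the parameter description above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def is_valid_email(address):
    """Hypothetical helper: True if the address satisfies the schema pattern."""
    return EMAIL_PATTERN.match(address) is not None


for address in ("user@example.org", "user@example", "user@example.museum"):
    print(address, is_valid_email(address))
```

Note that the pattern limits the final domain label to between 2 and 5 letters, so an address with a longer top-level domain is rejected even though it may be deliverable.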

Filename for the final protein database

type: string

default: final_proteinDB.fa

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Whether to validate parameters against the schema at runtime

hidden

type: boolean

default: true

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

Do not use coloured log outputs.

hidden

type: boolean

Directory to keep pipeline Nextflow logs and reports.

hidden

type: string

default: ${params.outdir}/pipeline_info

Show all params when using --help

hidden

type: boolean

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 128.GB

pattern: ^[\d\.]+\s*.(K|M|G|T)?B$

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h

pattern: ^(\d+(\.\d+)?(?:\s*|\.?)(s|m|h|d)\s*)+$
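The memory and time patterns above can be tested the same way before a value is put in a config file. The sketch below copies both patterns verbatim; `matches` is a hypothetical helper, and the sample values other than the defaults are made up for illustration.

```python
import re

# Patterns copied verbatim from the parameter descriptions above.
MEMORY_PATTERN = re.compile(r"^[\d\.]+\s*.(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+(\.\d+)?(?:\s*|\.?)(s|m|h|d)\s*)+$")


def matches(pattern, value):
    """Hypothetical helper: True if value satisfies the given schema pattern."""
    return pattern.match(value) is not None


for value in ("128.GB", "128 GB", "lots"):
    print("memory", value, matches(MEMORY_PATTERN, value))

for value in ("240.h", "2d 6h", "soon"):
    print("time", value, matches(TIME_PATTERN, value))
```

Both documented defaults (`128.GB` and `240.h`) satisfy their patterns. The time pattern also accepts several units in a row, such as `2d 6h`.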

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional configs hostname.

hidden

type: string

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string
