nf-core/proteinfold

Protein 3D structure prediction pipeline

alphafold2, colabfold, esmfold, protein-fold-prediction, protein-folding, protein-sequences, protein-structure

Version 2.0.0: https://github.com/nf-core/proteinfold

RoseTTAFold-All-Atom

Mode	Protein	RNA	Small-molecule	PTM	Constraints	pLM	MSA server	Split MSA
RoseTTAFold-All-Atom	✅	✅	✅	✅	❌	❌	❌	❌

RoseTTAFold All-Atom can be run using the command below:

nextflow run nf-core/proteinfold \
      --input samplesheet.csv \
      --outdir <OUTDIR> \
      --mode rosettafold_all_atom \
      --rosettafold_all_atom_db <null (default) | DB_PATH> \
      --use_gpu \
      -profile <docker/singularity/.../institute>

File Structure

The directory passed to --rosettafold_all_atom_db must have the following structure:


<rosettafold_all_atom_db>/
├── bfd
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex
│   ├── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata
│   └── bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex
├── params
│   └── RFAA_paper_weights.pt
├── pdb100
│   ├── LICENSE
│   ├── pdb100_2021Mar03_a3m.ffdata
│   ├── pdb100_2021Mar03_a3m.ffindex
│   ├── pdb100_2021Mar03_cs219.ffdata
│   ├── pdb100_2021Mar03_cs219.ffindex
│   ├── pdb100_2021Mar03_hhm.ffdata
│   ├── pdb100_2021Mar03_hhm.ffindex
│   ├── pdb100_2021Mar03_pdb.ffdata
│   └── pdb100_2021Mar03_pdb.ffindex
└── uniref30
    ├── UniRef30_2023_02_a3m.ffdata
    ├── UniRef30_2023_02_a3m.ffindex
    ├── UniRef30_2023_02_cs219.ffdata
    ├── UniRef30_2023_02_cs219.ffindex
    ├── UniRef30_2023_02_hhm.ffdata
    ├── UniRef30_2023_02_hhm.ffindex
    └── UniRef30_2023_02.md5sums
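
Before launching, it can help to confirm that a local database directory matches this layout. The following is a minimal sketch, not part of nf-core/proteinfold: check_rfaa_db is a hypothetical helper that tests for one representative file per component, using file names taken from the tree above. It does not verify every file or any checksums.

```shell
# Hypothetical helper (not provided by the pipeline): check that a directory
# contains one representative file from each component of the expected layout.
check_rfaa_db() {
    local db="$1"
    local missing=0
    local entry
    for entry in \
        "bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata" \
        "params/RFAA_paper_weights.pt" \
        "pdb100/pdb100_2021Mar03_a3m.ffdata" \
        "uniref30/UniRef30_2023_02_a3m.ffdata"
    do
        # Report each expected entry that is absent and remember the failure.
        if [ ! -e "$db/$entry" ]; then
            echo "missing: $entry"
            missing=1
        fi
    done
    # Exit status 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

For example, check_rfaa_db /path/to/db && echo "layout looks complete" prints the confirmation only when all four entries are present; otherwise the missing entries are listed.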

If individual components are available at different locations in the filesystem, they can be set using the following flags:

--rosettafold_all_atom_bfd_path </PATH/TO/bfd/*>
--rosettafold_all_atom_paper_weights_path </PATH/TO/params/RFAA_paper_weights.pt>
--rosettafold_all_atom_uniref30_path </PATH/TO/uniref30/*>
--rosettafold_all_atom_pdb100_path </PATH/TO/pdb100/*>
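
As an illustration, the sketch below assembles a launch command that combines these four flags and prints it for review instead of running it. The component locations, the results output directory, and the print_rfaa_command helper are all hypothetical. The glob values are wrapped in single quotes in the printed command on the assumption that the pipeline should receive the literal pattern rather than a list of files expanded by the shell.

```shell
# Hypothetical component locations; replace with the real paths on your system.
BFD_DIR="/data/shared/bfd"
WEIGHTS="/data/models/RFAA_paper_weights.pt"
UNIREF30_DIR="/scratch/db/uniref30"
PDB100_DIR="/scratch/db/pdb100"

# Print the launch command (a dry run). The single quotes in the output keep
# the shell from expanding the * when the command is pasted and executed.
print_rfaa_command() {
    cat <<EOF
nextflow run nf-core/proteinfold \\
    --input samplesheet.csv \\
    --outdir results \\
    --mode rosettafold_all_atom \\
    --rosettafold_all_atom_bfd_path '${BFD_DIR}/*' \\
    --rosettafold_all_atom_paper_weights_path '${WEIGHTS}' \\
    --rosettafold_all_atom_uniref30_path '${UNIREF30_DIR}/*' \\
    --rosettafold_all_atom_pdb100_path '${PDB100_DIR}/*' \\
    --use_gpu \\
    -profile docker
EOF
}
```

Calling print_rfaa_command writes the full command to standard output, where it can be inspected and then copied into a terminal or job script.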

If the --rosettafold_all_atom_db flag is not set, all of the required data files are downloaded during workflow execution.

Warning

The RoseTTAFold-All-Atom reference databases require ~2TB of disk space.
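
A quick pre-flight check of free space on the target filesystem can catch a failed download early. The sketch below is an assumption-laden example rather than anything the pipeline provides: has_free_kb is a hypothetical helper built on the POSIX df output format, and the threshold of 2000000000 KB in the usage note is only a rough stand-in for the ~2TB figure above.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least
# REQUIRED_KB kilobytes available.
has_free_kb() {
    local dir="$1" required_kb="$2" available_kb
    # df -P gives one line per filesystem in POSIX format; -k reports sizes in
    # 1024-byte blocks. Column 4 of the second line is the available space.
    available_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ "$available_kb" -ge "$required_kb" ]
}
```

For example, has_free_kb /path/to/db 2000000000 || echo "less than roughly 2 TB free" warns when the filesystem is too small.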

YAML format

RoseTTAFold-All-Atom can model nucleic acids and small-molecule ligands, and can take post-translational modifications into account. This information cannot be expressed in FASTA format, so it must be specified in an input YAML file that follows the RoseTTAFold-All-Atom specification.

To run a RoseTTAFold-All-Atom YAML file with proteinfold in rosettafold_all_atom mode, list the YAML file in the fasta column of the input samplesheet in place of the usual FASTA file:

id,fasta
T1024,T1024.yaml
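
Because the second column may now point at either FASTA or YAML files, a simple existence check on every listed path can be useful before launching. The following sketch assumes the two-column id,fasta layout shown above and resolves relative paths against the current directory; check_samplesheet is a hypothetical helper, not a pipeline command, and it does not validate the contents of the files.

```shell
# Hypothetical helper: report samplesheet rows whose input file does not exist.
check_samplesheet() {
    local sheet="$1" status=0 header id path
    {
        # Skip the header row (id,fasta).
        read -r header
        # Split each remaining row on the comma into id and path.
        while IFS=, read -r id path; do
            [ -n "$id" ] || continue
            if [ ! -e "$path" ]; then
                echo "missing: $path (sample $id)"
                status=1
            fi
        done
    } < "$sheet"
    # Exit status 0 when every listed file exists, 1 otherwise.
    return "$status"
}
```

Running check_samplesheet samplesheet.csv from the directory that contains the inputs lists any rows whose file cannot be found.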

Note

RoseTTAFold-All-Atom YAML input is not compatible with running multiple modes simultaneously.
