Version history

Changed

  • #77 - Default branch changed from master to main.
  • #73 - Changed the FASTA parsing library of the CHUNK_CLUSTERS local module from pyfastx back to the latest version of biopython, and parallelized its writing mechanism, which reduced execution time.
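The parallelized writing mentioned in #73 can be pictured with the minimal sketch below. It is not the module's actual code: the function names (`parse_fasta`, `write_chunk`, `write_clusters_parallel`), the one-file-per-cluster layout and the use of a thread pool are all assumptions made for illustration, and a small standard-library parser stands in for biopython so the block has no third-party dependencies.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_fasta(text):
    """Minimal stand-in FASTA parser (hypothetical): yields (header, sequence)."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_chunk(out_dir, cluster_id, records):
    """Write one cluster's sequences to its own FASTA file."""
    path = Path(out_dir) / f"{cluster_id}.fasta"
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(f">{header}\n{seq}\n")
    return path


def write_clusters_parallel(clusters, out_dir, workers=4):
    """Write all cluster files concurrently and return the sorted paths."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_chunk, out_dir, cluster_id, records)
            for cluster_id, records in clusters.items()
        ]
        return sorted(future.result() for future in futures)


if __name__ == "__main__":
    seqs = dict(parse_fasta(">a\nMKV\n>b\nMLL\nGG\n>c\nMPT\n"))
    clusters = {
        "chunk_1": [("a", seqs["a"]), ("b", seqs["b"])],
        "chunk_2": [("c", seqs["c"])],
    }
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_clusters_parallel(clusters, tmp):
            print(path.name)
```

Threads are used here because file writing is I/O-bound; whether the real module uses threads or processes is not stated in this changelog.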

Dependencies

Tool       Previous version  New version
biopython  1.84              1.85
pyfastx    2.2.0             -

Removed

  • #73 - Deprecated the pyfastx-based version of the CHUNK_CLUSTERS module, since it performed poorly on larger datasets.

Added

  • #69 - Added the hhsuite/reformat nf-core module to reformat .sto alignments to .fas when in-family sequence redundancy is not removed. Also added the option to save intermediate and final family fasta files throughout the workflow with various save parameters.
  • #58 - Added nf-test and meta.yml file for local module REMOVE_REDUNDANCY_SEQS (Hackathon 2025)
  • #56 - Added nf-test and meta.yml file for local module FILTER_RECRUITED (Hackathon 2025)
  • #55 - Added nf-test and meta.yml file for local module CHUNK_CLUSTERS (Hackathon 2025)
  • #54 - Added nf-test for local subworkflow ALIGN_SEQUENCES (Hackathon 2025)
  • #53 - Added nf-test for local subworkflow EXECUTE_CLUSTERING (Hackathon 2025)
  • #51 - Added nf-test and meta.yml file for local module CALCULATE_CLUSTER_DISTRIBUTION (Hackathon 2025)
  • #34 - Added the EXTRACT_UNIQUE_CLUSTER_REPS module, which calculates initial MMseqs clustering metadata for each sample, to be printed with MultiQC (Id, Cluster Size, Number of Clusters)
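As a rough illustration of the kind of metadata #34 describes, the sketch below tallies how many clusters of each size appear in an MMseqs2-style cluster TSV (one `representative<TAB>member` row per sequence). Reading the three columns as a per-sample cluster size distribution is an assumption, and the function names are hypothetical; this is not the module's actual script.

```python
from collections import Counter


def cluster_size_distribution(tsv_lines):
    """Return {cluster_size: number_of_clusters} from representative/member rows."""
    members_per_rep = Counter()
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue
        representative, _member = line.split("\t")
        members_per_rep[representative] += 1
    # Count how many representatives share each cluster size.
    return dict(sorted(Counter(members_per_rep.values()).items()))


def to_csv_rows(sample_id, distribution):
    """Format the distribution with the column names listed in the changelog."""
    rows = ["Id,Cluster Size,Number of Clusters"]
    rows += [f"{sample_id},{size},{count}" for size, count in distribution.items()]
    return rows


if __name__ == "__main__":
    lines = ["r1\tr1", "r1\tm2", "r2\tr2", "r3\tr3", "r3\tm4"]
    for row in to_csv_rows("sample1", cluster_size_distribution(lines)):
        print(row)
```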

Fixed

  • #69 - Fixed a bug where redundant family alignments were not published properly when the intra-family redundancy removal mechanism was switched off (#68)
  • #65 - Fixed a bug in CHUNK_CLUSTERS, where the pipeline would crash if a high membership threshold caused the module to filter out all clusters (#64)
  • #35 - Fixed a bug in remove_redundant_fams.py, where family sizes were compared as strings instead of integers when deciding which (larger) family to keep
  • #33 - Fixed an always-true condition in the filter_non_redundant_hmms.py script, by adding missing parentheses
  • #29 - Fixed an hmmalign crash on empty input, by preventing the FILTER_RECRUITED module from creating an empty output .fasta.gz file when no sequences remain after filtering the hmmsearch results (#28)
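Fixes #35 and #33 above correspond to two common Python pitfalls, sketched below. The first pair of functions shows why comparing sizes as strings picks the wrong family. The second pair shows one way a missing pair of parentheses can make a condition always true (a method referenced but never called). The actual condition in filter_non_redundant_hmms.py is not shown in this changelog, so the second pair is a purely hypothetical illustration, as are all function names.

```python
def larger_family_buggy(size_a: str, size_b: str) -> str:
    # Strings compare character by character, so "9" > "10" is True.
    return "a" if size_a > size_b else "b"


def larger_family_fixed(size_a: str, size_b: str) -> str:
    # Convert to integers first, so 10 is correctly larger than 9.
    return "a" if int(size_a) > int(size_b) else "b"


def keep_buggy(name: str) -> bool:
    # Missing call parentheses: `name.isdigit` is a bound method object,
    # which is always truthy, so this is True for every name.
    return bool(name.isdigit)


def keep_fixed(name: str) -> bool:
    # With the parentheses, the method is actually called.
    return name.isdigit()


if __name__ == "__main__":
    print(larger_family_buggy("9", "10"), larger_family_fixed("9", "10"))
    print(keep_buggy("abc"), keep_fixed("abc"))
```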

Changed

  • #69 - Changed the publish directory architecture for HMMs, seed MSAs, full MSAs and family FASTA files, to make it more intuitive. The REMOVE_REDUNDANT_FAMS local module was converted to IDENTIFY_REDUNDANT_FAMS, which extracts redundant family ids that are then used downstream. The FILTER_NON_REDUNDANT_HMMS local module was converted to FILTER_NON_REDUNDANT_FAMS and is reused four times (HMM, seed MSA, full MSA, FASTA). Changed the output format of the EXTRACT_FAMILY_REPS and REMOVE_REDUNDANT_SEQS local modules from .fa to .faa. Updated the metro map with the new hhsuite/reformat module.
  • #57 - Slight improvements to nextflow_schema.json (Hackathon 2025)
  • #57 - Slight improvements to assets/schema_input.json (Hackathon 2025)
  • #34 - Swapped the SeqIO Python library for pyfastx in the CHUNK_CLUSTERS module, cutting its run time to about a quarter
  • #32 - Updated ClipKIT 2.4.0 -> 2.4.1, which now also allows ends-only trimming, to completely replace the custom CLIP_ENDS module. Users can now also define its output format by setting the --clipkit_out_format parameter (default: clipkit)

Dependencies

Tool     Previous version  New version
ClipKIT  2.4.0             2.4.1
pyfastx  -                 2.2.0
hhsuite  -                 3.3.0
multiqc  1.27              1.28

Deprecated

  • #32 - Deprecated the CLIP_ENDS module and the --clipping_tool parameter. ClipKIT is now the only option; it covers both previous modes, with ends-only trimming enabled by setting --trim_ends_only

Initial release of nf-core/proteinfamilies, created with the nf-core template.

Added

  • Amino acid sequence clustering (mmseqs)
  • Multiple sequence alignment (famsa, mafft, clipkit)
  • Hidden Markov Model generation (hmmer)
  • Between families redundancy removal (hmmer)
  • In-family sequence redundancy removal (mmseqs)
  • Family updating (hmmer, seqkit, mmseqs, famsa, mafft, clipkit)
  • Family statistics presentation (multiqc)

By @vagkaratzas and @mberacochea.