nf-core/multiplesequencealign

A pipeline to run and systematically evaluate Multiple Sequence Alignment (MSA) methods.

This is the development version of the pipeline.

Repository: https://github.com/nf-core/multiplesequencealign

Output

The main outputs of this pipeline are the computed MSAs and the summary reports, which collect all evaluation metrics and the resources used.

Additionally, the pipeline can provide a variety of files useful for inspection (e.g. the guide trees used).

The directories listed below will be created in the results directory (specified by --outdir) after the pipeline has finished.

All paths are relative to the top-level results directory.

results/
- alignments/
  Computed MSAs.
  Each subdirectory is named after the sample ID and contains all the alignments computed for that sample. The filename is built from the name of the input file and the tool(s) used.
  - {SampleID}/{SampleID}_{Tree}_args-{Tree_args}_{MSA}_args-{MSA_args}.aln
- trees/
  
  Computed guide trees
  
  If you explicitly enabled the computation of guide trees in the toolsheet, the guide trees used by the MSA tool are generated and stored in the trees directory.
  
  Each subdirectory is named after the sample ID and contains all the computed trees for that sample. The filename is built from the name of the input file and the tool(s) used.
  - {SampleID}/{SampleID}_{Tree}_args-{Tree_args}.dnd.
- evaluation/
  
  Computed evaluation statistics.
  - complete_summary_eval.csv: CSV file containing the summary of all evaluation metrics for each input file.
  - consensus/: directory containing information about the consensus alignment. Only produced if --build_consensus is specified.
  - tcoffee_irmsd/: directory containing the complete iRMSD files. Only produced if --calc_irmsd is specified.
  - tcoffee_tcs/: directory containing the complete TCS files. Only produced if --calc_tcs is specified.
- stats/
  Computed statistics about the input files (e.g. sequence length, number of sequences).
  - complete_summary_stats.csv: CSV file containing the summary of all the statistics computed on the input files.
  - sequences/
    
    seqstats/*_seqstats.csv: length of each input sequence in the family identified by the file name. Only produced if --calc_seq_stats is specified.
    
    perc_sim/*.txt: pairwise sequence similarity for all input sequences. Only produced if --calc_sim is specified.
  - structures/
    
    plddt/*_full_plddt.csv: pLDDT values of the structure of each sequence in the input file. Only produced if --extract_plddt is specified.
- summary/
  CSV file with the summary of all statistics, evaluation metrics and resources used by each combination of tools.
  - complete_summary_stats_with_trace.csv: CSV file containing the content of complete_summary_stats merged with the information from the Nextflow trace file. Resource usage is not included when the pipeline is run with -resume.
- reports/
  QC and visualization reports.
  - multiqc
    
    MultiQC summary
    
    MultiQC is a visualization tool that generates a single HTML report summarising all samples in your project. Most of the pipeline QC results are visualised in the report and further statistics are available in the report data directory.
    
    Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQC. The pipeline has special steps which also allow the software versions to be reported in the MultiQC output for future traceability. For more information about how to use MultiQC reports, see multiqc.info.
    
    reports/multiqc/
    - multiqc_report.html: a standalone HTML file that can be viewed in your web browser.
    - multiqc_data/: directory containing parsed statistics from the different tools used in the pipeline.
    - multiqc_plots/: directory containing static images from the report in various formats.
  - visualisation
    FoldMason report for visualizing the alignment and the protein structures. Only available if structures were provided as input.
    
    reports/visualization/
    
    {SampleID}_{Tree}_args-{Tree_args}_{MSA}_args-{MSA_args}.html: FoldMason HTML report.
  - shiny_app/
    
    A Shiny app is prepared to interactively explore the summary statistics and evaluation of the produced alignments (skip with --skip_shiny).
    
    To run the Shiny app, use the following commands from the results directory:
    
    cd shiny_app
    
    ./run.sh
    
    Note that Shiny for Python must be installed to use this feature.
    
    run.sh: executable that starts the Shiny app.
    
    *.py: Shiny app files.
    
    *.csv: CSV files used by the Shiny app.
- pipeline_info
  Extra information about the pipeline execution. Nextflow can generate several reports about the running and execution of the pipeline. These help you troubleshoot failed runs and provide other information such as launch commands, run times and resource usage.
  - Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt, pipeline_dag.dot and pipeline_dag.svg.
  - Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files are only present if the --email or --email_on_fail parameter is used when running the pipeline.
  - Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
  - Parameters used by the pipeline run: params.json.
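
The alignment and FoldMason report filenames in alignments/ and reports/visualization/ encode the sample and the tool combination. The Python sketch below splits such a name back into its parts. It assumes that sample IDs and tool names contain no underscores and that the extension is .aln or .html; if your IDs contain underscores the split becomes ambiguous and this regex is not reliable. The function name parse_alignment_name and the example filename are hypothetical.

```python
import re
from typing import Optional

# Assumed layout: {SampleID}_{Tree}_args-{Tree_args}_{MSA}_args-{MSA_args}.{aln|html}
# Assumption: sample IDs and tool names contain no underscores
# (tool arguments may contain them).
_NAME_RE = re.compile(
    r"^(?P<sample>[^_]+)"
    r"_(?P<tree>[^_]+)_args-(?P<tree_args>.*?)"  # lazy: stop at the first "_{MSA}_args-"
    r"_(?P<msa>[^_]+)_args-(?P<msa_args>.*)"
    r"\.(?P<ext>aln|html)$"
)


def parse_alignment_name(filename: str) -> Optional[dict]:
    """Split an alignment or FoldMason report filename into its components.

    Returns None if the name does not follow the assumed pattern.
    """
    match = _NAME_RE.match(filename)
    return match.groupdict() if match else None
```

For example, parse_alignment_name("seqfamily_TREE_args-a_b_ALIGNER_args-c.aln") yields the sample "seqfamily", tree tool "TREE" with arguments "a_b", and MSA tool "ALIGNER" with arguments "c".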
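
The guide trees in trees/ are stored as .dnd files, i.e. in Newick format. To check which sequences a guide tree covers, a minimal Python sketch can extract the leaf labels. It assumes unquoted labels without whitespace or Newick punctuation; quoted labels or comments would need a full Newick parser. The helper name leaf_names is hypothetical.

```python
import re

# In Newick, a leaf label directly follows "(" or "," and ends at
# ":", ",", ")" or ";". Labels after ")" are internal-node labels and
# are therefore not matched.
_LEAF_RE = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick: str) -> list[str]:
    """Return the leaf labels of a Newick string in order of appearance."""
    return _LEAF_RE.findall(newick)
```

Read a tree with open(path).read() and pass the text to leaf_names; \s* also tolerates trees written across several lines.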
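
To compare tool combinations using evaluation/complete_summary_eval.csv, one option is to pick the best-scoring row for each sample. The Python sketch below does not assume specific column names: pass the sample ID column and the metric column exactly as they appear in your CSV header. The helper names are hypothetical, and rows whose metric is empty or "NA" are skipped on the assumption that the metric was not computed for them.

```python
import csv
from typing import Iterable


def load_summary(path: str) -> list[dict]:
    """Read a summary CSV into a list of row dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def best_per_sample(rows: Iterable[dict], sample_col: str, score_col: str,
                    higher_is_better: bool = True) -> dict[str, dict]:
    """Return, for each sample, the row with the best value in score_col."""
    best: dict[str, tuple[float, dict]] = {}
    for row in rows:
        value = row.get(score_col, "")
        if value in ("", "NA"):
            continue  # metric missing for this row
        score = float(value)
        sample = row[sample_col]
        current = best.get(sample)
        if current is None:
            best[sample] = (score, row)
            continue
        improved = score > current[0] if higher_is_better else score < current[0]
        if improved:
            best[sample] = (score, row)
    return {sample: row for sample, (_, row) in best.items()}
```

A call could look like best_per_sample(load_summary("evaluation/complete_summary_eval.csv"), sample_col="id", score_col="TCS"), where "id" and "TCS" are placeholders for your actual column names. Use higher_is_better=False for metrics where lower values are better, such as iRMSD.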
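
Several outputs listed above are only produced when the corresponding parameter is set. Because pipeline_info/params.json records the parameters of the run, a short Python sketch can list which optional directories to expect. It assumes params.json is a flat JSON object keyed by parameter name without the leading "--"; the mapping below is taken from the descriptions on this page, and the helper names are hypothetical.

```python
import json

# Optional output directories and the parameter that enables each,
# as described on this page. Paths are relative to the results directory.
OPTIONAL_OUTPUTS = {
    "build_consensus": "evaluation/consensus",
    "calc_irmsd": "evaluation/tcoffee_irmsd",
    "calc_tcs": "evaluation/tcoffee_tcs",
    "calc_seq_stats": "stats/sequences/seqstats",
    "calc_sim": "stats/sequences/perc_sim",
    "extract_plddt": "stats/structures/plddt",
}


def load_params(path: str = "pipeline_info/params.json") -> dict:
    """Load the run parameters (assumed to be a flat JSON object)."""
    with open(path) as handle:
        return json.load(handle)


def expected_optional_outputs(params: dict) -> list[str]:
    """List the optional output directories enabled by the run parameters."""
    paths = [path for flag, path in OPTIONAL_OUTPUTS.items() if params.get(flag)]
    # The Shiny app is produced unless --skip_shiny was set.
    if not params.get("skip_shiny"):
        paths.append("reports/shiny_app")
    return paths
```

Comparing expected_optional_outputs(load_params()) with the directories actually present can help spot a step that did not run.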
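
The Nextflow execution trace in pipeline_info/ is a tab-separated table with one row per task, which makes it a quick starting point for troubleshooting. The Python sketch below counts tasks by status and collects the names of failed tasks. It assumes the trace has a header row including Nextflow's default name and status fields; summarize_trace is a hypothetical helper. Tasks reused with -resume appear with the status CACHED.

```python
import csv
from collections import Counter
from typing import Iterable


def summarize_trace(lines: Iterable[str]) -> tuple[Counter, list[str]]:
    """Count tasks per status and collect the names of failed tasks.

    `lines` can be an open file handle or any iterable of text lines.
    """
    statuses: Counter = Counter()
    failed: list[str] = []
    for row in csv.DictReader(lines, delimiter="\t"):
        statuses[row["status"]] += 1
        if row["status"] == "FAILED":
            failed.append(row["name"])
    return statuses, failed
```

Open the trace file in your pipeline_info/ directory with open(path, newline="") and pass the handle to summarize_trace.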
