nf-core/fetchngs

Pipeline to fetch metadata and raw FastQ files from public databases

ddbj download ena fastq geo sra synapse

Launch version 1.12.0 https://github.com/nf-core/fetchngs

Version history

Download .zip Download .tar.gz View on GitHub

[1.12.0] - 2024-02-29

:warning: Major enhancements

The Aspera CLI was recently added to Bioconda, and we have added it as another way of downloading FastQ files alongside the existing FTP and sra-tools support. In our limited benchmarks on all public clouds, we found a ~50% speed-up in download times compared to FTP! FTP remains the default download method (i.e. --download_method ftp), but you can choose sra-tools or Aspera with --download_method sratools or --download_method aspera, respectively. We would love to have your feedback!
The --force_sratools_download parameter has been deprecated in favour of using --download_method <method> to explicitly specify the download method; available options are ftp, sratools or aspera.
Support for Synapse ids has been dropped in this release. We haven’t had any feedback from users on whether it is being used. Users can run earlier versions of the pipeline if required.
We have significantly refactored and standardised the way we are using nf-test within this pipeline. This pipeline is now the current, best-practice implementation for nf-test usage on nf-core. We required a number of features to be added to nf-test and a huge shoutout to Lukas Forer for entertaining our requests and implementing them within upstream :heart:!
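
The relationship between the deprecated flag and the new parameter can be illustrated with a small Python sketch. The function `resolve_download_method` and its arguments are hypothetical and not part of the pipeline; only the three method names and the `ftp` default come from the notes above. Treating the old flag as equivalent to `sratools` is an assumption about how one might migrate existing commands.

```python
VALID_METHODS = ("ftp", "sratools", "aspera")


def resolve_download_method(download_method=None, force_sratools_download=False):
    """Return the effective download method.

    Hypothetical helper: `ftp` is the default, and the deprecated
    --force_sratools_download flag is mapped to `sratools` (an assumption).
    An explicitly given method always takes precedence over the old flag.
    """
    if download_method is None:
        return "sratools" if force_sratools_download else "ftp"
    if download_method not in VALID_METHODS:
        raise ValueError(
            f"Unknown download method {download_method!r}; "
            f"expected one of: {', '.join(VALID_METHODS)}"
        )
    return download_method
```

In this sketch an unrecognised value raises an error rather than silently falling back to FTP, so a mistyped method name is caught early.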

Credits

Special thanks to the following for their contributions to the release:

Thank you to everyone else that has contributed by reporting bugs, enhancements or in any other way, shape or form.

Enhancements & fixes

PR #238 - Resolved bug when prefetching large studies (#236)
PR #241 - Use wget instead of curl to download files from FTP (#169, #194)
PR #242 - Template update for nf-core/tools v2.11
PR #243 - Fixes for PR #238
PR #245 - Refactor nf-test CI and test and other pre-release fixes (#233)
PR #246 - Handle dark/light mode for logo in GitHub README properly
PR #248 - Update pipeline level test data path to use mirror on s3
PR #249 - Update modules to include absolute paths for test data, making module-level tests compatible within the pipeline
PR #253 - Add implicit tags in nf-test files for simpler testing strategy
PR #257 - Template update for nf-core/tools v2.12
PR #258 - Fixes for PR #253
PR #259 - Add Aspera CLI download support to pipeline (#68)
PR #261 - Revert sratools fasterqdump version (#221)
PR #262 - Use nf-test version v0.8.4 and remove implicit tags
PR #263 - Refine tags used for workflows
PR #264 - Remove synapse workflow from pipeline
PR #265 - Use "+" syntax for profiles to accumulate profiles in nf-test
PR #266 - Make .gitignore match template
PR #268 - Add mermaid diagram
PR #273 - Update utility subworkflows
PR #283 - Template update for nf-core/tools v2.13
PR #288 - Update GitHub Action to run full-sized test for all 3 download methods
PR #290 - Remove mentions of deprecated Synapse functionality in pipeline
PR #294 - Replace mermaid diagram with subway map
PR #295 - Be less stringent with test expectations for CI
PR #296 - Remove params.outdir from tests where required and update snapshots
PR #298 - export CONDA_PREFIX into container when using Singularity and Apptainer

Software dependencies

Dependency	Old version	New version
`wget`		1.20.1

NB: Dependency has been updated if both old and new version information is present.

NB: Dependency has been added if just the new version information is present.

NB: Dependency has been removed if new version information isn’t present.

Parameters

Old parameter	New parameter
	`--download_method`
`--input_type`
`--force_sratools_download`
`--synapse_config`

NB: Parameter has been updated if both old and new parameter information is present.

NB: Parameter has been added if just the new parameter information is present.

NB: Parameter has been removed if new parameter information isn’t present.

[1.11.0]

What’s Changed

remove public_aws_ecr by @maxulysse in https://github.com/nf-core/fetchngs/pull/185
Fix tests by @maxulysse in https://github.com/nf-core/fetchngs/pull/187
Adds emit statement for FASTQs and metadata to SRA workflow by @adamrtalbot in https://github.com/nf-core/fetchngs/pull/184
split up config files to be more modular by @maxulysse in https://github.com/nf-core/fetchngs/pull/186
Move out multiQC and versions by @maxulysse in https://github.com/nf-core/fetchngs/pull/189
tiny refactor by @maxulysse in https://github.com/nf-core/fetchngs/pull/190
Update SRA workflow tests by @maxulysse in https://github.com/nf-core/fetchngs/pull/191
update tests by @maxulysse in https://github.com/nf-core/fetchngs/pull/192
FEAT: add changes by @maxulysse in https://github.com/nf-core/fetchngs/pull/193
Recursively inherit configs by @adamrtalbot in https://github.com/nf-core/fetchngs/pull/195
remove all the nf-test logic from the refactor branch by @maxulysse in https://github.com/nf-core/fetchngs/pull/198
restore nf-test tests by @maxulysse in https://github.com/nf-core/fetchngs/pull/200
forgot this file by @maxulysse in https://github.com/nf-core/fetchngs/pull/202
fix path to file to include and update snapshots by @maxulysse in https://github.com/nf-core/fetchngs/pull/203
Trying out initialise by @maxulysse in https://github.com/nf-core/fetchngs/pull/204
Bump pipeline version to 1.11.0dev by @drpatelh in https://github.com/nf-core/fetchngs/pull/211
nf-test POC by @maxulysse in https://github.com/nf-core/fetchngs/pull/201
Per module/subworkflow tags.yml file by @adamrtalbot in https://github.com/nf-core/fetchngs/pull/212
Remove lib directory and replace with atomic subworkflows by @drpatelh in https://github.com/nf-core/fetchngs/pull/213
update modules and tests + fix linting by @maxulysse in https://github.com/nf-core/fetchngs/pull/214
FIX: custom/dumpsoftwareversions by @maxulysse in https://github.com/nf-core/fetchngs/pull/215
add pipeline level tests by @maxulysse in https://github.com/nf-core/fetchngs/pull/216
Tag and path updates for nf-test files by @drpatelh in https://github.com/nf-core/fetchngs/pull/217
Update workflows tests by @maxulysse in https://github.com/nf-core/fetchngs/pull/218
Update modules + tests by @maxulysse in https://github.com/nf-core/fetchngs/pull/219
Use nf-core nfvalidation subworkflow by @adamrtalbot in https://github.com/nf-core/fetchngs/pull/222
Update nextflowpipelineutils by @adamrtalbot in https://github.com/nf-core/fetchngs/pull/224
Use nf-core subworkflow: NFCORE_PIPELINE_UTILS by @adamrtalbot in https://github.com/nf-core/fetchngs/pull/223
Fix all by @maxulysse in https://github.com/nf-core/fetchngs/pull/225
fix sratools by @maxulysse in https://github.com/nf-core/fetchngs/pull/227
Replace CUSTOM_DUMPSOFTWAREVERSIONS with collectFile operator by @adamrtalbot in https://github.com/nf-core/fetchngs/pull/226
fix sratools and fewer ids by @maxulysse in https://github.com/nf-core/fetchngs/pull/228
Prepare 1.11.0 RC by @maxulysse in https://github.com/nf-core/fetchngs/pull/230
Refactor POC by @maxulysse in https://github.com/nf-core/fetchngs/pull/188
Release candidate 1.11.0 by @maxulysse in https://github.com/nf-core/fetchngs/pull/231

Full Changelog: https://github.com/nf-core/fetchngs/compare/1.10.1...1.11.0

[1.10.1] - 2023-10-08

Credits

Special thanks to the following for their contributions to the release:

Thank you to everyone else that has contributed by reporting bugs, enhancements or in any other way, shape or form.

Enhancements & fixes

#173 - Add compatibility for sralite files
PR #205 - Rename all local modules, workflows and remove public_aws_ecr profile
PR #206 - CI improvements and code cleanup
PR #208 - Template update with nf-core/tools 2.10

Software dependencies

Dependency	Old version	New version
`sra-tools`	2.11.0	3.0.8

NB: Dependency has been updated if both old and new version information is present.

NB: Dependency has been added if just the new version information is present.

NB: Dependency has been removed if new version information isn’t present.

[1.10.0] - 2023-05-16

Credits

Special thanks to the following for their contributions to the release:

Thank you to everyone else that has contributed by reporting bugs, enhancements or in any other way, shape or form.

Enhancements & fixes

#85 - Not able to fetch metadata for ERR ids associated with ArrayExpress
#104 - Add support back in for GEO IDs (removed in v1.7)
#129 - Pipeline is working with SRA run ids but failing with corresponding Biosample ids
#138 - Add support for downloading protected dbGAP data using a JWT file
#144 - Add support to download 10X Genomics data
PR #140 - Bumped modules version to allow for sratools download of sralite format files
PR #147 - Updated pipeline template to nf-core/tools 2.8
PR #148 - Fix default metadata fields for ENA API v2.0
PR #150 - Add infrastructure and CI for multi-cloud full-sized tests run via Nextflow Tower
PR #157 - Add public_aws_ecr.config to source mulled containers when using public.ecr.aws Docker Biocontainer registry

Software dependencies

Dependency	Old version	New version
`synapseclient`	2.6.0	2.7.1

NB: Dependency has been updated if both old and new version information is present.

NB: Dependency has been added if just the new version information is present.

NB: Dependency has been removed if new version information isn’t present.

[1.9] - 2022-12-21

Enhancements & fixes

Bumped minimum Nextflow version from 21.10.3 -> 22.10.1
Updated pipeline template to nf-core/tools 2.7.2
Added support for generating nf-core/atacseq compatible samplesheets
Added --nf_core_rnaseq_strandedness parameter to specify value for strandedness entry added to samplesheet created when using --nf_core_pipeline rnaseq. The default is auto which can be used with nf-core/rnaseq v3.10 onwards to auto-detect strandedness during the pipeline execution.

[1.8] - 2022-11-08

Enhancements & fixes

#111 - Change input mimetype to csv
#114 - Final samplesheet is not created when --skip_fastq_download is provided
#118 - Allow input pattern validation for csv/tsv/txt
#119 - --force_sratools_download results in different fastq names compared to FTP download
#121 - Add tower.yml to render samplesheet as Report in Tower
Fetch SRR and DRR metadata from ENA API instead of NCBI API to bypass frequent breaking changes
Updated pipeline template to nf-core/tools 2.6

[1.7] - 2022-07-01

:warning: Major enhancements

Support for GEO ids has been dropped in this release due to breaking changes introduced in the NCBI API. For more detailed information please see this PR.

As a workaround, if you have a GEO accession you can directly download a text file containing the appropriate SRA ids to pass to the pipeline:

Search for your GEO accession on GEO
Click SRA Run Selector at the bottom of the GEO accession page
Select the desired samples in the SRA Run Selector and then download the Accession List

This downloads a text file called SRR_Acc_List.txt that can be directly provided to the pipeline e.g. --input SRR_Acc_List.txt.
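
Before passing the accession list to the pipeline, it can be useful to sanity-check its contents. The following is a minimal Python sketch, not something the pipeline provides: the function name `read_accession_list` is hypothetical, and the pattern assumes run accessions consist of `SRR`, `ERR` or `DRR` followed by digits.

```python
import re

# Assumption: run accessions look like SRR/ERR/DRR followed by digits.
RUN_ID = re.compile(r"^[SED]RR\d+$")


def read_accession_list(text):
    """Return the unique run ids found in `text` (one id per line), in order."""
    ids = []
    for line in text.splitlines():
        accession = line.strip()
        if not accession:
            continue  # skip blank lines
        if not RUN_ID.match(accession):
            raise ValueError(f"Unexpected accession: {accession!r}")
        if accession not in ids:
            ids.append(accession)  # drop duplicates, keep first occurrence
    return ids
```

For example, `read_accession_list(open("SRR_Acc_List.txt").read())` would return the run ids in the downloaded file, or raise an error on the first line that does not look like a run accession.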

Enhancements & fixes

#97 - Add support for generating nf-core/taxprofiler compatible samplesheets.
#99 - SRA_IDS_TO_RUNINFO fails due to bad request
Add enum field for --nf_core_pipeline to parameter schema so that only supported pipelines are accepted

[1.6] - 2022-05-17

#57 - fetchngs fails if FTP is blocked
#89 - Improve detection and usage of the NCBI user settings by using the standardized sra-tools modules from nf-core.
#93 - Adjust modules configuration to respect the publish_dir_mode parameter.
[nf-core/rnaseq#764] - Test fails when using GCP due to missing tools in the basic biocontainer
Updated pipeline template to nf-core/tools 2.4.1

Software dependencies

Dependency	Old version	New version
`synapseclient`	2.4.0	2.6.0

[1.5] - 2021-12-01

Finish porting the pipeline to the updated Nextflow DSL2 syntax adopted on nf-core/modules
- Bump minimum Nextflow version from 21.04.0 -> 21.10.3
- Removed --publish_dir_mode as it is no longer required for the new syntax

[1.4] - 2021-11-09

Enhancements & fixes

Convert pipeline to updated Nextflow DSL2 syntax for future adoption across nf-core
Added a workflow to download FastQ files and to create samplesheets for ids from the Synapse platform hosted by Sage Bionetworks.
SRA identifiers not available for direct download via the ENA FTP will now be downloaded via sra-tools.
Added --force_sratools_download parameter to preferentially download all FastQ files via sra-tools instead of ENA FTP.
Correctly handle errors from SRA identifiers that do not return metadata, for example, due to being private.
Retry an error in prefetch via bash script in order to allow it to resume interrupted downloads.
Name output FastQ files by {EXP_ACC}_{RUN_ACC}*fastq.gz instead of {EXP_ACC}_{T*}*fastq.gz for run id provenance
[#46] - Bug in sra_ids_to_runinfo.py
Added support for DDBJ ids. See examples below:

`DDBJ`
PRJDB4176
SAMD00114846
DRA008156
DRP004793
DRR171822
DRS090921
DRX162434

[1.3] - 2021-09-15

Enhancements & fixes

Replaced Python requests with urllib to fetch ENA metadata
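
As an illustration of fetching ENA metadata with only the standard library, here is a minimal sketch using `urllib`. It is not the pipeline's own script: the function names are hypothetical, and the endpoint, `result` value and field names are assumptions based on the public ENA Portal API file report service.

```python
import urllib.parse
import urllib.request

# Assumption: the public ENA Portal API file report endpoint.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def build_ena_url(accession, fields=("run_accession", "fastq_ftp", "fastq_md5")):
    """Build a file report URL for one accession (the field names are assumptions)."""
    query = urllib.parse.urlencode(
        {
            "accession": accession,
            "result": "read_run",
            "fields": ",".join(fields),
            "format": "tsv",
        }
    )
    return f"{ENA_FILEREPORT}?{query}"


def fetch_ena_metadata(accession, timeout=30):
    """Download the tab-separated report as text; requires network access."""
    with urllib.request.urlopen(build_ena_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Keeping URL construction separate from the network call makes the query easy to inspect or log without contacting the server.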

Software dependencies

Dependency	Old version	New version
`python`	3.8.3	3.9.5

[1.2] - 2021-07-28

Enhancements & fixes

Updated pipeline template to nf-core/tools 2.1
[#26] - Update broken EBI API URL

[1.1] - 2021-06-22

Enhancements & fixes

[#12] - Error when using singularity - /etc/resolv.conf doesn’t exist in container
Added --sample_mapping_fields parameter to create a separate id_mappings.csv and multiqc_config.yml with selected fields that can be used to rename samples in general and in MultiQC

[1.0] - 2021-06-08

Initial release of nf-core/fetchngs, created with the nf-core template.

Pipeline summary

Given a single file of ids, provided one per line, the pipeline performs the following steps:

Resolve database ids back to the appropriate experiment-level ids so that they are compatible with the ENA API
Fetch extensive id metadata including direct download links to FastQ files via ENA API
Download FastQ files in parallel via curl and perform md5sum check
Collate id metadata and paths to FastQ files in a single samplesheet
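
The md5sum check in the download step can be sketched in Python as follows. This is an illustrative stand-in for the check described above, not the pipeline's implementation; the function name `md5_matches` and the chunk size are assumptions.

```python
import hashlib


def md5_matches(path, expected_md5, chunk_size=1 << 20):
    """Return True if the file at `path` has the expected MD5 hex digest.

    The file is read in chunks so that large FastQ files are never loaded
    into memory in one go.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.strip().lower()
```

A download would typically be retried or reported as failed when this returns `False`.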

Supported database ids

Currently, the following types of example identifiers are supported:

`SRA`	`ENA`	`GEO`
SRR11605097	ERR4007730	GSM4432381
SRX8171613	ERX4009132	GSE147507
SRS6531847	ERS4399630
SAMN14689442	SAMEA6638373
SRP256957	ERP120836
SRA1068758	ERA2420837
PRJNA625551	PRJEB37513
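
The table above suggests that the source database can be guessed from an identifier's prefix. The sketch below shows one way to do that in Python. The function name `classify_id` is hypothetical, and the patterns are inferred only from the example identifiers in the table, so they may not cover every valid accession format.

```python
import re

# Prefixes taken from the example identifiers in the table above.
PATTERNS = {
    "SRA": re.compile(r"^(SRR|SRX|SRS|SAMN|SRP|SRA|PRJNA)\d+$"),
    "ENA": re.compile(r"^(ERR|ERX|ERS|SAMEA|ERP|ERA|PRJEB)\d+$"),
    "GEO": re.compile(r"^(GSM|GSE)\d+$"),
}


def classify_id(accession):
    """Return "SRA", "ENA" or "GEO" for a recognised id, otherwise None."""
    candidate = accession.strip()
    for database, pattern in PATTERNS.items():
        if pattern.match(candidate):
            return database
    return None
```

For instance, `classify_id("SRR11605097")` returns `"SRA"` and `classify_id("GSE147507")` returns `"GEO"`, while an unrecognised string yields `None`.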