nf_core.pipelines.download

Downloads an nf-core pipeline to the local file system.

exception nf_core.pipelines.download.ContainerError(container, registry, address, absolute_URI, out_path, singularity_command, error_msg)

Bases: Exception

A class of errors related to pulling containers with Singularity/Apptainer.

exception ImageExistsError(error_log)

Bases: FileExistsError

Image already exists in cache/output directory.

exception ImageNotFoundError(error_log)

Bases: FileNotFoundError

The image cannot be found in the registry.

exception InvalidTagError(error_log)

Bases: AttributeError

Image and registry are valid, but the (version) tag is not.

exception NoSingularityContainerError(error_log)

Bases: RuntimeError

The container image is not in the native Singularity Image Format.

exception OtherError(error_log)

Bases: RuntimeError

Undefined error with the container.

exception RegistryNotFoundError(error_log)

Bases: ConnectionRefusedError

The specified registry does not resolve to a valid IP address.
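Because each of these exceptions derives from a built-in exception, a caller can handle it either by its specific name or through the built-in base. The sketch below uses stand-in classes that mirror only the documented names and bases. The constructor bodies and the `describe` helper are assumptions for illustration, not the nf-core implementation.

```python
class ContainerError(Exception):
    """Stand-in that mirrors only the documented nesting and base classes."""

    class ImageExistsError(FileExistsError):
        def __init__(self, error_log):
            self.error_log = error_log
            super().__init__("Image already exists in cache/output directory.")

    class RegistryNotFoundError(ConnectionRefusedError):
        def __init__(self, error_log):
            self.error_log = error_log
            super().__init__("The specified registry does not resolve to a valid IP address.")


def describe(exc_factory):
    """Hypothetical helper: raise the error and report which handler caught it."""
    try:
        raise exc_factory()
    except FileExistsError as err:
        # Catches ImageExistsError through its built-in base class.
        return f"skip: {err}"
    except ConnectionRefusedError as err:
        # Catches RegistryNotFoundError through its built-in base class.
        return f"retry with another registry: {err}"
```

Catching the built-in base keeps calling code independent of the downloader module, at the cost of also catching unrelated errors of the same built-in type.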

exception nf_core.pipelines.download.DownloadError

Bases: RuntimeError

A custom exception that is raised when nf-core pipelines download encounters a problem that we already took into consideration. In this case, we do not want to print the traceback, but give the user some concise, helpful feedback instead.
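The pattern described here, turning an anticipated failure into a short message instead of a traceback, can be sketched as follows. The `DownloadError` class below is a stand-in with the documented base class. The `run_download` wrapper, the `failing_step` function and the message text are hypothetical and only illustrate the idea.

```python
class DownloadError(RuntimeError):
    """Stand-in for the documented exception (same base class)."""


def run_download(step):
    """Hypothetical wrapper: anticipated errors become a one-line message."""
    try:
        step()
    except DownloadError as err:
        # Anticipated problem: give concise feedback, no traceback.
        return f"Download failed: {err}"
    return "Download finished."


def failing_step():
    # Made-up failure used only to exercise the wrapper.
    raise DownloadError("Revision '9.9.9' not found.")
```

Any other exception type still propagates with its full traceback, which is the desired behaviour for problems that were not anticipated.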

class nf_core.pipelines.download.DownloadProgress(*columns: str | ProgressColumn, console: Console | None = None, auto_refresh: bool = True, refresh_per_second: float = 10, speed_estimate_period: float = 30.0, transient: bool = False, redirect_stdout: bool = True, redirect_stderr: bool = True, get_time: Callable[[], float] | None = None, disable: bool = False, expand: bool = False)

Bases: Progress

Custom Progress bar class, allowing us to have two progress bars with different columns / layouts.

get_renderables()

Get a number of renderables for the progress display.

class nf_core.pipelines.download.DownloadWorkflow(pipeline=None, revision=None, outdir=None, compress_type=None, force=False, platform=False, download_configuration=None, additional_tags=None, container_system=None, container_library=None, container_cache_utilisation=None, container_cache_index=None, parallel_downloads=4)

Bases: object

Downloads an nf-core workflow from GitHub to the local file system.

Can also download its Singularity container image if required.

  • Parameters:
    • pipeline (str) – An nf-core pipeline name.
    • revision (List[str]) – The workflow revision(s) to download, like 1.0 or dev. Defaults to None.
    • outdir (str) – Path to the local download directory. Defaults to None.
    • compress_type (str) – Type of compression for the downloaded files. Defaults to None.
    • force (bool) – Flag to force download even if files already exist (overwrite existing files). Defaults to False.
    • platform (bool) – Flag to customize the download for Seqera Platform (convert to git bare repo). Defaults to False.
    • download_configuration (str) – Download the configuration files from nf-core/configs. Defaults to None.
    • additional_tags (List[str]) – Specify additional tags to add to the downloaded pipeline. Defaults to None.
    • container_system (str) – The container system to use (e.g., “singularity”). Defaults to None.
    • container_library (List[str]) – The container libraries (registries) to use. Defaults to None.
    • container_cache_utilisation (str) – If a local or remote cache of already existing container images should be considered. Defaults to None.
    • container_cache_index (str) – An index for the remote container cache. Defaults to None.
    • parallel_downloads (int) – The number of parallel downloads to use. Defaults to 4.

compress_download() → None

Take the downloaded files and make a compressed .tar.gz archive.

download_configs()

Downloads the centralised config profiles from nf-core/configs to self.outdir.

download_wf_files(revision, wf_sha, download_url)

Downloads workflow files from GitHub to the self.outdir.

download_workflow()

Starts an nf-core workflow download.

download_workflow_platform(location=None)

Create a bare-cloned git repository of the workflow, so it can be launched with tw launch as a file:/ pipeline.

download_workflow_static()

Downloads an nf-core workflow from GitHub to the local file system in a self-contained manner.

find_container_images(workflow_directory: str) → None

Find container image names for workflow.

Starts by using nextflow config to pull out any process.container declarations. This works for DSL1. It should return a simple string with the logic resolved, but this is not always the case (for example, not for differentialabundance 1.2.0).

Second, we look for DSL2 containers. These can’t be found with nextflow config at the time of writing, so we scrape the pipeline files. This returns raw matches that will likely need to be cleaned.

gather_registries(workflow_directory: str) → None

Fetch the registries from the pipeline config and CLI arguments and store them in a set. This is needed to symlink downloaded container images so Nextflow will find them.

get_revision_hash()

Find specified revision / branch hash

get_singularity_images(current_revision: str = '') → None

Loop through container names and download Singularity images

prioritize_direct_download(container_list: List[str]) → List[str]

Helper function that takes a list of container images (URLs and Docker URIs), removes every Docker URI for which a direct-download URL is also present, and returns the cleaned, deduplicated list.

Conceptually, this works like so:

Everything after the last slash should be identical, e.g. "scanpy:1.7.2--pyhdfd78af_0" in ['https://depot.galaxyproject.org/singularity/scanpy:1.7.2--pyhdfd78af_0', 'biocontainers/scanpy:1.7.2--pyhdfd78af_0'].

re.sub('.*/(.*)', '\\1', c) drops everything up to the last slash from c (container_id).

d.get(k := re.sub('.*/(.*)', '\\1', c), '') assigns the truncated string to k (key) and gets the corresponding value from the dict if present, or else defaults to ''.

If the regex pattern matches, the original container_id is assigned to the dict under the key k. r"^$|(?!^http)" matches an empty string (the key is not in the dict yet, so the entry is kept in either case) or any string that does not start with http. If the current dict value already starts with http, it is kept and not replaced with the current candidate (which might be the Docker URI).

Conversely, a regex that matches http, r"^$|^http", could be used to prioritize the Docker URIs over http downloads.

We also need to handle a special case: The https:// Singularity downloads from Seqera Containers all end in ‘data’, although they are not equivalent, e.g.:

https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/63/6397750e9730a3fbcc5b4c43f14bd141c64c723fd7dad80e47921a68a7c3cd21/data
https://community-cr-prod.seqera.io/docker/registry/v2/blobs/sha256/c2/c262fc09eca59edb5a724080eeceb00fb06396f510aefb229c2d2c6897e63975/data

Lastly, we want to remove at least a few Docker URIs for those modules that have an oras:// download link.
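The dictionary-and-regex idea described above can be sketched as a standalone function. This is a simplified illustration, not the method itself: the function name is hypothetical, the walrus expression is unrolled for readability, and the oras:// reconciliation step is left out.

```python
import re
from typing import Dict, List


def prioritize_direct_download_sketch(container_list: List[str]) -> List[str]:
    by_name: Dict[str, str] = {}
    seqera_http: List[str] = []

    for c in container_list:
        if c.endswith("/data"):
            # Seqera Containers blob URLs all end in "data" but are not equivalent:
            # keep each of them instead of collapsing them onto one key.
            seqera_http.append(c)
            continue
        # Key: everything after the last slash.
        k = re.sub(r".*/(.*)", r"\1", c)
        # Store c if the key is new ("^$") or the stored value is not an http URL.
        if re.match(r"^$|(?!^http)", by_name.get(k, "")):
            by_name[k] = c

    # Deduplicate and return in a stable order.
    return sorted(set(by_name.values()) | set(seqera_http))
```

Because a stored http URL never matches the pattern, the result does not depend on whether the URL or the Docker URI appears first in the input list.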

prompt_compression_type()

Ask user if we should compress the downloaded files

prompt_config_inclusion()

Prompt for inclusion of institutional configurations

prompt_container_download()

Prompt whether to download container images or not

prompt_pipeline_name()

Prompt for the pipeline name if not set with a flag

prompt_revision() → None

Prompt for the pipeline revision / branch. The user is prompted for a revision tag if --revision was not set. If --platform is specified, multiple revisions can be selected. The static download also allows multiple revisions, but this option is not offered interactively.

prompt_singularity_cachedir_creation()

Prompt about using $NXF_SINGULARITY_CACHEDIR if not already set

prompt_singularity_cachedir_remote()

Prompt about the index of a remote $NXF_SINGULARITY_CACHEDIR

prompt_singularity_cachedir_utilization()

Ask if we should only use $NXF_SINGULARITY_CACHEDIR without copying into target

read_remote_containers()

Reads the file specified as index for the remote Singularity cache dir

static reconcile_seqera_container_uris(prioritized_container_list: List[str], other_list: List[str]) → List[str]

Helper function that takes a list of Seqera container URIs, extracts the software string and builds a regex from them to filter out similar containers from the second container list.

prioritized_container_list = [
    "oras://community.wave.seqera.io/library/multiqc:1.25.1--f0e743d16869c0bf",
    "oras://community.wave.seqera.io/library/multiqc_pip_multiqc-plugins"
]

will be cleaned to

['library/multiqc:1.25.1', 'library/multiqc_pip_multiqc-plugins']

Subsequently, a regex is built from those strings and used to filter out matching duplicates in other_list.
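A minimal sketch of this trimming and filtering step is given below. The function name is hypothetical, and the two search patterns are assumptions chosen to reproduce the cleaned strings shown above. The entry with the all-zero build hash in the usage example is made up.

```python
import re
from typing import List


def reconcile_sketch(prioritized_container_list: List[str], other_list: List[str]) -> List[str]:
    if not prioritized_container_list:
        return other_list

    stems = []
    for uri in set(prioritized_container_list):
        # With a "--<build>" suffix keep "library/<tool>:<version>", else "library/<tool>".
        pattern = r"library/.*?:[\d.]+" if "--" in uri else r"library/[^\s:]+"
        match = re.search(pattern, uri)
        if match:
            stems.append(match.group())

    if not stems:
        # Nothing to filter on: just merge and deduplicate.
        return sorted(set(prioritized_container_list + other_list))

    # One alternation over all escaped stems.
    duplicate = re.compile("|".join(map(re.escape, stems)))
    kept = [c for c in other_list if not duplicate.search(c)]
    return sorted(set(prioritized_container_list + kept))


prioritized = ["oras://community.wave.seqera.io/library/multiqc:1.25.1--f0e743d16869c0bf"]
others = [
    "community.wave.seqera.io/library/multiqc:1.25.1--0000000000000000",  # made-up hash
    "biocontainers/fastqc:0.11.9--0",
]
result = reconcile_sketch(prioritized, others)
```

Here the made-up multiqc entry in the second list is dropped because it shares the stem library/multiqc:1.25.1 with a prioritized URI, while the unrelated fastqc entry is kept.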

rectify_raw_container_matches(raw_findings)

Helper function to rectify the raw extracted container matches into fully qualified container names. If multiple containers are found, any that is prefixed with http (direct download) is prioritized.

Example syntax:

Early DSL2:

if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
    container "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0"
} else {
    container "quay.io/biocontainers/fastqc:0.11.9--0"
}

Later DSL2:

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
    'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' :
    'biocontainers/fastqc:0.11.9--0' }"

Later DSL2, variable is being used:

container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
    "https://depot.galaxyproject.org/singularity/${container_id}" :
    "quay.io/biocontainers/${container_id}" }"
 
container_id = 'mulled-v2-1fa26d1ce03c295fe2fdcf85831a92fbcbd7e8c2:afaaa4c6f5b308b4b6aa2dd8e99e1466b2a6b0cd-0'

DSL1 / Special case DSL2:

container "nfcore/cellranger:6.0.2"
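The prioritization rule can be illustrated with a much simpler sketch than the real method. The names `QUOTED` and `pick_container` are hypothetical, and the approach (collect quoted tokens, prefer those starting with http) is an assumption made for illustration. It does not resolve variables such as ${container_id}.

```python
import re
from typing import Optional

# Quoted tokens without whitespace, e.g. 'biocontainers/fastqc:0.11.9--0'.
QUOTED = re.compile(r"""["']([^"'\s]+)["']""")


def pick_container(declaration: str) -> Optional[str]:
    # Keep only tokens that look like name:tag or a URL (both contain a colon).
    candidates = [tok for tok in QUOTED.findall(declaration) if ":" in tok]
    # Direct-download URLs take priority over Docker URIs.
    direct = [tok for tok in candidates if tok.startswith("http")]
    chosen = direct or candidates
    return chosen[0] if chosen else None
```

Applied to the "Later DSL2" example, this returns the https://depot.galaxyproject.org URL. For the DSL1 example it returns nfcore/cellranger:6.0.2.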

singularity_copy_cache_image(container: str, out_path: str, cache_path: str | None) → None

Copy Singularity image from NXF_SINGULARITY_CACHEDIR to target folder.

singularity_download_image(container: str, out_path: str, cache_path: str | None, progress: DownloadProgress) → None

Download a singularity image from the web.

Use native Python to download the file.

  • Parameters:
    • container (str) – A pipeline’s container name. Usually it is of similar format to https://depot.galaxyproject.org/singularity/name:version
    • out_path (str) – The final target output path
    • cache_path (str , None) – The NXF_SINGULARITY_CACHEDIR path if set, None if not
    • progress (Progress) – Rich progress bar instance to add tasks to.

singularity_image_filenames(container: str) → Tuple[str, str | None]

Check Singularity cache for image, copy to destination folder if found.

  • Parameters: container (str) – A pipeline’s container name. Can be direct download URL or a Docker Hub repository ID.
  • Returns: A tuple of (out_path, cache_path). out_path is the final target output path. It may point to the NXF_SINGULARITY_CACHEDIR if cache utilisation was set to ‘amend’. If cache utilisation was set to ‘copy’, it points to the target folder, a subdirectory of the output directory. In the latter case, cache_path is either None (the image is not yet cached locally) or points to the image in the NXF_SINGULARITY_CACHEDIR, so that the image is copied from there instead of being downloaded from the web again. See get_singularity_images() for the implementation.
  • Return type: tuple (str, str | None)
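The relationship between the two returned paths can be sketched as below. The function name, its parameters and the singularity-images subdirectory name are assumptions for illustration. The sketch returns None for cache_path only when no cache directory is configured and does not check whether the image already exists there.

```python
import os
from typing import Optional, Tuple


def resolve_image_paths(
    filename: str, outdir: str, cachedir: Optional[str], cache_utilisation: Optional[str]
) -> Tuple[str, Optional[str]]:
    # Default: the image goes into a subdirectory of the output directory.
    out_path = os.path.join(outdir, "singularity-images", filename)
    cache_path = None
    if cachedir:
        cache_path = os.path.join(cachedir, filename)
        if cache_utilisation == "amend":
            # Only the cache is written to: it becomes the target itself.
            out_path, cache_path = cache_path, None
    return out_path, cache_path
```

With ‘amend’ the output directory receives no image at all, so the downloaded pipeline relies on the cache directory being available at run time.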

singularity_pull_image(container: str, out_path: str, cache_path: str | None, library: List[str], progress: DownloadProgress) → None

Pull a singularity image using singularity pull

Attempt to use a local installation of singularity to pull the image.

  • Parameters:
    • container (str) – A pipeline’s container name. Usually it is of similar format to nfcore/name:version.
    • library (list of str) – A list of libraries to try for pulling the image.
  • Raises: Various exceptions possible from subprocess execution of Singularity.

Create a symlink for each registry in the registry set that points to the image. We have dropped the explicit registries from the modules in favor of the configurable registries. Unfortunately, Nextflow still expects the registry to be part of the file name, so a symlink is needed.

The base image, e.g. ./nf-core-gatk-4.4.0.0.img will thus be symlinked as for example ./quay.io-nf-core-gatk-4.4.0.0.img by prepending all registries in self.registry_set to the image name.

Unfortunately, our output image name may already contain a registry definition (a Singularity image pulled from depot.galaxyproject.org, or an older pipeline version where the Docker registry was part of the image name in the modules). Hence, the registry must be stripped first to ensure that what remains is really the base name.
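The naming scheme for these symlinks can be sketched as a pure function that only computes the link names and does not touch the file system. The function name and signature are hypothetical. Replacing "/" with "-" in registry names and skipping a link that would carry the image's own name are assumptions of this sketch.

```python
import re
from typing import Iterable, List


def symlink_names(image_name: str, registries: Iterable[str]) -> List[str]:
    # Registries appear in file names with "/" replaced by "-" (assumption).
    prefixes = sorted({r.replace("/", "-") for r in registries})
    if not prefixes:
        return []
    # Strip a registry prefix that is already part of the file name.
    trim = re.compile("|".join(f"^{re.escape(p)}-?" for p in prefixes))
    base = trim.sub("", image_name)
    names = [f"{p}-{base}" for p in prefixes]
    # A link that would carry the image's own name is pointless; skip it.
    return [n for n in names if n != image_name]
```

For the base image nf-core-gatk-4.4.0.0.img and the registry quay.io, this yields quay.io-nf-core-gatk-4.4.0.0.img, matching the example above.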

wf_use_local_configs(revision_dirname)

Edit the downloaded nextflow.config file to use the local config files

class nf_core.pipelines.download.WorkflowRepo(remote_url, revision, commit, additional_tags, location=None, hide_progress=False, in_cache=True)

Bases: SyncedRepo

An object to store details about a locally cached workflow repository.

Important attributes:

  • fullname: The full name of the repository, nf-core/{self.pipelinename}.
  • local_repo_dir (str): The local directory where the workflow is cloned into. Defaults to $HOME/.cache/nf-core/nf-core/{self.pipeline}.

__add_additional_tags() → None

access()

bare_clone(destination)

checkout(commit)

Checks out the repository at the requested commit

  • Parameters: commit (str) – Git SHA of the commit

get_remote_branches(remote_url)

Get all branches from a remote repository

  • Parameters: remote_url (str) – The git url to the remote repository
  • Returns: All branches found in the remote
  • Return type: (set[str])

property heads

retry_setup_local_repo(skip_confirm=False)

setup_local_repo(remote, location=None, in_cache=True)

Sets up the local git repository. If the repository has been cloned previously, it returns a git.Repo object of that clone. Otherwise it tries to clone the repository from the provided remote URL and returns a git.Repo of the new clone.

  • Parameters:
    • remote (str) – git url of remote
    • location (Path) – location where the clone should be created/cached.
    • in_cache (bool, optional) – Whether to clone the repository from the cache. Defaults to True.

Sets self.repo

property tags

tidy_tags_and_branches()

Deletes all tags and branches that are not of interest to the downloader. This allows a clutter-free experience in Seqera Platform. The untagged commits remain available.

However, due to local caching, the downloader might also want access to revisions that had been deleted before. In that case, the tags are not re-added; the repository is downloaded anew from GitHub instead.