Manual debugging on clusters using schedulers
In some cases, when testing configuration files on clusters that use a scheduler, you may get failed jobs with uninformative errors and little indication of the actual cause.
In such cases, a good way of debugging a failed job is to change to the working directory of the failed process (which should be reported by Nextflow) and try to submit the job manually.
You can do this by submitting the `.command.run` file found in the working directory to your cluster, using the relevant submission command.
For example, let’s say you get an error like this on a SLURM cluster.
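The details vary between pipelines, but a grid submission failure is typically reported along these lines (the process name here is just a placeholder):

```
Error executing process > 'MY_PROCESS (sample_1)'

Caused by:
  Failed to submit process to grid scheduler for execution
```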
This does not tell you why the job failed to submit. The cause is often an ‘invalid’ resource request that the scheduler blocks, but unfortunately Nextflow does not pick up the message reported by the cluster.
Therefore, in this case I would switch to the working directory and submit the `.command.run` file using SLURM’s `sbatch` command (the command for submitting batch scripts).
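In a terminal, that might look something like the following; the work directory hash below is a made-up placeholder, so use whatever path Nextflow reported for the failed task:

```bash
# Change to the working directory Nextflow reported for the failed task
# (this hash is a hypothetical example)
cd work/4f/9a1b2c3d4e5f67890abcdef1234567

# Re-submit the job script that Nextflow generated for this task
sbatch .command.run
```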
In this case, SLURM prints to my console the reason why the job failed to be submitted.
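The exact wording depends on your cluster’s setup, but a blocked resource request often produces something like:

```
sbatch: error: Batch job submission failed: Requested node configuration is not available
```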
With this information, I can go back to my configuration file, tweak the settings accordingly, and run the pipeline again.
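As a sketch of what such a tweak could look like, assuming the problem was a memory request larger than any node can offer (the process name and values here are purely illustrative):

```groovy
// nextflow.config: hypothetical fix for an oversized memory request
process {
    withName: 'MY_PROCESS' {
        memory = '16 GB'  // reduced from a value no node could satisfy
        cpus   = 4
    }
}
```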