Changelog
Highlighted Features Added
* Toil server now exposes workflow tasks via WES (4046).
* Toil server now has a `--wes_dialect agc` option that will hide any tasks that don't have AWS Batch job IDs, and put the IDs in the task names for those that do (4047).
* Toil jobs now accept an `accelerators` requirement, like `accelerators=1` or `accelerators={'kind': 'gpu', 'brand': 'nvidia', 'count': 2}` (4163)
* Include total requested cores for each job type in `toil stats` (4173)
* Toil jobs now expose `job.accelerators` to workflows
* Add `prefix` and `suffix` parameters to `AbstractFileStore.getLocalTempFile` and `AbstractFileStore.getLocalTempFileName` (4273)
* CWL: `--no-compute-checksum`, `--strict-cpu-limit`, `--disable-validate`, and `--fast-parser` are now available
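
The two forms of the `accelerators` requirement listed above can be related with a small sketch. This is not Toil's own parser: `normalize_accelerator` is a hypothetical helper, and treating a bare integer as "that many GPUs of any brand" is an assumption made for illustration.

```python
from typing import Any, Dict, Union

# Either a bare count or a dict with keys such as 'kind', 'brand', 'count'.
AcceleratorSpec = Union[int, Dict[str, Any]]


def normalize_accelerator(spec: AcceleratorSpec) -> Dict[str, Any]:
    """Hypothetical helper: expand a shorthand request into dict form."""
    if isinstance(spec, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("accelerator count must be an int, not a bool")
    if isinstance(spec, int):
        if spec < 0:
            raise ValueError("accelerator count cannot be negative")
        # Assumption: a bare count means GPUs with no brand constraint.
        return {"kind": "gpu", "count": spec}
    if isinstance(spec, dict):
        result = dict(spec)  # copy so the caller's dict is not mutated
        result.setdefault("kind", "gpu")  # assumed default, for illustration
        result.setdefault("count", 1)     # assumed default, for illustration
        return result
    raise TypeError(f"unsupported accelerator spec: {spec!r}")
```

Under these assumptions, `normalize_accelerator(1)` and `normalize_accelerator({'kind': 'gpu', 'count': 1})` describe the same request, while the dict form leaves room for a `brand` constraint.
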
Breaking Changes
* Toil's built-in autoscaler now guesses that some memory and disk space on nodes will not actually be available for jobs; pass `--assumeZeroOverhead` to revert to the old behavior (2103)
CWL
* CWL job unit and display names have been changed to make more sense as task names, and management of them has been unified into a `CWLNamedJob`. (4046/4047)
* CWL `CUDARequirement` is parsed by `cwltool` and turned into a requirement for the minimum requested number of nvidia GPU accelerators (3982)
* Fix a false warning when `outputSource` contains only one None value (4300)
Kubernetes
* `KubernetesBatchSystem` can add `nvidia.com/gpu` and `amd.com/gpu` resource requests for jobs that request those accelerators (4163)
* `KubernetesBatchSystem` can request GPUs by `model` key, if nodes are labeled appropriately (4163)
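
As a rough illustration of the resource requests described above, the sketch below maps an accelerator requirement onto the `nvidia.com/gpu` or `amd.com/gpu` resource name. The function `gpu_resource_requests` and its lookup table are hypothetical and are not taken from `KubernetesBatchSystem`; selection by `model` key depends on node labels that are not specified here, so it is left out.

```python
from typing import Any, Dict

# Resource names as given in the changelog entry; the mapping itself is a sketch.
GPU_RESOURCE_BY_BRAND = {
    "nvidia": "nvidia.com/gpu",
    "amd": "amd.com/gpu",
}


def gpu_resource_requests(accelerator: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: build a Kubernetes-style resource dict for one
    accelerator requirement, or an empty dict if it cannot be mapped."""
    if accelerator.get("kind") != "gpu":
        return {}
    resource = GPU_RESOURCE_BY_BRAND.get(accelerator.get("brand"))
    if resource is None:
        # Unknown or missing brand: this sketch adds no request.
        return {}
    # Kubernetes resource quantities are conventionally written as strings.
    return {resource: str(accelerator.get("count", 1))}
```

A request for two NVIDIA GPUs would come out as `{"nvidia.com/gpu": "2"}`, which could then be merged into a container's resource limits.
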
Dependencies
Misc
* Toil WES server now accepts requests that leave out workflow_params. (4037)
* The `MessageBus` has been expanded to use `pypubsub`, and now has `MessageInbox` and `MessageOutbox` objects to represent connections to it. (4046/4047)
* `ToilMetrics` now rides on the `MessageBus` rails. (4046/4047)
* Toil workflows now have a `--writeMessages` option, which takes a file to which a line-oriented stream of `MessageBus` messages will be written. Reading this file will allow you to recover the current state of the workflow. (4046/4047)
* Add a warning check for use when launching a cluster on AWS. (3514)
* Use a CI prebake image for GitLab testing. (4185)
* Toil clusters now have `/var/tmp` as the default temporary directory, since they often make large temporary files (4148)
* Add basic testing for Slurm by running sample workflows on a Slurm Docker cluster. (3856)
* Add message bus documentation (4239)
* `SingleMachineBatchSystem` can schedule nvidia GPU accelerators, limiting the concurrent jobs to no more than there are accelerators to support, and setting `CUDA_VISIBLE_DEVICES` in the tasks' environments to tell them which nvidia GPU(s) to use. (4163)
* `AWSBatchBatchSystem` can use AWS Batch's GPU resource to provide nvidia GPU accelerators (4163)
* Toil jobs no longer need to re-run after their child/followOn/service jobs in order to delete themselves. (3188)
* Message bus is now thread safe (4276)
* Docker build has been updated with new Aventer Mesos deb URL (fixes 4290)
* `docker` binary in the container has been updated to that included in the Ubuntu repos (fixes 4282)
* Singularity in the appliance has been updated to 3.10, which satisfies the >=3.9 requirement for cgroups v2 support.
* Base Ubuntu container image for the appliance has been updated to 22.04, which has a new enough libc for Debian's Singularity 3.10 debs.
* Safer type usage checking for systems without boto3 installed
* Tests are now more runnable post-installation. Temporary paths are not selected based upon the location of the tests themselves. (4287)
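
The single-machine GPU scheduling mentioned in this list (no more concurrent jobs than there are accelerators, with `CUDA_VISIBLE_DEVICES` set per task) can be sketched as a small device pool. This is an illustrative sketch only, not `SingleMachineBatchSystem` code; `GpuPool` and `task_environment` are hypothetical names.

```python
from typing import Dict, List, Optional


class GpuPool:
    """Hypothetical pool that hands out GPU device indices to tasks."""

    def __init__(self, device_count: int) -> None:
        self._free: List[int] = list(range(device_count))

    def acquire(self, count: int) -> Optional[List[int]]:
        """Return `count` free device indices, or None if too few are free."""
        if count > len(self._free):
            return None  # the caller should wait until a task releases devices
        taken, self._free = self._free[:count], self._free[count:]
        return taken

    def release(self, devices: List[int]) -> None:
        """Give devices back to the pool once their task has finished."""
        self._free = sorted(self._free + devices)


def task_environment(devices: List[int]) -> Dict[str, str]:
    """Build the environment entry that restricts a task to its devices."""
    # CUDA_VISIBLE_DEVICES takes a comma-separated list of device indices.
    return {"CUDA_VISIBLE_DEVICES": ",".join(str(d) for d in devices)}
```

A scheduler built this way would start a task only when `acquire` returns a device list, merge `task_environment(devices)` into the task's environment, and call `release` when the task exits.
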
Bug Fixes
* Only use `/var/run/user` if XDG tells us we have it in our session. Otherwise we will try other places, including `/run/lock/toil`. (4170)
* `toil destroy-cluster`: terminate stopped instances when destroying the cluster (4271)
* fileJobStore: handle arbitrary `os.link` errors to work on some filesystems (2232)
Thank you to our contributors!