Most of the code has been rewritten for Caper 1.0.0.
Upgraded Cromwell from 47 to 51.
- Metadata DB generated with Caper<1.0 will not work with Caper>=1.0.
- See [this note](https://github.com/broadinstitute/cromwell/releases/tag/49) for DB migration instructions.
Changed hashing strategy for all local backends (`local`, `slurm`, `sge`, `pbs`).
- Default hashing strategy changed from `file` (based on md5sum of file contents, which is expensive) to `path+modtime`.
- Changing the hashing strategy while keeping the same metadata DB will result in cache misses. See the sketch below this list for an illustration of the two strategies.
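The snippet below is a simplified, illustrative sketch (not Cromwell's actual implementation) of what each strategy hashes: `file` reads the whole file to compute an md5 digest, while `path+modtime` only hashes the absolute path and the modification time.

```python
import hashlib
import os


def file_strategy_key(path):
    """'file' strategy: md5 of the entire file content (slow for large inputs)."""
    md5 = hashlib.md5()
    with open(path, 'rb') as fp:
        for chunk in iter(lambda: fp.read(1 << 20), b''):
            md5.update(chunk)
    return md5.hexdigest()


def path_modtime_strategy_key(path):
    """'path+modtime' strategy: hash only the absolute path and mtime (constant time)."""
    key = '{}:{}'.format(os.path.abspath(path), os.path.getmtime(path))
    return hashlib.md5(key.encode()).hexdigest()
```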
Changed duplication strategy for all local backends (`local`, `slurm`, `sge`, `pbs`).
- Default file duplication strategy changed from `hard-link` to `soft-link`.
- This is for filesystems (e.g. beeGFS) that do not allow hard-linking (see the sketch below this list).
- Caper<1.0 hard-linked input files even with `--soft-glob-output`.
- For Caper>=1.0, you still need to use `--soft-glob-output` for such filesystems.
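For illustration only (this is not Caper's code, and the paths are hypothetical): hard-linking and soft-linking roughly correspond to `os.link` and `os.symlink`, and on a filesystem that rejects hard links the former fails while the latter still works.

```python
import os

src = '/scratch/me/inputs/sample.fastq.gz'       # hypothetical input file
dst = '/scratch/me/tmp_loc_dir/sample.fastq.gz'  # hypothetical localized copy

try:
    # hard link: shares the same inode; some filesystems (e.g. beeGFS) reject this
    os.link(src, dst)
except OSError:
    # soft link: stores only the target path, which works on such filesystems
    os.symlink(src, dst)
```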
Google Cloud Platform backend (`gcp`):
- Can use a service account instead of application default credentials (end user's auth).
- Added `--gcp-service-account-key-json`.
- Make sure that the service account has enough permissions (roles) on the Google Cloud Platform project (`--gcp-prj`). See [details](docs/conf_gcp.md#how-to-run-caper-with-a-service-account).
- Can use Google Cloud Life Sciences API (v2beta) instead of the deprecated Google Cloud Genomics API (v2alpha1).
- Added `--use-google-cloud-life-sciences`.
- For `caper server/run`, you need to specify a region with `--gcp-region` to use the Life Sciences API. Check [supported regions](https://cloud.google.com/life-sciences/docs/concepts/locations). `--gcp-zones` will be ignored. See the example after this list.
- Make sure to enable `Google Cloud Life Sciences API` on Google Cloud Platform console (APIs & Services -> `+` button on top).
- Also, if you use a service account, add the `Life Sciences Admin` role to it.
- We will deprecate the old `Genomics API` support. `Life Sciences API` will become the new default within the next 2-3 releases.
- Added [`memory-retry`](https://cromwell.readthedocs.io/en/stable/backends/Google/) to Caper. This is for the `gcp` backend only.
- Retries (controlled by `--max-retries`) on an instance with increased memory if a workflow fails due to an OOM (out-of-memory) error.
- Comma-separated keys to catch OOM errors: `--gcp-memory-retry-error-keys`.
- Memory multiplier for each retry due to OOM: `--gcp-memory-retry-multiplier`.
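The following is a minimal sketch of putting the new GCP options together by shelling out to the CLI. The project ID, key path, and region are placeholders to adjust for your setup; `--backend` is Caper's existing backend-selection flag, and the other flags are the ones listed above.

```python
import subprocess

# Placeholder values: replace project, key path, and region with your own.
subprocess.run(
    [
        'caper', 'run', 'my.wdl',
        '--backend', 'gcp',
        '--gcp-prj', 'my-gcp-project',
        '--gcp-service-account-key-json', '/path/to/service-account-key.json',
        '--use-google-cloud-life-sciences',
        '--gcp-region', 'us-central1',  # must be a region supported by the Life Sciences API
    ],
    check=True,
)
```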
Improved Python interface.
- Caper<1.0 was originally designed around the CLI.
- Caper>=1.0 is designed around the Python interface first; the CLI is built on top of it.
- Can retrieve `metadata.json` embedded with subworkflows' metadata JSON.
Better logging and troubleshooting.
- Defaults to writing Cromwell's STDOUT/STDERR to `cromwell.out` (controlled by `--cromwell-stdout`).
Notes for devs
Server/run example:
```python
from caper.caper_runner import CaperRunner

c = CaperRunner(
    local_loc_dir='/scratch/me/tmp_loc_dir',
    local_out_dir='/scratch/me/out',
    default_backend='Local')

# get server thread
th = c.server(port=8000)

# do something while the server is running
while th.returncode is None:
    break

# stop the server
th.stop()
# wait for the server thread to terminate
th.join()

# run example
metadata_dict = c.run('my.wdl', inputs='my_input.json', ...)
```
Client example:
```python
from caper.caper_client import CaperClientSubmit

cs = CaperClientSubmit(hostname='localhost', port=8000)
r = cs.submit('my.wdl', inputs='my_inputs.json', imports='my_imports.zip', ...)
workflow_id = r['id']

for m in cs.metadata([workflow_id], embed_subworkflow=True):
    # m = metadata dict embedded with subworkflows' metadata JSON
    print(m['status'])
```
How to read from a conf file:
```python
from caper.caper_args import get_parser_and_defaults

# get both an argparse.ArgumentParser (defaults updated with the contents of conf_file)
# and a conf_dict with the key/value pairs defined in conf_file.
# each value is converted to the correct type (guessed from ArgumentParser's defaults).
parser, conf_dict = get_parser_and_defaults(conf_file='~/.caper/default.conf')
server_port_from_conf = conf_dict['port']
```