- Consolidated GKE TPU/GPU spec parsing with the cloud types.
- Modified all commands to accept paths to arbitrary shell scripts as the
module argument. Any argument of the format "trainer.train" will execute using
"python -m trainer.train", just as before. If you instead pass a Python script
as a file, like "trainer/train.py", caliban will execute this file inside the
container using "python trainer/train.py". Any other argument, if it exists in
the local directory, will be executed as a bash script.
This allows users to run commands like "caliban cloud my_script.sh" and have
it all work.
- "caliban run" now supports --experiment_config and --dry_run. These work just
like they do for "caliban cloud"; the experiment config will expand out and
execute N jobs on your local machine.
- Moved some methods from cluster/cluster.py to gke/utils.py.
- Added unit tests for some gke/utils.py methods.
- Support for ADC credentials! If application_default_credentials.json is
present on the user's machine, it now gets copied into the container.
- If ADC credentials are NOT present but a service account key is, we write a
placeholder. This is required to get ctpu working inside containers.
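
The module-argument dispatch described above can be sketched roughly as follows. This is only an illustration, not caliban's actual implementation: the helper name `module_command`, its `exists` parameter, and the order in which the three cases are checked are all assumptions made for this example.

```python
import os
from typing import Callable, List


def module_command(
    arg: str, exists: Callable[[str], bool] = os.path.isfile
) -> List[str]:
    """Hypothetical helper: map a module argument to the command that runs it.

    Assumed precedence (not confirmed by the notes above):
      1. a path ending in ".py" runs as "python <path>"
      2. any other argument that exists locally runs as a bash script
      3. everything else is treated as a dotted module for "python -m"
    """
    if arg.endswith(".py"):
        return ["python", arg]
    if exists(arg):
        return ["bash", arg]
    return ["python", "-m", arg]


if __name__ == "__main__":
    # "my_script.sh" is treated as present on disk for this demonstration.
    on_disk = {"my_script.sh"}
    for arg in ("trainer.train", "trainer/train.py", "my_script.sh"):
        print(arg, "->", module_command(arg, exists=on_disk.__contains__))
```

Passing the existence check as a parameter keeps the sketch easy to exercise without touching the filesystem.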