PyTorch Elastic
> **_NOTE:_** As of torch-1.7 and torchelastic-0.2.1, torchelastic will be bundled into the main [pytorch docker](https://hub.docker.com/r/pytorch/pytorch)
> image. [torchelastic/examples](https://hub.docker.com/r/torchelastic/examples) will be available after the torch-1.7 release, since
> its base image will now be **pytorch/pytorch**.
* Torchelastic agent:
  * `run_id` is available to workers as the `TORCHELASTIC_RUN_ID` environment variable
  * Allow `max_restarts=0`
  * Worker exit barrier added to the torchelastic agent to protect against variance in worker finish times
  * Improvements to error handling and propagation from the torchelastic agent
  * Enable fault handlers on worker processes to get torch C++ stack traces
* `torchelastic.distributed.launch` CLI:
  * New option `--role` to allow users to set the worker role name
  * CLI options can now be set via environment variables (e.g. `PET_NNODES="1:2"`)
* Project:
  * Upgraded to Python 3.8
  * Tests moved to the `test` directory within the respective modules
  * Use Pyre for static type checking
* Deprecated:
  * [pytorch/elastic](https://hub.docker.com/r/pytorch/elastic) Docker image
* Experimental:
  * [Training Session Manager (TSM)](http://pytorch.org/elastic/0.2.1/tsm_driver.html) with localhost scheduler
  * [torchelastic.multiprocessing](http://pytorch.org/elastic/0.2.1/multiprocessing.html)
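
As a rough illustration of the two environment-variable items above, the sketch below shows how a worker script might read `TORCHELASTIC_RUN_ID`, and how a `MIN:MAX` value such as the one in `PET_NNODES="1:2"` could be interpreted. The helper names `get_run_id` and `parse_nnodes` and the `"local"` fallback are hypothetical and not part of torchelastic. The `MIN:MAX` reading of the value is assumed from the launcher's `--nnodes` format.

```python
import os
from typing import Mapping, Tuple


def get_run_id(env: Mapping[str, str] = os.environ, default: str = "local") -> str:
    """Return the run id exported to workers by the torchelastic agent.

    Hypothetical helper: falls back to `default` when the script is started
    without the agent (for example, during local debugging).
    """
    return env.get("TORCHELASTIC_RUN_ID", default)


def parse_nnodes(value: str) -> Tuple[int, int]:
    """Interpret a node-count string as a (min_nodes, max_nodes) pair.

    Hypothetical helper, assuming the value is either "N" (fixed size)
    or "MIN:MAX" (elastic range), as in PET_NNODES="1:2".
    """
    parts = value.split(":")
    if len(parts) == 1:
        n = int(parts[0])
        return n, n
    if len(parts) == 2:
        min_nodes, max_nodes = int(parts[0]), int(parts[1])
        if min_nodes > max_nodes:
            raise ValueError(f"min nodes exceeds max nodes in {value!r}")
        return min_nodes, max_nodes
    raise ValueError(f"expected 'N' or 'MIN:MAX', got {value!r}")


if __name__ == "__main__":
    # Inside a worker, the run id can be used to tag logs or checkpoints.
    print("run id:", get_run_id())
    # PET_NNODES is read here only for illustration; the launcher itself
    # consumes it when the matching CLI option is not given.
    print("nnodes:", parse_nnodes(os.environ.get("PET_NNODES", "1:2")))
```

Passing the environment as a mapping keeps both helpers easy to test without modifying the real process environment.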