- RLlib [ModelV2 API](https://ray.readthedocs.io/en/latest/rllib-models.html) is ready to use. It improves support for Keras and RNN models and allows object-oriented reuse of variables. The ModelV1 API is deprecated. No migration is needed.
- `ray.experimental.sgd.pytorch.PyTorchTrainer` is ready for early adopters. Check out the documentation [here](https://ray.readthedocs.io/en/latest/distributed_training.html). We welcome your feedback!
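  ```python
  model_creator = lambda config: YourPyTorchModel()
  # Parentheses make the lambda return both datasets as one tuple.
  data_creator = lambda config: (YourTrainingSet(), YourValidationSet())
  trainer = PyTorchTrainer(
      model_creator,
      data_creator,
      config={"lr": 1e-4},
  )
  for i in range(NUM_EPOCHS):
      trainer.train()
  ```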
- You can query all the clients that have performed `ray.init` to connect to the current cluster with `ray.jobs()`. #5076
  ```python
  >>> ray.jobs()
  [{'JobID': '02000000',
    'NodeManagerAddress': '',
    'DriverPid': 74949,
    'StartTime': 1564168784,
    'StopTime': 1564168798},
   {'JobID': '01000000',
    'NodeManagerAddress': '',
    'DriverPid': 74871,
    'StartTime': 1564168742}]
  ```
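
As a rough illustration of working with this output, the sketch below computes how long each finished job ran and which jobs are still active. It operates on a hard-coded copy of the sample output above rather than calling `ray.jobs()`. The helper name `summarize_jobs` is hypothetical, and treating a missing `StopTime` as "still running" is an assumption based on the sample, not documented behavior.

```python
# Sample data copied from the ray.jobs() output shown above.
jobs = [
    {"JobID": "02000000", "NodeManagerAddress": "", "DriverPid": 74949,
     "StartTime": 1564168784, "StopTime": 1564168798},
    {"JobID": "01000000", "NodeManagerAddress": "", "DriverPid": 74871,
     "StartTime": 1564168742},
]


def summarize_jobs(job_list):
    """Hypothetical helper: map each JobID to its duration in seconds.

    Assumption: a job without a "StopTime" key is still running, so its
    duration is reported as None.
    """
    summary = {}
    for job in job_list:
        stop = job.get("StopTime")
        summary[job["JobID"]] = None if stop is None else stop - job["StartTime"]
    return summary


durations = summarize_jobs(jobs)
# Jobs whose duration is None have not reported a stop time yet.
running = [job_id for job_id, secs in durations.items() if secs is None]
print(durations)
print(running)
```

Keeping the summary as a plain dictionary makes it easy to feed into logging or monitoring code without depending on Ray itself.
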
- Improved memory storage handling. #5143, #5216, #4893
- Improved workflow:
  - Debugging tool `local_mode` now behaves more consistently. #5060
  - Improved `KeyboardInterrupt` exception handling: the stack trace was reduced from 115 lines to 22 lines. #5237
- Ray core:
  - Experimental direct actor call. #5140, #5184
  - Improvements to the core worker, the shared module between Python and Java. #5079, #5034, #5062
  - GCS (global control store) was refactored. #5058, #5050
- Finished porting all major RLlib algorithms to the builder pattern. #5277, #5258, #5249
- `learner_queue_timeout` can be configured for the async sample optimizer. #5270
- `reproducible_seed` can be used for reproducible experiments. #5197
- Added entropy coefficient decay to IMPALA, APPO, and PPO. #5043
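
To make the idea of entropy coefficient decay concrete, here is a minimal sketch in plain Python. It is not RLlib's implementation: the linear shape of the schedule, the function names `linear_entropy_coeff` and `loss_with_entropy_bonus`, and all default values are assumptions chosen purely for illustration.

```python
def linear_entropy_coeff(step, initial=0.01, final=0.0, decay_steps=1000):
    """Hypothetical schedule: linearly anneal the coefficient, then hold it."""
    if decay_steps <= 0:
        raise ValueError("decay_steps must be positive")
    # Clamp the step into [0, decay_steps] so the value never overshoots.
    fraction = min(max(step, 0), decay_steps) / decay_steps
    return initial + fraction * (final - initial)


def loss_with_entropy_bonus(policy_loss, entropy, step):
    """Subtract the scaled entropy so that higher entropy lowers the loss."""
    return policy_loss - linear_entropy_coeff(step) * entropy


# Coefficient at the start, midway, at the end, and past the end of decay.
schedule = [linear_entropy_coeff(s) for s in (0, 500, 1000, 2000)]
print(schedule)
```

A decaying coefficient of this kind encourages exploration early in training and lets the policy become more deterministic later.
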
- **Breaking:** `ExperimentAnalysis` is now returned by default from `tune.run`. To obtain a list of trials, use `analysis.trials`. #5115
- **Breaking:** Syncing behavior between head and workers can now be customized (`sync_to_driver`). Syncing behavior (`upload_dir`) between cluster and cloud is now separately customizable (`sync_to_cloud`). This changes the structure of the uploaded directory: `local_dir` is now synced with `upload_dir`. #4450
- Introduced `Analysis` and `ExperimentAnalysis` objects. An `Analysis` object returns all trials in a folder; `ExperimentAnalysis` is a subclass that returns all trials of an experiment. #5115
- Added the missing argument `tune.run(keep_checkpoints_num=...)`, which enables keeping only the last N checkpoints. #5117
- Trials on failed nodes will be prioritized in processing. #5053
- Trial checkpointing is now more flexible. #4728
- Added system performance tracking for GPU, RAM, VRAM, and CPU usage statistics; toggle with `tune.run(log_sys_usage=True)`. #4924
- Experiment checkpointing is now less frequent and can be controlled with `tune.run(global_checkpoint_period=...)`. #4859
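
The "keep only the last N checkpoints" behavior described for `keep_checkpoints_num` can be pictured with the following sketch. It is plain Python and does not reflect how Tune implements the feature; the class `CheckpointWindow` and the checkpoint names are hypothetical, and a real implementation would also delete the evicted files from disk.

```python
from collections import deque


class CheckpointWindow:
    """Hypothetical helper that retains only the most recent N checkpoints."""

    def __init__(self, keep_num):
        if keep_num < 1:
            raise ValueError("keep_num must be at least 1")
        self._kept = deque()
        self._keep_num = keep_num

    def add(self, checkpoint_path):
        """Record a checkpoint and return the paths that should be deleted."""
        self._kept.append(checkpoint_path)
        evicted = []
        # Drop the oldest entries until only keep_num remain.
        while len(self._kept) > self._keep_num:
            evicted.append(self._kept.popleft())
        return evicted

    @property
    def kept(self):
        return list(self._kept)


window = CheckpointWindow(keep_num=2)
evicted = []
for i in range(4):
    evicted.extend(window.add(f"checkpoint_{i}"))
print(window.kept)
print(evicted)
```

Returning the evicted paths from `add` keeps the bookkeeping separate from file deletion, which makes the retention policy easy to test on its own.
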
- Added a `request_cores` function for manual autoscaling. You can now manually request resources for the autoscaler. #4754
- Local cluster:
  - More readable example YAML with comments. #5290
  - Multiple cluster names are supported. #4864
- Improved logging with AWS NodeProvider: the `create_instance` call is now logged. #4998
Other Libraries:
- SGD:
  - Example for training. #5292
  - Deprecated the old distributed SGD implementation. #5160
- Kubernetes: Ray namespace added for k8s. #4111
- Dev experience: Added a linting pre-push hook. #5154
We thank the following contributors for their amazing contributions:
joneswong, 1beb, richardliaw, pcmoritz, raulchen, stephanie-wang, jiangzihao2009, LorenzoCevolani, kfstorm, pschafhalter, micafan, simon-mo, vipulharsh, haje01, ls-daniel, hartikainen, stefanpantic, edoakes, llan-ml, alex-petrenko, ztangent, gravitywp, MQQ, dulex123, morgangiraud, antoine-galataud, robertnishihara, qxcv, vakker, jovany-wang, zhijunfu, ericl