Highlights
----------
- Ray is moving towards 1.0! This release includes several important naming changes.
- `ObjectID`s are now called `ObjectRef`s because they are not just IDs.
- The Ray Autoscaler is now called the Ray Cluster Launcher. The autoscaler will be a module of the Ray Cluster Launcher.
- The Ray Cluster Launcher now has a much cleaner and more concise output style. Try it out with `ray up --log-new-style`. The new output style will be enabled by default (with opt-out) in a later release.
- Windows is now officially supported by RLlib. Multi-node support for Windows is still in progress.
Cluster Launcher/CLI (formerly autoscaler)
--------------------------------------------
- **Highlight:** This release contains a new colorful, concise output style for `ray up` and `ray down`, available with the `--log-new-style` flag. It will be enabled by default (with opt-out) in a later release. Full output style coverage for Cluster Launcher commands will also be available in a later release. (9322, 9943, 9960, 9690)
- Documentation improvements (with guides and new sections) (9687)
- Improved Cluster Launcher Docker support (9001, 9105, 8840)
- Ray now has Docker images available on Docker Hub. Please check out the [ray image](https://hub.docker.com/u/rayproject/ray) (9732, 9556, 9458, 9281)
- Azure improvements (8938)
- Improved on-prem cluster autoscaler (9663)
- Add option for continuous sync of file mounts (9544)
- Add `ray status` debug tool and `ray --version` (9091, 8886).
- `ray memory` now also supports `redis_password` (9492)
- Bug fixes for the Kubernetes cluster launcher mode (9968)
- __Various improvements:__ disabling the cluster config cache (8117), Python API requires keyword arguments (9256), removed fingerprint checking for SSH (9133), initial support for multiple worker types (9096), various changes to the internal node provider interface (9340, 9443)
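The new CLI features above can be tried directly. A quick sketch, where `cluster.yaml` is a placeholder for your own cluster configuration file:

```shell
# Opt in to the new concise, colorful output style
# (it will become the default in a later release):
ray up cluster.yaml --log-new-style
ray down cluster.yaml --log-new-style

# New debugging helpers from this release:
ray status
ray --version
```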
Core
-----
- Support Python type checking for Ray tasks (9574)
- Rename ObjectID => ObjectRef (9353)
- New GCS Actor manager on by default (8845, 9883, 9715, 9473, 9275)
- Work towards placement groups (9039)
- Plasma store process is merged with raylet (8939, 8897)
- Option to automatically reconstruct objects stored in plasma after a failure. See the [documentation](https://docs.ray.io/en/master/fault-tolerance.html#objects) for more information. (9394, 9557, 9488)
- Many bug fixes.
RLlib
-----
- New algorithm: __“Model-Agnostic Meta-Learning” (MAML)__. An algo that learns and generalizes well across a __distribution__ of environments.
- New algorithm: __“Model-Based Meta-Policy-Optimization” (MB-MPO)__. Our first __model-based RL algo__.
- __Windows__ is now __officially supported__ by RLlib.
- __Native TensorFlow 2.x support__. Use `framework="tf2"` in your config to tap into TF2's full potential. Also: SAC, DDPG, DQN Rainbow, ES, and ARS now run in TF1.x eager mode.
- __DQN PyTorch__ support for full Rainbow setup (including distributional DQN).
- __Python type hints__ for Policy, Model, Offline, Evaluation, and Env classes.
- __Deprecated “Policy Optimizer”__ package (in favor of new distributed execution API).
- Enhanced __test coverage__ and __stability__.
- __Flexible multi-agent replay modes__ and `replay_sequence_length`. We now allow storing sequences (over time) in replay buffers and retrieving "lock-stepped" multi-agent samples.
- Environments: __Unity3D soccer game__ (tuned example/benchmark) and __DM Control__ Suite wrapper and examples.
- Various __bug fixes__: QMIX not learning, DDPG torch bugs, IMPALA learning rate updates, PyTorch custom loss, PPO not learning MuJoCo due to action clipping bug, DQN w/o dueling layer error.
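A sketch of what enabling the new native TF2 support looks like in a trainer config. Only the `framework` key comes from these release notes; the other keys (`env`, `num_workers`) are common RLlib options shown for illustration:

```python
# Hedged config sketch for RLlib's native TensorFlow 2.x support.
config = {
    "env": "CartPole-v0",   # illustrative Gym environment
    "framework": "tf2",     # tap into native TF2 eager execution
    "num_workers": 1,       # illustrative rollout-worker count
}

# With ray[rllib] installed, this would be passed to a trainer, e.g.:
# from ray.rllib.agents.ppo import PPOTrainer
# trainer = PPOTrainer(config=config)
```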
Tune
-----
- **API Changes**:
- The Tune Function API now supports checkpointing and is now usable with all search and scheduling algorithms! (8471, 9853, 9517)
- The Trainable class API has renamed many of its methods to make them public (9184)
- You can now stop experiments upon convergence with Bayesian Optimization (8808)
- `DistributedTrainableCreator`, a simple wrapper for distributed parameter tuning with multi-node DistributedDataParallel models (9550, 9739)
- New integration and tutorial for using Ray Tune with __Weights and Biases__ (Logger and native API) (9725)
- Tune now provides a Scikit-learn compatible wrapper for hyperparameter tuning (9129)
- __New tutorials__ for integrations like __XGBoost__ (9060), __multi GPU PyTorch__ (9338), __PyTorch Lightning__ (9151, 9451), and __Huggingface-Transformers__ (9789)
- CLI Progress reporting improvements (8802, 9537, 9525)
- Various __bug fixes__: handling of NaN values (9381), Tensorboard logging improvements (9297, 9691, 8918), enhanced cross-platform compatibility (9141), re-structured testing (9609), documentation reorganization and versioning (9600, 9427, 9448)
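A sketch of the function API with checkpointing mentioned above. The training metric here is a stand-in, and the exact helper names (`tune.report`, `tune.checkpoint_dir`) reflect the function API at the time of this release; treat them as assumptions:

```python
import os

def simulate_loss(lr, step):
    # Stand-in "training" metric used purely for illustration.
    return 1.0 / (1.0 + lr * step)

def trainable(config):
    # Function-API trainable that both checkpoints and reports metrics.
    from ray import tune
    for step in range(5):
        loss = simulate_loss(config["lr"], step)
        # Persist state so the trial can be paused/resumed by schedulers.
        with tune.checkpoint_dir(step=step) as checkpoint_dir:
            with open(os.path.join(checkpoint_dir, "step.txt"), "w") as f:
                f.write(str(step))
        tune.report(loss=loss)

# With Ray installed, this would be launched as:
# from ray import tune
# analysis = tune.run(trainable, config={"lr": 0.1})
```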
RaySGD
--------
- Variable worker CPU requirements (8963)
- Simplified cuda visible device setting (8775)
Serve
------
- Horizontal scalability: Serve will now start one HTTP server per Ray node. (9523)
- Various performance improvements matching Serve to FastAPI (9490, 8709, 9531, 9479, 9225, 9216, 9485)
- API changes
- `serve.shadow_traffic(endpoint, backend, fraction)` duplicates and sends a fraction of the incoming traffic to a specific backend. (9106)
- `serve.shutdown()` cleans up the current Serve instance in the Ray cluster. (8766)
- An exception will be raised if `num_replicas` exceeds the maximum resources available in the cluster (9005)
- Added doc examples for how to perform metric [monitoring](https://docs.ray.io/en/master/serve/advanced.html#monitoring) and [model composition](https://docs.ray.io/en/master/serve/advanced.html#composing-multiple-models).
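A hedged sketch of the new Serve APIs listed above. The calls require a running Ray cluster with a Serve instance and an existing endpoint/backend, so they are shown commented out; `"my_endpoint"` and `"shadow_backend"` are illustrative names:

```python
SHADOW_FRACTION = 0.1  # duplicate 10% of incoming traffic

# from ray import serve
#
# Send a fraction of the traffic hitting "my_endpoint" to "shadow_backend"
# without affecting the responses served to clients:
# serve.shadow_traffic("my_endpoint", "shadow_backend", SHADOW_FRACTION)
#
# Tear down the Serve instance in the Ray cluster when done:
# serve.shutdown()
```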
Dashboard
-----------
- __Configurable Dashboard Port__: The port on which the dashboard runs is now configurable using the `--dashboard-port` CLI argument and the `dashboard_port` argument to `ray.init`.
- __GPU monitoring improvements__
- For machines with more than one GPU, the GPU and GRAM utilization is now broken out on a per-GPU basis.
- Assignments to physical GPUs are now shown at the worker level.
- __Sortable Machine View__: It is now possible to sort the machine view by almost any of its columns by clicking next to the column title. In addition, while workers are normally grouped by node, you can now ungroup them if you only want to see details about individual workers.
- __Actor Search Bar__: You can now search for actors by their title (the class name of the actor in Python, along with the arguments it received).
- __Logical View UI Updates__: This includes things like color-coded names for each of the actor states, a more grid-like layout, and tooltips for the various data.
- __Sortable Memory View__: Like the machine view, the memory view now has sortable columns and can be grouped / ungrouped by node.
Windows Support
------------------
- Improve GPU detection (9300)
- Work around msgpack issue on PowerPC64LE (9140)
Others
-------
- Ray Streaming Library Improvements (9240, 8910, 8780)
- Java Support Improvements (9371, 9033, 9037, 9032, 8858, 9777, 9836, 9377)
- Parallel Iterator Improvements (8964, 8978)
Thanks
-------
We thank the following contributors for their work on this release:
jsuarez5341, amitsadaphule, krfricke, williamFalcon, richardliaw, heyitsmui, mehrdadn, robertnishihara, gabrieleoliaro, amogkam, fyrestone, mimoralea, edoakes, andrijazz, ElektroChan89, kisuke95, justinkterry, SongGuyang, barakmich, bloodymeli, simon-mo, TomVeniat, lixin-wei, alanwguo, zhuohan123, michaelzhiluo, ijrsvt, pcmoritz, LecJackS, sven1977, ashione, JerryLeeCS, raphaelavalos, stephanie-wang, ruifangChen, vnlitvinov, yncxcw, weepingwillowben, goulou, acmore, wuisawesome, gramhagen, anabranch, internetcoffeephone, Alisahhh, henktillman, deanwampler, p-christ, Nicolaus93, WangTaoTheTonic, allenyin55, kfstorm, rkooo567, ConeyLiu, 09wakharet, piojanu, mfitton, KristianHolsheimer, AmeerHajAli, pdames, ericl, VishDev12, suquark, stefanbschneider, raulchen, dcfidalgo, chappers, aaarne, chaokunyang, sumanthratna, clarkzinzow, BalaBalaYi, maximsmol, zhongchun, wumuzi520, ffbin