Highlights
--------
- The autoscaler now supports Azure. (7080, 7515, 7558, 7494)
    - The Ray autoscaler lets you launch a distributed Ray cluster with a single command-line call!
    - It works on Azure, AWS, GCP, Kubernetes, YARN, Slurm, and local nodes.
- Distributed reference counting is turned on by default. (7628, 7337)
    - All Ray objects are now tracked and garbage collected only once every reference to them has gone out of scope. This can be turned off with `ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0}))`.
    - When the object store fills up with objects that are still in scope, you can enable least-recently-used eviction to force-remove objects with `ray.init(lru_evict=True)`; see the sketch after the example output below.
- A new `ray memory` command helps debug memory usage: (7589)
    - It shows all object IDs that are in scope, along with their reference types, sizes, and creation sites.
    - Read more in the docs: https://ray.readthedocs.io/en/latest/memory-management.html.
```
> ray memory
-----------------------------------------------------------------------------------------------------
 Object ID                                Reference Type        Object Size  Reference Creation Site
=====================================================================================================
; worker pid=51230
ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY             8231  (deserialize task arg) __main__..sum_task
; driver pid=51174
45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK            ?  (task call) memory_demo.py:<module>:13
ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK         8231  (put object) memory_demo.py:<module>:6
ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE                 ?  (task call) memory_demo.py:<module>:14
-----------------------------------------------------------------------------------------------------
```
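
The two memory-management toggles above can be exercised with a short script. This is a minimal sketch: only the `lru_evict` and `_internal_config` flags come from these notes, while `sum_task` and the driver code are illustrative.

```python
import json

import ray

# Evict least-recently-used objects instead of erroring when the object
# store fills up with objects that are still in scope.
ray.init(lru_evict=True)

# Alternatively, distributed reference counting can be disabled entirely:
# ray.init(_internal_config=json.dumps(
#     {"distributed_ref_counting_enabled": 0}))

@ray.remote
def sum_task(xs):
    return sum(xs)

obj_id = ray.put(list(range(1000)))  # pinned while obj_id stays in scope
result = ray.get(sum_task.remote(obj_id))
print(result)
# While this script runs, `ray memory` in another shell lists both objects,
# their reference types, and the lines above as their creation sites.
```
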
API changes
----------
- Replace `actor.__ray_kill__()` with `ray.kill(actor)`; see the example below. (7360)
- Deprecate the `use_pickle` flag for serialization. (7474)
- Remove `experimental.NoReturn`. (7475)
- Remove the `experimental.signal` API. (7477)
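
A minimal before/after sketch of the kill API change; the `Counter` actor is illustrative.

```python
import ray

ray.init()

@ray.remote
class Counter:
    def __init__(self):
        self.n = 0

    def inc(self):
        self.n += 1
        return self.n

actor = Counter.remote()
assert ray.get(actor.inc.remote()) == 1

# 0.8.2 and earlier: actor.__ray_kill__()
ray.kill(actor)  # 0.8.3: forcibly terminate the actor
```
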
Core
----
- Add Apache 2 license header to C++ files. (7520)
- Reduce per worker memory usage to 50MB. (7573)
- Add an option to fall back to LRU eviction on OutOfMemory. (7410)
- Reference counting for actor handles. (7434)
- Reference counting for returning object IDs created by a different process. (7221)
- Use `prctl(PR_SET_PDEATHSIG)` on Linux instead of a reaper process; see the sketch after this list. (7150)
- Route asyncio plasma through raylet instead of direct plasma connection. (7234)
- Remove static concurrency limit from gRPC server. (7544)
- Remove `get_global_worker()`, `RuntimeContext`. (7638)
- Fix known issues from the 0.8.2 release:
    - Fix passing duplicate by-reference arguments. (7306)
    - Raise the gRPC message size limit to 100MB. (7269)
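
For context on the `prctl` item: on Linux, a child process can ask the kernel to signal it when its parent dies, which removes the need for a separate reaper process to clean up orphaned workers. A rough Python illustration of the mechanism (Ray's actual change is in its C++ worker code):

```python
import ctypes
import signal

PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>

libc = ctypes.CDLL("libc.so.6", use_errno=True)
# Ask the kernel to deliver SIGKILL to this process when its parent exits,
# so an orphaned worker dies on its own instead of needing to be reaped.
if libc.prctl(PR_SET_PDEATHSIG, int(signal.SIGKILL), 0, 0, 0) != 0:
    raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")
```
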
RLlib
-----
- New features:
    - Exploration API improvements. (7373, 7314, 7380)
    - SAC: add discrete action support. (7320, 7272)
    - Add a high-performance external application connector. (7641)
- Bug fix highlights:
    - Fix a PPO torch memory leak and unnecessary `torch.Tensor` creation and garbage collection. (7238)
    - Rename `sample_batch_size` to `rollout_fragment_length`; see the sketch after this list. (7503)
    - Fix bugs in `SegmentTree` and speed it up.
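
The `rollout_fragment_length` rename in practice; a minimal sketch in which the CartPole environment and stopping criterion are illustrative:

```python
from ray import tune

tune.run(
    "PPO",
    stop={"training_iteration": 1},
    config={
        "env": "CartPole-v0",
        # 0.8.2 and earlier: "sample_batch_size": 200,
        "rollout_fragment_length": 200,  # renamed in this release
        "num_workers": 1,
    },
)
```
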
Tune
----
- Integrate Dragonfly optimizer. (5955)
- Fix HyperBand errors. (7563)
- Allow access to the trial name and trial ID inside trainables. (7378)
- Add a new `Repeater` class for high-variance trials; see the sketch after this list. (7366)
- Prevent checkpoints from being deleted during user-initiated restoration. (7501)
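
A sketch of how repeated trials might be used to average out noise; the `ray.tune.suggest.Repeater` import path, its `repeat` argument, and the HyperOpt searcher it wraps are all assumptions here, since these notes only name the class.

```python
import random

from hyperopt import hp
from ray import tune
from ray.tune.suggest import Repeater                 # import path assumed
from ray.tune.suggest.hyperopt import HyperOptSearch  # wrapped searcher assumed

def trainable(config):
    # Noisy objective: repeated evaluation lets the searcher see the mean.
    tune.track.log(score=config["x"] + random.gauss(0, 0.5))

searcher = HyperOptSearch({"x": hp.uniform("x", 0.0, 1.0)},
                          metric="score", mode="max")
tune.run(trainable,
         search_alg=Repeater(searcher, repeat=5),  # run each config 5 times
         num_samples=20)
```
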
Libraries
---------
- [Parallel Iterators] Allow operator chaining after repartition. (7268)
- [Parallel Iterators] Add repartition functionality. (7163)
- [Serve] `serve.route` now returns a handle; add `handle.scale` and `handle.set_max_batch_size`. (7569)
- [RaySGD] Rename `PyTorchTrainer` to `TorchTrainer`. (7425)
- [RaySGD] Custom training API. (7211)
- [RaySGD] Breaking user API changes; see the sketch after this list. (7384)
    - `data_creator` fed to `TorchTrainer` must now return a dataloader rather than datasets.
    - `TorchTrainer` automatically sets a `DistributedSampler` if a `DataLoader` is returned.
    - `data_loader_config` and `batch_size` are no longer parameters for `TorchTrainer`.
    - `TorchTrainer` parallelism is now set by `num_workers`.
    - All `TorchTrainer` args must now be named parameters.
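
A sketch of the new `TorchTrainer` API reflecting the breaking changes above; the `ray.util.sgd` import path and the creator keyword names other than `data_creator` and `num_workers` are assumptions based on the names in these notes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from ray.util.sgd import TorchTrainer  # import path assumed

def model_creator(config):
    return torch.nn.Linear(1, 1)

def optimizer_creator(model, config):
    return torch.optim.SGD(model.parameters(), lr=config.get("lr", 1e-2))

def data_creator(config):
    # Breaking change: return a DataLoader, not a raw dataset. TorchTrainer
    # attaches a DistributedSampler to it automatically.
    dataset = TensorDataset(torch.randn(256, 1), torch.randn(256, 1))
    return DataLoader(dataset, batch_size=32)

trainer = TorchTrainer(             # all args must now be named parameters
    model_creator=model_creator,
    data_creator=data_creator,
    optimizer_creator=optimizer_creator,
    loss_creator=torch.nn.MSELoss,  # keyword name assumed
    num_workers=2,                  # parallelism is now set by num_workers
)
print(trainer.train())
trainer.shutdown()
```
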
Java
----
- New Java actor API. (7414)
    - The `RayRemote` annotation is removed.
    - Instead of `Ray.call(ActorClass::method, actor)`, the new API is `actor.call(ActorClass::method)`.
- Allow passing internal config from raylet to Java worker. (7532)
- Enable direct call by default. (7408)
- Pass large object by reference. (7595)
Others
------
- Progress towards Ray Streaming, including a Python API. (7070, 6755, 7152, 7582)
- Progress towards GCS Service for GCS fault tolerance. (7292, 7592, 7601, 7166)
- Progress towards cross-language calls between Java and Python. (7614, 7634)
- Progress towards Windows compatibility. (7529, 7509, 7658, 7315)
- Improvements to the K8s Operator. (7521, 7621, 7498, 7459, 7622)
- New documentation for Ray Dashboard. (7304)
Known issues
--------------
- Ray currently doesn't work on Python 3.5.0, but works on 3.5.3 and above.
Thanks
------
We thank the following contributors for their work on this release:
rkooo567, maximsmol, suquark, mitchellstern, micafan, ClarkZinzow, Jimpachnet, mwbrulhardt, ujvl, chaokunyang, robertnishihara, jovany-wang, hyeonjames, zhijunfu, datayjz, fyrestone, eisber, stephanie-wang, allenyin55, BalaBalaYi, simon-mo, thedrow, ffbin, amogkam, TisonKun, richardliaw, ijrsvt, wumuzi520, mehrdadn, raulchen, landcold7, ericl, edoakes, sven1977, ashione, jorenretel, gramhagen, kfstorm, anthonyhsyu, pcmoritz