Highlights
* We are now testing and publishing Ray's scalability limits with each release; see https://github.com/ray-project/ray/tree/releases/1.3.0/benchmarks
* Ray Client is now usable by default with any Ray cluster started by the Ray Cluster Launcher.
Ray Cluster Launcher
💫 Enhancements:
* Observability improvements (14816, 14608)
* Worker nodes no longer killed on autoscaler failure (14424)
* Better validation for min_workers and max_workers (13779)
* Auto-detect memory resource for AWS and K8s (14567)
* On autoscaler failure, propagate error message to drivers (14219)
* Avoid launching GPU nodes when the workload only has CPU tasks (13776)
* Autoscaler/GCS compatibility (13970, 14046, 14050)
* Testing (14488, 14713)
* Migration of configs to multi-node-type format (13814, 14239)
* Better config validation (14244, 13779)
* Node-type max workers now defaults to infinity (14201)
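
The worker-count validation and the infinite per-node-type default listed above can be pictured with a small sketch. The function name `validate_node_types`, the dict layout, and the specific checks are illustrative assumptions, not the Cluster Launcher's actual code or schema.

```python
import math


def validate_node_types(node_types, cluster_max_workers):
    """Hypothetical check of per-node-type worker bounds (not Ray's actual code)."""
    resolved = {}
    for name, cfg in node_types.items():
        min_workers = cfg.get("min_workers", 0)
        # Assumption: a missing per-type max_workers means "no per-type limit".
        max_workers = cfg.get("max_workers", math.inf)
        if min_workers < 0:
            raise ValueError(f"{name}: min_workers must be >= 0")
        if min_workers > max_workers:
            raise ValueError(f"{name}: min_workers exceeds max_workers")
        if min_workers > cluster_max_workers:
            raise ValueError(f"{name}: min_workers exceeds the cluster-wide limit")
        resolved[name] = {"min_workers": min_workers, "max_workers": max_workers}
    return resolved


# Two made-up node types; only the second sets an explicit upper bound.
node_types = {
    "cpu_worker": {"min_workers": 1},
    "gpu_worker": {"min_workers": 0, "max_workers": 2},
}
resolved = validate_node_types(node_types, cluster_max_workers=10)
```

In this sketch a node type without `max_workers` is bounded only by the cluster-wide limit, which is one way to read "defaults to infinity".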
🔨 Fixes:
* AWS configuration (14868, 13558, 14083, 13808)
* GCP configuration (14364, 14417)
* Azure configuration (14787, 14750, 14721)
* Kubernetes (14712, 13920, 13720, 14773, 13756, 14567, 13705, 14024, 14499, 14593, 14655)
* Other (14112, 14579, 14002, 13836, 14261, 14286, 14424, 13727, 13966, 14293, 14718, 14380, 14234, 14484)
Ray Client
💫 Enhancements:
* Version checks for Python and client protocol (13722, 13846, 13886, 13926, 14295)
* Validate server port number (14815)
* Enable Ray client server by default (13350, 13429, 13442)
* Disconnect Ray upon client deactivation (13919)
* Convert Ray objects to Ray client objects (13639)
* Testing (14617, 14813, 13016, 13961, 14163, 14248, 14630, 14756, 14786)
* Documentation (14422, 14265)
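
As a rough illustration of what the version checks and port validation above guard against, here is a minimal sketch. The helper names, the info-dict layout, the protocol string, and the rule that only the Python major and minor versions must match are all assumptions made for this example, not the Ray Client implementation.

```python
import sys


def validate_port(port):
    """Return the port as an int, or raise ValueError if it is out of range."""
    port = int(port)
    if not 1 <= port <= 65535:
        raise ValueError(f"port {port} is outside 1-65535")
    return port


def check_versions(client_info, server_info):
    """Return a list of mismatch descriptions; an empty list means compatible."""
    problems = []
    # Assumption: only the major and minor Python versions need to agree.
    if client_info["python"][:2] != server_info["python"][:2]:
        problems.append("Python version mismatch")
    if client_info["protocol"] != server_info["protocol"]:
        problems.append("client protocol mismatch")
    return problems


# Hypothetical handshake data; the protocol string is a placeholder.
local = {"python": tuple(sys.version_info[:3]), "protocol": "example-protocol-1"}
remote = {
    "python": (local["python"][0], local["python"][1] + 1, 0),
    "protocol": "example-protocol-1",
}
mismatches = check_versions(local, remote)
```

Failing fast on a mismatch like this gives a clear error at connection time instead of an obscure serialization failure later.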
🔨 Fixes:
* Hook runtime context (13750)
* Fix mutual recursion (14122)
* Set gRPC max message size (14063)
* Monitor stream errors (13386)
* Fix dependencies (14654)
* Fix `ray.get` ctrl-c (14425)
* Report error deserialization errors (13749)
* Named actor refcounting fix (14753)
* RayTaskError serialization (14698)
* Multithreading fixes (14701)
Ray Core
🎉 New Features:
* We are now testing and publishing Ray's scalability limits with each release. Check out https://github.com/ray-project/ray/tree/releases/1.3.0/benchmarks.
* [alpha] Ray-native Python-based collective communication primitives for Ray clusters with distributed CPUs or GPUs.
🔨 Fixes:
* Ray is now using C++14.
* Fixed high CPU usage causing raylets to fail with missed-heartbeat errors (13963, 14301)
* Fixed high CPU issues from raylet during object transfer (13724)
* Improvements to placement group APIs, including better Java support (13821, 13858, 13582, 15049)
Ray Data Processing
🎉 New Features:
* Object spilling is turned on by default. Check out the [documentation](https://docs.ray.io/en/master/memory-management.html#object-spilling).
* Dask-on-Ray and Spark-on-Ray are fully ready to use. Please [try them out](https://docs.ray.io/en/master/raydp.html) and give us feedback!
* Dask-on-Ray is now compatible with Dask 2021.4.0.
* Dask-on-Ray now works natively with [`dask.persist()`](https://docs.dask.org/en/latest/api.html#dask.persist).
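
Because object spilling is now on by default, no configuration is required. For readers who want to redirect spilled objects, the sketch below follows the pattern in the linked documentation as we understand it for this release. `_system_config` is a private option that may change, and the helper names and the `/tmp/spill` path are illustrative.

```python
import json


def spilling_system_config(directory_path):
    """Build a `_system_config` dict that points spilling at a local directory."""
    return {
        "object_spilling_config": json.dumps(
            {"type": "filesystem", "params": {"directory_path": directory_path}}
        )
    }


def start_ray_with_spilling(directory_path="/tmp/spill"):
    """Start Ray with a custom spill directory (requires Ray to be installed)."""
    # Imported lazily so the config helper above works without Ray installed.
    import ray

    return ray.init(_system_config=spilling_system_config(directory_path))


config = spilling_system_config("/tmp/spill")
```

The spilling configuration is passed as a JSON string rather than a nested dict, which is why the helper calls `json.dumps`.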
🔨 Fixes:
* Various improvements in object spilling and memory management layer to support large scale data processing (13649, 14149, 13853, 13729, 14222, 13781, 13737, 14288, 14578, 15027)
* The `lru_evict` flag is now deprecated. Object spilling is the recommended replacement.
🏗 Architecture refactoring:
* Various architectural improvements in object spilling and memory management. For more details, check out the [whitepaper](https://docs.google.com/document/d/1lAy0Owi-vPz2jEqBSaHNQcy2IBSDEHyXNOQZlGuj93c/edit#heading=h.61xcnifjkb6v).
* Locality-aware scheduling is turned on by default.
* Moved from centralized GCS-based object directory protocol to decentralized owner-to-owner protocol, yielding better cluster scalability.
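
The idea behind locality-aware scheduling can be sketched in a few lines: prefer the node that already holds the most bytes of a task's arguments, so less data has to move. This is a simplified illustration with made-up names (`pick_node`, the object IDs, the node names); the real scheduler also has to account for resource availability, which this sketch ignores.

```python
def pick_node(arg_sizes, object_locations, nodes):
    """Pick the node that already holds the most bytes of a task's arguments."""
    local_bytes = {node: 0 for node in nodes}
    for obj_id, size in arg_sizes.items():
        for node in object_locations.get(obj_id, ()):
            if node in local_bytes:
                local_bytes[node] += size
    # max() returns the first maximal element, so ties fall back to list order.
    return max(nodes, key=lambda node: local_bytes[node])


# Hypothetical task with one large and one small argument.
arg_sizes = {"obj_a": 800, "obj_b": 50}
object_locations = {"obj_a": ["node2"], "obj_b": ["node1", "node3"]}
chosen = pick_node(arg_sizes, object_locations, ["node1", "node2", "node3"])
```

Here the task lands on `node2`, because the 800-byte argument outweighs the 50-byte one held elsewhere.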
RLlib
🎉 New Features:
* R2D2 implementation for torch and tf. (13933)
* PlacementGroup support (all RLlib algos now return PlacementGroupFactory from Trainer.default_resource_request). (14289)
* Multi-GPU support for tf-DQN/PG/A2C. (13393)
💫 Enhancements:
* Documentation: Update documentation for Curiosity's support of continuous actions (13784); CQL documentation (14531)
* Attention-wrapper works with images and supports prev-n-actions/rewards options. (14569)
* `rllib rollout` runs in parallel by default via Trainer’s evaluation worker set. (14208)
* Add env rendering (customizable) and video recording options (for non-local mode; >0 workers; +evaluation-workers) and episode media logging. (14767, 14796)
* Allow SAC to use custom models as Q- or policy nets and deprecate "state-preprocessor" for image spaces. (13522)
* Example Scripts: Add coin game env + matrix social dilemma env + tests and examples (shoutout to Maxime Riché!). (14208); Attention net (14864); Serve + RLlib. (14416); Env seed (14471); Trajectory view API (enhancements and tf2 support). (13786); Tune trial + checkpoint selection. (14209)
* DDPG: Add support for simplex action space. (14011)
* Others: `on_learn_on_batch` callback allows custom metrics. (13584); Add `TorchPolicy.export_model()`. (13989)
🔨 Fixes:
* Trajectory View API bugs (13646, 14765, 14037, 14036, 14031, 13555)
* Test cases (14620, 14450, 14384, 13835, 14357, 14243)
* Others (13013, 14569, 13733, 13556, 13988, 14737, 14838, 15272, 13681, 13764, 13519, 14038, 14033, 14034, 14308, 14243)
🏗 Architecture refactoring:
* Remove all non-trajectory view API code. (14860)
* Make UsageTrackingDict obsolete in favor of SampleBatch. (13065)
Tune
🎉 New Features:
* We added a new searcher `HEBOSearcher` (14504, 14246, 13863, 14427)
* Tune is now natively compatible with the Ray Client (13778, 14115, 14280)
* Tune now uses Ray’s Placement Groups under the hood, which enables much faster autoscaling and training for distributed trials (13906, 15011, 14313)
💫 Enhancements:
* Checkpointing improvements (13376, 13767)
* Optuna Search Algorithm improvements (14731, 14387)
* `tune.with_parameters` now works with the Class API (14532)
🔨 Fixes:
* BOHB & Hyperband fixes (14487, 14171)
* Nested metrics improvements (14189, 14375, 14379)
* Fix non-deterministic category sampling (13710)
* Type hints (13684)
* Documentation (14468, 13880, 13740)
* Various issues and bug fixes (14176, 13939, 14392, 13812, 14781, 14150, 14850, 14118, 14388, 14152, 13825, 13936)
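
To make the nested-metrics item above concrete, the sketch below flattens a nested result dict into slash-joined keys, so a nested value can be named by a single string such as `eval/acc`. The helper `flatten_result` and the sample result are hypothetical, and the `/` delimiter is an assumption made for this example rather than a statement about Tune's internals.

```python
def flatten_result(result, delimiter="/", prefix=""):
    """Flatten nested dicts into one level with delimiter-joined keys."""
    flat = {}
    for key, value in result.items():
        full_key = f"{prefix}{delimiter}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the accumulated key as the prefix.
            flat.update(flatten_result(value, delimiter, full_key))
        else:
            flat[full_key] = value
    return flat


# A made-up training result with two levels of nesting.
result = {"loss": 0.5, "eval": {"acc": 0.9, "per_class": {"cat": 0.8}}}
flat = flatten_result(result)
```

With a flat view like this, a metric name such as `eval/per_class/cat` can be looked up with an ordinary dict access.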
SGD
* Add fault tolerance during worker startup (14724)
Serve
🎉 New Features:
* Added metadata to default logger in backend replicas (14251)
* Added more metrics for ServeHandle stats (13640)
* Deprecated system-level batching in favor of serve.batch (14610, 14648)
* Beta support for Serve with Ray client (14163)
* Use placement groups to bypass autoscaler throttling (13844)
* Deprecate client-based API in favor of process-wide singleton (14696)
* Add initial support for FastAPI ingress (14754)
🔨 Fixes:
* Fix ServeHandle serialization (13695)
🏗 Architecture refactoring:
* Refactor BackendState to support backend versioning and add more unit testing (13870, 14658, 14740, 14748)
* Optimize long polling to be per-key (14335)
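
Per-key long polling means a listener is only woken for the keys it watches, not for every state change. The toy class below sketches that bookkeeping with one snapshot id per key. `ToyLongPollHost` and the key names are hypothetical, and unlike a real long poll, `poll` here returns immediately instead of blocking until something changes.

```python
class ToyLongPollHost:
    """Toy host that tracks one snapshot id per key (not Serve's actual class)."""

    def __init__(self):
        self._snapshot_ids = {}
        self._values = {}

    def notify_changed(self, key, value):
        """Record a new value for a key and bump that key's snapshot id."""
        self._snapshot_ids[key] = self._snapshot_ids.get(key, 0) + 1
        self._values[key] = value

    def poll(self, client_snapshot_ids):
        """Return {key: (snapshot_id, value)} for watched keys with newer data."""
        updates = {}
        for key, seen_id in client_snapshot_ids.items():
            current_id = self._snapshot_ids.get(key, 0)
            if current_id > seen_id:
                updates[key] = (current_id, self._values[key])
        return updates


host = ToyLongPollHost()
host.notify_changed("routes", {"/predict": "model_v1"})
host.notify_changed("backend_configs", {"model_v1": {"num_replicas": 2}})

# A listener that only watches "routes" never receives backend config changes.
first = host.poll({"routes": 0})
second = host.poll({"routes": first["routes"][0]})
```

The second poll returns nothing because the listener has already seen the latest `routes` snapshot, even though `backend_configs` changed in the meantime.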
Dashboard
🎉 New Features:
* Dashboard now supports being served behind a reverse proxy. (14012)
* Disk and network metrics are now exported to Prometheus. (14144)
💫 Enhancements:
* Better CPU & memory information on K8s. (14593, 14499)
* Progress towards a new scalable dashboard. (13790, 11667, 13763, 14333)
Thanks
Many thanks to all those who contributed to this release:
geraint0923, iycheng, yurirocha15, brian-yu, harryge00, ijrsvt, wumuzi520, suquark, simon-mo, clarkzinzow, RaphaelCS, FarzanT, ob, ashione, ffbin, robertnishihara, SongGuyang, zhe-thoughts, rkooo567, Ezra-H, acxz, clay4444, QuantumMecha, jirkafajfr, wuisawesome, Qstar, guykhazma, devin-petersohn, jeroenboeye, ConeyLiu, dependabot[bot], fyrestone, micahtyong, javi-redondo, Manuscrit, mxz96102, EscapeReality846089495, WangTaoTheTonic, stanislav-chekmenev, architkulkarni, Yard1, tchordia, zhisbug, Bam4d, niole, yiranwang52, thomasjpfan, DmitriGekhtman, gabrieleoliaro, jparkerholder, kfstorm, andrew-rosenfeld-ts, erikerlandson, Crissman, raulchen, sumanthratna, Catch-Bull, chaokunyang, krfricke, raoul-khour-ts, sven1977, kathryn-zhou, AmeerHajAli, jovany-wang, amogkam, antoine-galataud, tgaddair, randxie, ChaceAshcraft, ericl, cassidylaidlaw, TanjaBayer, lixin-wei, lena-kashtelyan, cathrinS, qicosmos, richardliaw, rmsander, jCrompton, mjschock, pdames, barakmich, michaelzhiluo, stephanie-wang, edoakes