Release Highlights
This release features stability improvements and API clean-ups across the Ray libraries.
- In Ray Serve, we are deprecating the previously experimental DAG API for deployment graphs. Model composition will be supported through [deployment handles](https://docs.ray.io/en/latest/serve/model_composition.html), which provide more flexibility and stability. The previously deprecated Ray Serve 1.x APIs have also been removed. We’ve also added a new Java API that aligns with the Ray Serve 2.x APIs. More API changes are listed in the release notes below.
- In RLlib, we’ve moved 24 algorithms into `rllib_contrib` (still available within RLlib for Ray 2.8).
- We’ve added support for PyTorch-compatible input file shuffling for Ray Data. This allows users to randomly shuffle input files for better model training accuracy. This release also features new Ray Data datasources for Databricks and BigQuery.
- On the Ray Dashboard, we’ve added new metrics for Ray Data in the Metrics tab. This allows users to monitor Ray Data workloads, including real-time metrics for cluster memory, CPU, GPU, output data size, etc. See [the doc](https://docs.ray.io/en/master/data/performance-tips.html#monitoring-your-application) for more details.
- Ray Core now supports profiling GPU tasks or actors using Nvidia Nsight. See [the documentation](https://docs.ray.io/en/master/ray-observability/user-guides/profiling.html?highlight=nsight#nsight-system-profiler) for instructions.
- We fixed 2 critical bugs raised by many KubeRay / ML library users: a child process leak from Ray workers that leaks GPU memory (40182) and an excessive job page loading time when a Ray HA cluster restarts a head node (40742).
- Python 3.7 support is officially deprecated from Ray.
Ray Libraries
Ray Data
🎉 New Features:
- Add support for shuffling input files (40154)
- Support streaming read of PyTorch dataset (39554)
- Add BigQuery datasource (37380)
- Add Databricks table / SQL datasource (39852)
- Add inverse transform functionality to LabelEncoder (37785)
- Add function arg params to `Dataset.map` and `Dataset.flat_map` (40010)
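The function-argument parameters and input file shuffling listed above could be used roughly as follows. This is a minimal sketch, assuming Ray 2.8+ with the Data extras installed. `scale_id` and `demo` are hypothetical names, the Parquet paths are placeholders supplied by the caller, and the keyword names (`fn_args`, `fn_kwargs`, `shuffle="files"`) should be checked against the Ray Data API reference.

```python
from typing import Any, Dict, List, Optional


def scale_id(row: Dict[str, Any], offset: int, scale: int = 1) -> Dict[str, Any]:
    """Per-row UDF that takes an extra positional and an extra keyword argument."""
    return {"id": row["id"], "scaled": (row["id"] + offset) * scale}


def demo(parquet_paths: Optional[List[str]] = None) -> List[Dict[str, Any]]:
    """Run the UDF through Ray Data.

    Ray is imported lazily so that `scale_id` can be used and tested without it.
    """
    import ray  # assumption: Ray 2.8+ with Ray Data installed

    ds = ray.data.range(4)  # rows look like {"id": 0}, {"id": 1}, ...
    # Extra arguments are forwarded to the UDF after the row itself.
    ds = ds.map(scale_id, fn_args=(10,), fn_kwargs={"scale": 2})
    rows = ds.take_all()

    if parquet_paths:
        # Assumed keyword: shuffle="files" randomizes the input file order before reading.
        shuffled = ray.data.read_parquet(parquet_paths, shuffle="files")
        print(shuffled.schema())
    return rows
```

Keeping the UDF a plain function makes it easy to unit-test without starting a Ray cluster.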
💫Enhancements:
- Hard deprecate `DatasetPipeline` (40129)
- Remove `BulkExecutor` code path (40200)
- Deprecate extraneous `Dataset` parameters and methods (40385)
- Remove legacy iteration code path (40013)
- Implement streaming output backpressure (40387)
- Cap op concurrency with exponential ramp-up (40275)
- Store ray dashboard metrics in `_StatsActor` (40118)
- Slice output blocks to respect target block size (40248)
- Drop columns before grouping by in `Dataset.unique()` (40016)
- Standardize physical operator runtime metrics (40173)
- Estimate blocks for limit and union operator (40072)
- Store bytes spilled/restored after plan execution (39361)
- Optimize `sample_boundaries` in `SortTaskSpec` (39581)
- Optimization to reduce ArrowBlock building time for blocks of size 1 (38833)
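To give a rough sense of what capping operator concurrency with an exponential ramp-up means, here is a toy sketch. It is not Ray Data's implementation: the class name, the initial cap, the multiplier, and the rule for when the cap grows are all assumptions made for illustration.

```python
class RampUpConcurrencyCap:
    """Toy concurrency cap that grows geometrically as tasks finish."""

    def __init__(self, initial_cap: int = 4, multiplier: int = 2, max_cap: int = 64):
        self.cap = initial_cap
        self.multiplier = multiplier
        self.max_cap = max_cap
        self._finished_at_cap = 0  # tasks finished since the cap last changed

    def can_launch(self, num_running: int) -> bool:
        """Allow a new task only while fewer than `cap` tasks are running."""
        return num_running < self.cap

    def on_task_finished(self) -> None:
        """Raise the cap once `cap` tasks have finished at the current level."""
        self._finished_at_cap += 1
        if self._finished_at_cap >= self.cap and self.cap < self.max_cap:
            self.cap = min(self.cap * self.multiplier, self.max_cap)
            self._finished_at_cap = 0
```

Starting small and growing geometrically limits how much work is in flight before an operator has shown that it can finish tasks, while still reaching full parallelism in a few steps.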
🔨 Fixes:
- Fix bug where `_StatsActor` errors with `PandasBlock` (40481)
- Remove deprecated `do_write` (40422)
- Improve error message when reading HTTP files (40462)
- Add flag to skip `get_object_locations` for metrics (39884)
- Fall back to fetch files info in parallel for multiple directories (39592)
- Replace deprecated `.pieces` with updated `.fragments` (39523)
- Backwards compatibility for `Preprocessor` that have been fit in older versions (39173)
- Removing unnecessary data copy in `convert_udf_returns_to_numpy` (39188)
- Do not eagerly free root `RefBundles` (39016)
📖Documentation:
- Remove out-of-date Data examples (40127)
- Remove unused and outdated source examples (40271)
Ray Train
🎉 New Features:
- Add initial support for scheduling workers on neuron_cores (39091)
💫Enhancements:
- Update PyTorch Lightning import path to support both `pytorch_lightning` and `lightning` (39841, 40266)
- Propagate driver `DataContext` to `RayTrainWorkers` (40116)
🔨 Fixes:
- Fix error propagation for as_directory if to_directory fails (40025)
📖Documentation:
- Update checkpoint hierarchy documentation for RayTrainReportCallbacks. (40174)
- Update Lightning RayDDPStrategy docstring (40376)
🏗 Architecture refactoring:
- Deprecate `LightningTrainer`, `AccelerateTrainer`, `TransformersTrainer` (40163)
- Clean up legacy persistence mode code paths (39921, 40061, 40069, 40168)
- Deprecate legacy `DatasetConfig` (39963)
- Remove references to `DatasetPipeline` (40159)
- Enable isort (40172)
Ray Tune
💫Enhancements:
- Separate storage checkpoint index bookkeeping (39927, 40003)
- Raise an error if `Tuner.restore()` is called on an instance (39676)
🏗 Architecture refactoring:
- Clean up legacy persistence mode code paths (39918, 40061, 40069, 40168, 40175, 40192, 40181, 40193)
- Migrate TuneController tests (39704)
- Remove TuneRichReporter (40169)
- Remove legacy Ray Client tests (40415)
Ray Serve
💫Enhancements:
- The single-app configuration format for the [Serve Config](https://docs.ray.io/en/master/serve/production-guide/config.html#serve-in-production-config-file) (i.e. the Serve Config without the ‘applications’ field) has been deprecated in favor of the new configuration format.
Both single-app configuration and DAG API will be removed in 2.9.
- The Serve REST API is now accessible through the dashboard port, which defaults to `8265`.
- Accessing the Serve REST API through the dashboard agent port (default `52365`) is deprecated. The support will be removed in a future version.
- Ray job error tracebacks are now logged in the job driver log for easier access when jobs fail during start up.
- Deprecated single-application config file
- Deprecated DAG API: `InputNode` and `DAGDriver`
- Removed deprecated Deployment 1.x APIs: `Deployment.deploy()`, `Deployment.delete()`, `Deployment.get_handle()`
- Removed deprecated 1.x API: `serve.get_deployment` and `serve.list_deployments`
- New Java API supported (aligns with Ray Serve 2.x API)
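For users still on the deprecated single-application config file, the move to the multi-application format can be sketched as a small dictionary transformation. This is an illustrative helper, not part of Ray: `to_multi_app_config` is a hypothetical name, and the assumed field layout (top-level `host`/`port` moving under `http_options`, everything else moving into one entry of `applications`) should be checked against the Serve config documentation linked above.

```python
from typing import Any, Dict

# Top-level keys assumed to move under `http_options` in the multi-app format.
_HTTP_KEYS = ("host", "port")


def to_multi_app_config(single_app: Dict[str, Any], name: str = "default",
                        route_prefix: str = "/") -> Dict[str, Any]:
    """Wrap a single-application Serve config dict in an `applications` list."""
    if "applications" in single_app:
        return dict(single_app)  # already in the multi-app format

    # Everything except the HTTP settings becomes one application entry.
    app = {k: v for k, v in single_app.items() if k not in _HTTP_KEYS}
    app.setdefault("name", name)
    app.setdefault("route_prefix", route_prefix)

    config: Dict[str, Any] = {"applications": [app]}
    http_options = {k: single_app[k] for k in _HTTP_KEYS if k in single_app}
    if http_options:
        config["http_options"] = http_options
    return config
```

The resulting dictionary can be written back out as YAML and reviewed by hand before deploying it.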
🔨 Fixes:
- The `dedicated_cpu` and `detached` options in `serve.start()` have been fully disallowed.
- An error is now raised early when users pass invalid gRPC service functions.
- The proxy’s readiness check now uses a linear backoff to avoid getting stuck in an infinite loop if it takes longer than usual to start.
- `grpc_options` on `serve.start()` only accepted a `gRPCOptions` object in Ray 2.7.0. Dictionaries can now also be passed as `grpc_options` in the `serve.start()` call.
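The linear backoff mentioned for the proxy readiness check can be illustrated with a generic polling loop. This is a sketch of the technique only, not the Serve proxy's code: `wait_until_ready`, its parameters, and the delay values are hypothetical.

```python
import time
from typing import Callable


def wait_until_ready(check: Callable[[], bool], max_attempts: int = 5,
                     base_delay_s: float = 0.5,
                     sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll `check` with linearly growing delays; return the attempt that succeeded."""
    for attempt in range(1, max_attempts + 1):
        if check():
            return attempt
        if attempt < max_attempts:
            # Delay grows linearly with the attempt number: 0.5s, 1.0s, 1.5s, ...
            sleep(base_delay_s * attempt)
    # Bounding the attempts keeps a slow start from turning into an endless loop.
    raise TimeoutError(f"not ready after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.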
RLlib
💫Enhancements:
- `rllib_contrib` algorithms (A2C, A3C, AlphaStar 36584, AlphaZero 36736, ApexDDPG 36596, ApexDQN 36591, ARS 36607, Bandits 36612, CRR 36616, DDPG, DDPPO 36620, Dreamer(V1), DT 36623, ES 36625, LeelaChessZero 36627, MA-DDPG 36628, MAML, MB-MPO 36662, PG 36666, QMix 36682, R2D2, SimpleQ 36688, SlateQ 36710, and TD3 36726) all produce warnings now if used. See [here](https://github.com/ray-project/ray/tree/master/rllib_contrib#rllib-contrib) for more information on the `rllib_contrib` efforts. (36620, 36628, 3
- Provide a msgpack checkpoint translation utility that converts checkpoints into msgpack format so they can be moved between Python versions (38825).
🔨 Fixes:
- Issue 35440 (JSON output writer should include INFOS 39632)
- Issue 39453 (PettingZoo wrappers should use correct multi-agent dict spaces 39459)
- Issue 39421 (Multi-discrete action spaces not supported in new stack 39534)
- Issue 39234 (Multi-categorical distribution bug 39464)
39654, 35975, 39552, 38555
Ray Core and Ray Clusters
Ray Core
🎉 New Features:
- Python 3.7 support is officially deprecated from Ray.
- Supports profiling GPU tasks or actors using Nvidia Nsight. See [the doc](https://docs.ray.io/en/master/ray-observability/user-guides/profiling.html?highlight=nsight#nsight-system-profiler) for instructions.
- Ray on Spark autoscaling is officially supported from Ray 2.8. See the [REP](https://github.com/ray-project/enhancements/pull/43) for more details.
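As a rough illustration of the Nsight profiling support listed above, a GPU task can opt in through its runtime environment. This sketch assumes Ray 2.8+, a node with a GPU, and Nsight Systems installed. The `{"nsight": "default"}` setting follows the linked profiling guide, while `run_profiled_task`, `gpu_task`, and the PyTorch workload are hypothetical.

```python
# Assumed per the profiling guide: "default" selects Ray's default Nsight options.
NSIGHT_RUNTIME_ENV = {"nsight": "default"}


def run_profiled_task():
    """Launch one GPU task under Nsight; Ray is imported lazily."""
    import ray  # assumption: Ray 2.8+ installed

    @ray.remote(num_gpus=1, runtime_env=NSIGHT_RUNTIME_ENV)
    def gpu_task():
        import torch  # hypothetical workload; any CUDA code would do

        return torch.ones(1024, device="cuda").sum().item()

    return ray.get(gpu_task.remote())
```

See the linked documentation for where the generated profiler reports are written and how to customize the Nsight options.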
💫Enhancements:
- Detailed idle node information is available from `ray status -v` (39638)
- Adding a new accelerator to Ray is simplified with a new accelerator interface. See the in-flight [REP](https://github.com/ray-project/enhancements/blob/9936d231b4403cdbceb754a674395ffcf9a586e5/reps/2023-10-13-accelerator-support.md) for more details (#40286).
- `typing_extensions` is removed as a dependency requirement because Python 3.7 support is deprecated. (40336)
- Ray state API supports case insensitive match. (34577)
- `ray start --runtime-env-agent-port` is officially supported. (39919)
- Driver exit code is available from job info (39675)
🔨 Fixes:
- Fixed a worker leak when Ray is used with placement groups, caused by Ray not handling SIGTERM properly (40182)
- Fixed an issue where the job page takes a very long time to load when a Ray HA cluster restarts a head node (40431)
- [core] loosen the check on release object (39570)
- [Core] ray init sigterm (39816)
- [Core] Non Unit Instance fractional value fix (39293)
- [Core]: Enable get_actor_name for actor runtime context (39347)
- [core][streaming][python] Fix asyncio.wait coroutines args deprecated warnings (40292)
📖Documentation:
- The Ray streaming generator doc (alpha) is officially available at https://docs.ray.io/en/master/ray-core/ray-generator.html
Ray Clusters
💫Enhancements:
- Enable GPU support for vSphere cluster launcher (40667)
📖Documentation:
- Setup RBAC by KubeRay Helm chart
- KubeRay upgrade documentation
- RayService high availability
🔨 Fixes:
- Assorted fixes for vSphere cluster launcher (40487, 40516, 40655)
Dashboard
🎉 New Features:
- New metrics for Ray Data can be found in the Metrics tab.
🔨 Fixes:
- Fix bug where download log button did not download all logs for actors.
Thanks
Many thanks to all who contributed to this release!
scottjlee, chappidim, alexeykudinkin, ArturNiederfahrenhorst, stephanie-wang, chaowanggg, peytondmurray, maxpumperla, arvind-chandra, iycheng, JalinWang, matthewdeng, wfangchi, z4y1b2, alanwguo, Zandew, kouroshHakha, justinvyu, yuanchen8911, vitsai, hongchaodeng, allenwang28, caozy623, ijrsvt, omus, larrylian, can-anyscale, joncarter1, ericl, lejara, jjyao, Ox0400, architkulkarni, edoakes, raulchen, bveeramani, sihanwang41, WeichenXu123, zcin, Codle, dimakis, simonsays1980, cadedaniel, angelinalg, luv003, JingChen23, xwjiang2010, rynewang, Yicheng-Lu-llll, scrivy, michaelhly, shrekris-anyscale, xxnwj, avnishn, woshiyyya, aslonnie, amogkam, krfricke, pcmoritz, liuyang-my, jonathan-anyscale, rickyyx, scottsun94, richardliaw, rkooo567, stefanbschneider, kevin85421, c21, sven1977, GeneDer, matthew29tang, RocketRider, LaynePeng, samhallam-reverb, scv119, huchen2021