Highlights
- Major update of RLlib docs and example scripts for the new API stack.
Ray Libraries
Ray Data
🎉 New Features:
- Expression support for filters (49016)
- Support `partition_cols` in `write_parquet` (49411)
- Implement multi-directional sort over Ray Data datasets (49281)
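The new `partition_cols` option in `write_parquet` writes hive-style partitioned output. A minimal sketch of that layout, using a hypothetical pure-Python helper (illustrative only, not Ray Data's implementation):

```python
from collections import defaultdict

def partition_rows(rows, partition_cols):
    """Group rows under hive-style `col=value` directory paths, dropping
    the partition columns from the stored rows (as parquet writers do)."""
    groups = defaultdict(list)
    for row in rows:
        key = "/".join(f"{c}={row[c]}" for c in partition_cols)
        groups[key].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(groups)

rows = [
    {"year": 2024, "country": "US", "value": 1},
    {"year": 2024, "country": "DE", "value": 2},
    {"year": 2025, "country": "US", "value": 3},
]
layout = partition_rows(rows, ["year", "country"])
# Keys look like "year=2024/country=US"; each maps to the remaining columns.
print(sorted(layout))
```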
💫 Enhancements:
- Use dask 2022.10.2 (48898)
- Clarify schema validation error (48882)
- Raise `ValueError` when the data sort key is `None` (48969)
- Provide clearer error messages for malformed webdataset data (48643)
- Upgrade Arrow version from 17 to 18 (48448)
- Update `hudi` version to 0.2.0 (48875)
- `webdataset`: expand JSON objects into individual samples (48673)
- Support passing kwargs to map tasks. (49208)
- Add `ExecutionCallback` interface (49205)
- Add seed for read files (49129)
- Make `select_columns` and `rename_columns` use Project operator (49393)
🔨 Fixes:
- Fix partial function name parsing in `map_groups` (48907)
- Always launch one task for `read_sql` (48923)
- Reimplement the pandas memory usage fix (48970)
- `webdataset`: flatten return args (48674)
- Handle `numpy > 2.0.0` behaviour in `_create_possibly_ragged_ndarray` (48064)
- Fix `DataContext` sealing for multiple datasets. (49096)
- Fix `to_tf` for `List` types (49139)
- Fix type mismatch error while mapping nullable column (49405)
- Datasink: support passing write results to `on_write_completes` (49251)
- Fix `groupby` hang when value contains `np.nan` (49420)
- Fix bug where `file_extensions` doesn't work with compound extensions (49244)
- Fix map operator fusion when concurrency is set (49573)
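The `file_extensions` fix above concerns compound extensions such as `.tar.gz`, where splitting on the final dot loses the full suffix. A small sketch of suffix-based matching (the helper is hypothetical, not Ray Data's code):

```python
import os

def matches_extension(path, extensions):
    """Match compound extensions (e.g. "tar.gz") by full suffix;
    os.path.splitext alone only sees the text after the final dot."""
    return any(path.endswith("." + ext.lstrip(".")) for ext in extensions)

print(os.path.splitext("data.tar.gz")[1])            # ".gz" -- loses "tar"
print(matches_extension("data.tar.gz", ["tar.gz"]))  # True
print(matches_extension("data.csv", ["tar.gz"]))     # False
```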
Ray Train
🎉 New Features:
- Output JSON structured log files for system and application logs (49414)
- Add support for AMD ROCR_VISIBLE_DEVICES (49346)
💫 Enhancements:
- Implement Train Tune API Revamp REP (49376, 49467, 49317, 49522)
🏗 Architecture refactoring:
- LightGBM: Rewrite `get_network_params` implementation (49019)
Ray Tune
🎉 New Features:
- Update `optuna_search` to allow users to configure optuna storage (48547)
🏗 Architecture refactoring:
- Make changes to support Train Tune API Revamp REP (49308, 49317, 49519)
Ray Serve
💫 Enhancements:
- Improved request_id generation to reduce proxy CPU overhead (49537)
- Tune GC threshold by default in proxy (49720)
- Use `pickle.dumps` for faster serialization from `proxy` to `replica` (49539)
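The serialization speedup above relies on `pickle`, which handles arbitrary Python objects in one fast C-implemented call. A minimal round-trip sketch (the message shape here is hypothetical, not Serve's wire format):

```python
import pickle

# Hypothetical proxy->replica request message.
request = {"route": "/predict", "args": (1, 2), "kwargs": {"batch": True}}

# Serialize with the highest protocol for speed and compactness.
payload = pickle.dumps(request, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)
print(restored == request)  # True
```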
🔨 Fixes:
- Handle nested `=` in serve run arguments (49719)
- Fix bug when `ray.init()` is called multiple times with different `runtime_envs` (49074)
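The nested `=` fix comes down to splitting each `key=value` argument on the first `=` only, so values that themselves contain `=` survive. A sketch with a hypothetical helper (not Serve's actual parser):

```python
def parse_kv_args(pairs):
    """Split each "key=value" on the FIRST '=' only, so values containing
    '=' (e.g. base64 padding, URLs with query strings) stay intact."""
    out = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        out[key] = value
    return out

print(parse_kv_args(["token=abc==", "url=http://h?a=1&b=2"]))
# {'token': 'abc==', 'url': 'http://h?a=1&b=2'}
```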
🗑️ Deprecations:
- Add a warning that the default behavior for sync methods will change in a future release: they will run in a threadpool by default. You can opt into this behavior early by setting `RAY_SERVE_RUN_SYNC_IN_THREADPOOL=1`. (48897)
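The motivation for running sync methods in a threadpool can be seen with plain `asyncio`: a blocking call executed directly on the event loop stalls every other request, while `run_in_executor` moves it to a worker thread. A generic stdlib sketch, not Ray Serve internals:

```python
import asyncio
import time

def blocking_handler():
    time.sleep(0.2)   # stands in for sync work inside a deployment method
    return "done"

async def main():
    loop = asyncio.get_running_loop()
    start = time.perf_counter()
    # Run both handlers in the default ThreadPoolExecutor concurrently.
    results = await asyncio.gather(
        loop.run_in_executor(None, blocking_handler),
        loop.run_in_executor(None, blocking_handler),
    )
    elapsed = time.perf_counter() - start
    # Both ran in threads at once: ~0.2s total, not ~0.4s sequential.
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, round(elapsed, 2))
```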
RLlib
🎉 New Features:
- Add support for external envs to the new API stack: new example script and custom TCP-capable EnvRunner. (49033)
💫 Enhancements:
- Offline RL:
- Add sequence sampling to `EpisodeReplayBuffer`. (48116)
- Allow incomplete `SampleBatch` data and fully compressed observations. (48699)
- Add option to customize `OfflineData`. (49015)
- Enable offline training without specifying an environment. (49041)
- Various fixes: 48309, 49194, 49195
- APPO/IMPALA acceleration (new API stack):
- Add support for `AggregatorActors` per Learner. (49284)
- Add auto-sleep time and thread-safety to MetricsLogger. (48868)
- Activate APPO continuous-actions release and CI tests (HalfCheetah-v1 and Pendulum-v1 new in `tuned_examples`). (49068)
- Add "burn-in" period setting to the training of stateful RLModules. (49680)
- Callbacks API: Add support for individual lambda-style callbacks. (49511)
- Other enhancements: 49687, 49714, 49693, 49497, 49800, 49098
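Sequence sampling, as added to `EpisodeReplayBuffer` above, means drawing contiguous timestep windows from stored episodes (needed for training stateful/RNN modules). A toy sketch of the idea (this class is hypothetical, not RLlib's implementation):

```python
import random

class TinySequenceBuffer:
    def __init__(self, seq_len):
        self.seq_len = seq_len
        self.episodes = []          # each episode: list of (obs, action, reward)

    def add_episode(self, episode):
        # Only keep episodes long enough to yield at least one full sequence.
        if len(episode) >= self.seq_len:
            self.episodes.append(episode)

    def sample(self):
        """Return one contiguous slice of length `seq_len` from a random episode."""
        ep = random.choice(self.episodes)
        start = random.randint(0, len(ep) - self.seq_len)
        return ep[start:start + self.seq_len]

buf = TinySequenceBuffer(seq_len=3)
buf.add_episode([(obs, 0, 1.0) for obs in range(10)])
seq = buf.sample()
# `seq` is 3 consecutive timesteps, e.g. [(4, 0, 1.0), (5, 0, 1.0), (6, 0, 1.0)]
```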
📖 Documentation:
- New example scripts:
- How to write a custom algorithm (VPG) from scratch. (49536)
- How to customize an offline data pipeline. (49046)
- GPUs on EnvRunners. (49166)
- Hierarchical training. (49127)
- Async gym vector env. (49527)
- Other fixes and enhancements: 48988, 49071
- New/rewritten html pages:
- Rewrite checkpointing page. (49504)
- New scaling guide. (49528)
- New callbacks page. (49513)
- Rewrite `RLModule` page. (49387)
- New AlgorithmConfig page and redo `package_ref` page for algo configs. (49464)
- Rewrite offline RL page. (48818)
- Rewrite "key concepts" rst page. (49398)
- Rewrite RL environments pages. (49165, 48542)
- Fixes and enhancements: 49465, 49037, 49304, 49428, 49474, 49399, 49713, 49518
🔨 Fixes:
- Add `on_episode_created` callback to SingleAgentEnvRunner. (49487)
- Fix `train_batch_size_per_learner` problems. (49715)
- Various other fixes: 48540, 49363, 49418, 49191
🏗 Architecture refactoring:
- RLModule: Introduce `Default[algo]RLModule` classes (49366, 49368)
- Remove RLlib dependencies from setup.py; add `ormsgpack` (49489)
🗑️ Deprecations:
- 49488, 49144
Ray Core and Ray Clusters
Ray Core
💫 Enhancements:
- Add `task_name`, `task_function_name` and `actor_name` in Structured Logging (48703)
- Support redis/valkey authentication with username (48225)
- Add v6e TPU Head Resource Autoscaling Support (48201)
- compiled graphs: Support all driver and actor read combinations (48963)
- compiled graphs: Add ascii based CG visualization (48315)
- compiled graphs: Add ray[cg] pip install option (49220)
- Allow uv cache at installation (49176)
- Support != Filter in GCS for Task State API (48983)
- compiled graphs: Add CPU-based NCCL communicator for development (48440)
- Support gcs and raylet log rotation (48952)
- compiled graphs: Support `nsight.nvtx` profiling (49392)
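The structured-logging item above attaches task and actor context to each record. The idea can be sketched with a stdlib `logging.Formatter` that emits JSON (field names here are illustrative, not Ray's exact schema):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line carrying task/actor context."""
    def format(self, record):
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "task_name": getattr(record, "task_name", None),
            "actor_name": getattr(record, "actor_name", None),
        }
        return json.dumps(payload)

# Build a record directly and attach context, as `extra={...}` would.
rec = logging.LogRecord("demo", logging.WARNING, __file__, 0,
                        "task started", None, None)
rec.task_name = "preprocess"
print(JsonFormatter().format(rec))
# {"message": "task started", "level": "WARNING", "task_name": "preprocess", "actor_name": null}
```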
🔨 Fixes:
- autoscaler: Fix health check logs not being visible in the autoscaler container's stdout (48905)
- Only publish `WORKER_OBJECT_EVICTION` when the object is out of scope or manually freed (47990)
- autoscaler: Fix autoscaler not scaling up correctly when the KubeRay RayCluster is not in the goal state (48909)
- autoscaler: Fix incorrectly terminating nodes misclassified as idle in autoscaler v1 (48519)
- compiled graphs: Fix the missing dependencies when num_returns is used (49118)
- autoscaler: Fuse scaling requests together to avoid overloading the Kubernetes API server (49150)
- Fix bug to support S3 pre-signed url for `.whl` file (48560)
- Fix data race on gRPC client context (49475)
- Make sure draining node is not selected for scheduling (49517)
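The last fix enforces a simple scheduling invariant: nodes that are draining must not be picked for new work. A toy filter over node candidates (field names are illustrative, not Ray's node state schema):

```python
def schedulable_nodes(nodes):
    """Keep only nodes that are alive and not being drained."""
    return [n for n in nodes if n["alive"] and not n["draining"]]

nodes = [
    {"id": "a", "alive": True,  "draining": False},
    {"id": "b", "alive": True,  "draining": True},   # being drained
    {"id": "c", "alive": False, "draining": False},  # dead
]
print([n["id"] for n in schedulable_nodes(nodes)])  # ['a']
```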
Ray Clusters
💫 Enhancements:
- Azure: Enable accelerated networking as a flag in Azure VMs (47988)
📖 Documentation:
- KubeRay: Logging: Add Fluent Bit `DaemonSet` and Grafana Loki to "Persist KubeRay Operator Logs" (48725)
- KubeRay: Logging: Specify the Helm chart version in "Persist KubeRay Operator Logs" (48937)
Dashboard
💫 Enhancements:
- Add instance variable to many default dashboard graphs (49174)
- Display duration in milliseconds if under 1 second. (49126)
- Add `RAY_PROMETHEUS_HEADERS` env for carrying additional headers to Prometheus (49353)
- Document about the `RAY_PROMETHEUS_HEADERS` env for carrying additional headers to Prometheus (49700)
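The duration-display change above picks units by magnitude. A hypothetical helper mirroring that behavior (not the dashboard's actual code):

```python
def format_duration(seconds):
    """Show milliseconds for sub-second durations, seconds otherwise."""
    if seconds < 1:
        return f"{seconds * 1000:.0f}ms"
    return f"{seconds:.2f}s"

print(format_duration(0.042))  # 42ms
print(format_duration(3.5))    # 3.50s
```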
🏗 Architecture refactoring:
- Move `memray` dependency from default to observability (47763)
- Move `StateHead`'s methods into free functions. (49388)
Thanks
raulchen, alanwguo, omatthew98, xingyu-long, tlinkin, yantzu, alexeykudinkin, andrewsykim, win5923, csy1204, dayshah, richardliaw, stephanie-wang, gueraf, rueian, davidxia, fscnick, wingkitlee0, KPostOffice, GeneDer, MengjinYan, simonsays1980, pcmoritz, petern48, kashiwachen, pfldy2850, zcin, scottjlee, Akhil-CM, Jay-ju, JoshKarpel, edoakes, ruisearch42, gorloffslava, jimmyxie-figma, bthananjeyan, sven1977, bnorick, jeffreyjeffreywang, ravi-dalal, matthewdeng, angelinalg, ivanthewebber, rkooo567, srinathk10, maresb, gvspraveen, akyang-anyscale, mimiliaogo, bveeramani, ryanaoleary, kevin85421, richardsliu, hartikainen, coltwood93, mattip, Superskyyy, justinvyu, hongpeng-guo, ArturNiederfahrenhorst, jecsand838, Bye-legumes, hcc429, WeichenXu123, martinbomio, HollowMan6, MortalHappiness, dentiny, zhe-thoughts, anyadontfly, smanolloff, richo-anyscale, khluu, xushiyan, rynewang, japneet-anyscale, jjyao, sumanthratna, saihaj, aslonnie
Many thanks to all those who contributed to this release!