Ray

Latest version: v2.22.0


2.22.0

Ray Libraries<a id="ray-libraries"></a>

Ray Data<a id="ray-data"></a>

πŸŽ‰ New Features:
- Add function to dynamically generate `ray_remote_args` for Map APIs (45143) (see the sketch after this list)
- Allow manually setting resource limits for training jobs (45188)
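
A minimal sketch of the dynamic `ray_remote_args` feature (45143). The callable argument is assumed here to be named `ray_remote_args_fn` on `map_batches`; check the Ray Data API reference for the exact name:

```python
import ray

ds = ray.data.range(1000)

# Hypothetical callback: called to produce fresh remote args for the map tasks,
# e.g. to tune resources or retries at runtime instead of hard-coding a dict.
def dynamic_remote_args():
    return {"num_cpus": 1, "max_retries": 3}

# `ray_remote_args_fn` is the assumed name of the new callable argument;
# previously only a static `ray_remote_args` dict could be passed to Map APIs.
ds = ds.map_batches(lambda batch: batch, ray_remote_args_fn=dynamic_remote_args)
print(ds.take(1))
```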

πŸ’« Enhancements:
- Introduce abstract interface for data autoscaling (45002)
- Add debugging info for `SplitCoordinator` (45226)

πŸ”¨ Fixes:
- Don’t show `AllToAllOperator` progress bar if the disable flag is set (45136)
- Don't load Arrow `PyExtensionType` by default (45084)
- Don't raise batch size error if `num_gpus=0` (45202)

Ray Train<a id="ray-train"></a>

πŸ’« Enhancements:
- [XGBoost][LightGBM] Update RayTrainReportCallback to only save checkpoints on rank 0 (45083)

Ray Core<a id="ray-core"></a>

πŸ”¨ Fixes:
- Fix the CPU percentage metrics for the dashboard process (45124)

Dashboard<a id="dashboard"></a>

πŸ’« Enhancements:
- Improvements to log viewer so line numbers do not get selected when copying text.
- Improvements to the log viewer to avoid unnecessary re-rendering which causes text selection to clear.


Many thanks to all those who contributed to this release: justinvyu, simonsays1980, chris-ray-zhang, kevin85421, angelinalg, rynewang, brycehuang30, alanwguo, jjyao, shaikhismail, khluu, can-anyscale, bveeramani, jrosti, WeichenXu123, MortalHappiness, raulchen, scottjlee, ruisearch42, aslonnie, alexeykudinkin

2.21.0

Ray Libraries<a id="ray-libraries"></a>

Ray Data<a id="ray-data"></a>

πŸŽ‰ New features:
- Add `read_lance` API to read Lance Dataset (45106)
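
A short sketch of the new `read_lance` API (45106), assuming the dataset URI is passed as the first argument; the bucket and path are placeholders:

```python
import ray

# Read a Lance dataset into a Ray Dataset.
# The URI is a placeholder; reading Lance requires the Lance Python package.
ds = ray.data.read_lance("s3://my-bucket/images.lance")

# The result behaves like any other Ray Dataset.
print(ds.schema())
```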

πŸ”¨ Fixes:
- Retry RaySystemError application errors (45079)

πŸ“– Documentation:
- Fix broken references in data documentation (44956)

Ray Train<a id="ray-train"></a>

πŸ“– Documentation:
- Fix broken links in Train documentation (44953)

Ray Tune<a id="ray-tune"></a>

πŸ“– Documentation:
- Update Hugging Face example to add reference (42771)

πŸ— Architecture refactoring:
- Remove deprecated `ray.air.callbacks` modules (45104)

Ray Serve<a id="ray-serve"></a>

πŸ’« Enhancements:
- Allow methods to pass the `serve.batch` type hint (45004)
- Allow configuring Serve control loop interval (45063)

πŸ”¨ Fixes:
- Fix bug with controller failing to recover for autoscaling deployments (45118)
- Fix Ctrl+C after `serve run` not shutting down Serve components (45087)
- Fix lightweight update max ongoing requests (45006)



RLlib<a id="rllib"></a>

πŸŽ‰ New Features:
- New MetricsLogger API now fully functional on the new API stack (working now also inside Learner classes, i.e. loss functions). ([44995](https://github.com/ray-project/ray/pull/44995), [#45109](https://github.com/ray-project/ray/pull/45109))
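
A hedged sketch of logging a custom stat with the `MetricsLogger` API from a custom callback on the new API stack; the hook signature and the `metrics_logger` keyword shown here are assumptions based on the PRs above, not a verified interface:

```python
from ray.rllib.algorithms.callbacks import DefaultCallbacks


class CustomMetricsCallback(DefaultCallbacks):
    # Assumption: new-API-stack callbacks receive a `metrics_logger` keyword
    # argument exposing the unified logging API.
    def on_episode_end(self, *, episode, metrics_logger=None, **kwargs):
        if metrics_logger is not None:
            # Log a per-episode stat; values are aggregated across episodes
            # and surfaced in the training results.
            metrics_logger.log_value("episode_len", len(episode), reduce="mean")
```

The callback class would then be registered on an `AlgorithmConfig` via `config.callbacks(CustomMetricsCallback)`.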

πŸ’« Enhancements:
- Renamings and cleanups (toward new API stack and more consistent naming schemata): WorkerSet -> EnvRunnerGroup, DEFAULT_POLICY_ID -> DEFAULT_MODULE_ID, config.rollouts() -> config.env_runners(), etc. ([45022](https://github.com/ray-project/ray/pull/45022), [#44920](https://github.com/ray-project/ray/pull/44920))
- Changed behavior of `EnvRunnerGroup.foreach_worker…` methods to new defaults: `mark_healthy=True` (used to be False) and `healthy_only=True` (used to be False). ([44993](https://github.com/ray-project/ray/pull/44993))
- Fix `get_state()/from_state()` methods in SingleAgent- and MultiAgentEpisodes. ([45012](https://github.com/ray-project/ray/pull/45012))

πŸ”¨ Fixes:
- Bug fix for (torch) global_norm clipping overflow problem: ([45055](https://github.com/ray-project/ray/pull/45055))
- Various bug- and test case fixes: [45030](https://github.com/ray-project/ray/pull/45030), [#45031](https://github.com/ray-project/ray/pull/45031), [#45070](https://github.com/ray-project/ray/pull/45070), [#45053](https://github.com/ray-project/ray/pull/45053), [#45110](https://github.com/ray-project/ray/pull/45110), [#45111](https://github.com/ray-project/ray/pull/45111)

πŸ“– Documentation:
- Example scripts using the MetricsLogger for env rendering and recording w/ WandB: [45073](https://github.com/ray-project/ray/pull/45073), [#45107](https://github.com/ray-project/ray/pull/45107)

Ray Core<a id="ray-core"></a>

πŸ”¨ Fixes:
- Fix `ray.init(logging_format)` argument being ignored (45037)
- Handle unserializable user exception (44878)
- Fix dashboard process event loop blocking issues (45048, 45047)

Dashboard<a id="dashboard"></a>

🔨 Fixes:
- Fix Nodes page sorting not working correctly.
- Add back β€œactors per page” UI control in the actors page.

Many thanks to all those who contributed to this release: rynewang, can-anyscale, scottsun94, bveeramani, ceddy4395, GeneDer, zcin, JoshKarpel, nikitavemuri, stephanie-wang, jackhumphries, matthewdeng, yash97, simonsays1980, peytondmurray, evalaiyc98, c21, alanwguo, shrekris-anyscale, kevin85421, hongchaodeng, sven1977, st--, khluu

2.20.0

Ray Libraries<a id="ray-libraries"></a>

Ray Data<a id="ray-data"></a>

πŸ’« Enhancements:
- Dedupe repeated schema during `ParquetDatasource` metadata prefetching (44750)
- Update `map_groups` implementation to better handle large outputs (44862)
- Deprecate `prefetch_batches` arg of `iter_rows` and change default value (44982)
- Default to not creating directories on S3 writes (44972)
- Make internal UDF names more descriptive (44985)
- Make `name` a required argument for `AggregateFn` (44880)

πŸ“– Documentation:
- Add key concepts to and revise "Data Internals" page (44751)


Ray Train<a id="ray-train"></a>

πŸ’« Enhancements:
- Setup XGBoost `CommunicatorContext` automatically (44883)
- Track Train Run Info with `TrainStateActor` (44585)

πŸ“– Documentation:
- Add documentation for `accelerator_type` (44882)
- Update Ray Train example titles (44369)

Ray Tune<a id="ray-tune"></a>

πŸ’« Enhancements:
- Remove trial table when running Ray Train in a Jupyter notebook (44858)
- Clean up temporary checkpoint directories for class Trainables (ex: RLlib) (44366)

πŸ“– Documentation:
- Fix minor doc format issues (44865)
- Remove outdated ScalingConfig references (44918)

Ray Serve<a id="ray-serve"></a>

πŸ’« Enhancements:
- The handle metric push interval is now configurable via the `RAY_SERVE_HANDLE_METRIC_PUSH_INTERVAL_S` environment variable (32920)
- Improve performance of the developer API `serve.get_app_handle` (44812)

πŸ”¨ Fixes:
- Fix memory leak in handles for autoscaling deployments (the leak happens when `RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=1`) (44877)


RLlib<a id="rllib"></a>

πŸŽ‰ New Features:
- Introduce `MetricsLogger`, a unified API for users of RLlib to log custom metrics and stats in all of RLlib’s components (Algorithm, EnvRunners, and Learners). Rolled out for new API stack for Algorithm (`training_step`) and EnvRunners (custom callbacks). `Learner` (custom loss functions) support in progress. [44888](https://github.com/ray-project/ray/pull/44888), [#44442](https://github.com/ray-project/ray/pull/44442)
- Introduce β€œinference-only” (slim) mode for RLModules that run inside an EnvRunner (and thus don’t require value-functions or target networks): [44797](https://github.com/ray-project/ray/pull/44797)

πŸ’« Enhancements:
- MultiAgentEpisodeReplayBuffer for new API stack (preparation for multi-agent support of SAC and DQN): [44450](https://github.com/ray-project/ray/pull/44450)
- AlgorithmConfig cleanup and renaming of properties and methods for better consistency/transparency: [44896](https://github.com/ray-project/ray/pull/44896)

πŸ”¨ Fixes:
- Various minor bug fixes: [44989](https://github.com/ray-project/ray/pull/44989), [#44988](https://github.com/ray-project/ray/pull/44988), [#44891](https://github.com/ray-project/ray/pull/44891), [#44898](https://github.com/ray-project/ray/pull/44898), [#44868](https://github.com/ray-project/ray/pull/44868), [#44867](https://github.com/ray-project/ray/pull/44867), [#44845](https://github.com/ray-project/ray/pull/44845)

Ray Core and Ray Clusters<a id="ray-core"></a>

πŸ’« Enhancements:
- Report GCS internal pubsub buffer metrics and cap message size (44749)

πŸ”¨ Fixes:
- Fix task submission never returning when a network partition happens (44692)
- Fix incorrect use of the SSH port forwarding option (44973)
- Make sure dashboard will exit if grpc server fails (44928)
- Make sure dashboard agent will exit if grpc server fails (44899)

Thanks can-anyscale, hongchaodeng, zcin, marwan116, khluu, bewestphal, scottjlee, andrewsykim, anyscalesam, MortalHappiness, justinvyu, JoshKarpel, woshiyyya, rynewang, Abirdcfly, omatthew98, sven1977, marcelocarmona, rueian, mattip, angelinalg, aslonnie, matthewdeng, abizjakpro, simonsays1980, jjyao, terraflops1048576, hongpeng-guo, stephanie-wang, bw-matthew, bveeramani, ruisearch42, kevin85421, Tongruizhe

Many thanks to all those who contributed to this release!

2.12.0

Ray Libraries<a id="ray-libraries"></a>

Ray Data<a id="ray-data"></a>

πŸŽ‰ New Features:

- Store Ray Data logs in special subdirectory (44743)

πŸ’« Enhancements:
- Add in `local_read` option to `from_torch` (44752)
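
A minimal sketch of the new `local_read` option (44752), assuming it is exposed as a keyword on `ray.data.from_torch`; the torchvision dataset is used only for illustration:

```python
import ray
import torchvision

# A small map-style torch dataset; FakeData avoids any downloads.
torch_ds = torchvision.datasets.FakeData(size=100)

# local_read=True (assumed per 44752) reads the dataset in a local task on the
# current node rather than scheduling the read elsewhere in the cluster.
ds = ray.data.from_torch(torch_ds, local_read=True)
print(ds.take(1))
```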

πŸ”¨ Fixes:
- Fix the config to disable progress bar (44342)

πŸ“– Documentation:
- Clarify deprecated Datasource docstrings (44790)

Ray Train<a id="ray-train"></a>

πŸ”¨ Fixes:
- Disable gathering the full state dict in `RayFSDPStrategy` for `lightning>2.1` (44569)

Ray Tune<a id="ray-tune"></a>

πŸ’« Enhancements:

- Remove spammy log for "new output engine" (44824)
- Enable isort (44693)

Ray Serve<a id="ray-serve"></a>

πŸ”¨ Fixes:
- [Serve] fix getting attributes on stdout during Serve logging redirect ([44787](https://github.com/ray-project/ray/pull/44787))

RLlib<a id="rllib"></a>

πŸŽ‰ New Features:

- Support for image and video logging in WandB (env rendering example script for the new API stack coming up). ([43356](https://github.com/ray-project/ray/pull/43356))

πŸ’« Enhancements:

- Better support and separation-of-concerns for `model_config_dict` in new API stack. ([44263](https://github.com/ray-project/ray/pull/44263))
- Added example script to pre-train an `RLModule` in single-agent fashion, then bring checkpoint into multi-agent setup and continue training. ([44674](https://github.com/ray-project/ray/pull/44674))
- More `examples` scripts were translated from the old API stack to the new one: curriculum learning, custom gym env, etc. ([44706](https://github.com/ray-project/ray/pull/44706), [#44707](https://github.com/ray-project/ray/pull/44707), [#44735](https://github.com/ray-project/ray/pull/44735), [#44841](https://github.com/ray-project/ray/pull/44841))

Ray Core and Ray Clusters<a id="ray-core"></a>

πŸ”¨ Fixes:
- Fix `GetAllJobInfo` `is_running_tasks` not returning the correct value when the driver starts Ray (44459)

Thanks

Many thanks to all those who contributed to this release!
can-anyscale, hongpeng-guo, sven1977, zcin, shrekris-anyscale, liuxsh9, jackhumphries, GeneDer, woshiyyya, simonsays1980, omatthew98, andrewsykim, n30111, architkulkarni, bveeramani, aslonnie, alexeykudinkin, WeichenXu123, rynewang, matthewdeng, angelinalg, c21

2.11.0

Release Highlights<a id="release-highlights"></a>

- [data] Support reading Avro files with `ray.data.read_avro`
- [train] Added experimental support for AWS Trainium (Neuron) and Intel HPU.

Ray Libraries<a id="ray-libraries"></a>

Ray Data<a id="ray-data"></a>

πŸŽ‰ New Features:

- Support reading Avro files with `ray.data.read_avro` (43663)
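
A short sketch of reading Avro files with the new API (43663); the path is a placeholder, and local paths work as well:

```python
import ray

# Read one or more Avro files into a Ray Dataset.
ds = ray.data.read_avro("s3://my-bucket/events/")
print(ds.schema())
print(ds.take(3))
```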

πŸ’« Enhancements:
- Pin `ipywidgets==7.7.2` to enable Data progress bars in VSCode Web (44398)
- Change log level for ignored exceptions (44408)

πŸ”¨ Fixes:
- Change Parquet encoding ratio lower bound from 2 to 1 (44470)
- Fix throughput time calculations for metrics (44138)
- Fix nested ragged `numpy.ndarray` (44236)
- Fix Ray debugger incompatibility caused by trimmed error stack trace (44496)

πŸ“– Documentation:
- Update "Data Loading and Preprocessing" doc (44165)
- Move imports into `TFPredictor` in batch inference example (44434)

Ray Train<a id="ray-train"></a>

πŸŽ‰ New Features:

- Add experimental support for AWS Trainium (Neuron) (39130)
- Add experimental support for Intel HPU (43343)

πŸ’« Enhancements:

- Log a deprecation warning for local_dir and related environment variables (44029)
- Enforce xgboost>=1.7 for XGBoostTrainer usage (44269)

πŸ”¨ Fixes:

- Fix `ScalingConfig(accelerator_type)` to request an appropriate resource amount (44225)
- Fix maximum recursion issue when serializing exceptions (43952)
- Remove base config deepcopy when initializing the trainer actor (44611)

πŸ— Architecture refactoring:

- Remove deprecated `BatchPredictor` (43934)

Ray Tune<a id="ray-tune"></a>

πŸ’« Enhancements:

- Add support for new style lightning import (44339)
- Log a deprecation warning for local_dir and related environment variables (44029)

πŸ— Architecture refactoring:

- Remove scikit-optimize search algorithm (43969)

Ray Serve<a id="ray-serve"></a>

πŸ”¨ Fixes:
- Dynamically-created applications will no longer be deleted when a config is PUT via the REST API ([44476](https://github.com/ray-project/ray/pull/44476)).
- Fix `_to_object_ref` memory leak ([43763](https://github.com/ray-project/ray/pull/43763))
- Log warning to reconfigure `max_ongoing_requests` if `max_batch_size` is less than `max_ongoing_requests` ([43840](https://github.com/ray-project/ray/pull/43840))
- Fixed deployments failing to start with `ModuleNotFoundError` in Ray 2.10 ([44329](https://github.com/ray-project/ray/issues/44329))
- This was fixed by reverting the original core changes on the `sys.path` behavior. Revert "[core] If there's working_dir, don't set _py_driver_sys_path." ([44435](https://github.com/ray-project/ray/pull/44435))
- The `batch_queue_cls` parameter is removed from the `serve.batch` decorator ([43935](https://github.com/ray-project/ray/pull/43935))

RLlib<a id="rllib"></a>

πŸŽ‰ New Features:

- New API stack: **DQN Rainbow** is now available for single-agent ([43196](https://github.com/ray-project/ray/pull/43196), [#43198](https://github.com/ray-project/ray/pull/43198), [#43199](https://github.com/ray-project/ray/pull/43199))
- **`PrioritizedEpisodeReplayBuffer`** is available for **off-policy learning using the EnvRunner API** (`SingleAgentEnvRunner`) and supports random n-step sampling ([42832](https://github.com/ray-project/ray/pull/42832), [#43258](https://github.com/ray-project/ray/pull/43258), [#43458](https://github.com/ray-project/ray/pull/43458), [#43496](https://github.com/ray-project/ray/pull/43496), [#44262](https://github.com/ray-project/ray/pull/44262))

πŸ’« Enhancements:

- **Restructured `examples/` folder**; started moving example scripts to the new API stack ([44559](https://github.com/ray-project/ray/pull/44559), [#44067](https://github.com/ray-project/ray/pull/44067), [#44603](https://github.com/ray-project/ray/pull/44603))
- **Evaluation do-over: Deprecate `enable_async_evaluation` option** (in favor of existing `evaluation_parallel_to_training` setting). ([43787](https://github.com/ray-project/ray/pull/43787))
- Add: **`module_for` API to MultiAgentEpisode** (analogous to `policy_for` API of the old Episode classes). ([44241](https://github.com/ray-project/ray/pull/44241))
- All **`rllib_contrib`** old stack algorithms have been removed from `rllib/algorithms` ([43656](https://github.com/ray-project/ray/pull/43656))

πŸ”¨ Fixes:

- New API stack: Multi-GPU + multi-agent has been fixed. This completes support for any combination of the following on the new API stack: [single-agent, multi-agent] vs [0 GPUs, 1 GPU, >1 GPUs] vs [any number of EnvRunners] ([44420](https://github.com/ray-project/ray/pull/44420), [#44664](https://github.com/ray-project/ray/pull/44664), [#44594](https://github.com/ray-project/ray/pull/44594), [#44677](https://github.com/ray-project/ray/pull/44677), [#44082](https://github.com/ray-project/ray/pull/44082), [#44669](https://github.com/ray-project/ray/pull/44669), [#44622](https://github.com/ray-project/ray/pull/44622))
- Various other bug fixes: [43906](https://github.com/ray-project/ray/pull/43906), [#43871](https://github.com/ray-project/ray/pull/43871), [#44000](https://github.com/ray-project/ray/pull/44000), [#44340](https://github.com/ray-project/ray/pull/44340), [#44491](https://github.com/ray-project/ray/pull/44491), [#43959](https://github.com/ray-project/ray/pull/43959), [#44043](https://github.com/ray-project/ray/pull/44043), [#44446](https://github.com/ray-project/ray/pull/44446), [#44040](https://github.com/ray-project/ray/pull/44040)

πŸ“– Documentation:
- [Re-announced new API stack in alpha stage](https://docs.ray.io/en/master/rllib/rllib-new-api-stack.html) ([#44090](https://github.com/ray-project/ray/pull/44090)).

Ray Core and Ray Clusters<a id="ray-core"></a>

πŸŽ‰ New Features:

- Added Ray check-open-ports CLI for checking potential open ports to the public (44488)

πŸ’« Enhancements:

- Support nodes sharing the same spilling directory without conflicts. (44487)
- Create two subclasses of `RayActorError` to distinguish between actor died (`ActorDiedError`) and actor temporarily unavailable ([`ActorUnavailableError`](https://docs.ray.io/en/master/ray-core/fault_tolerance/actors.html#unavailable-actors)) cases.
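
A sketch of handling the two new `RayActorError` subclasses separately (both exported from `ray.exceptions`); the actor and failure scenario are illustrative:

```python
import ray
from ray.exceptions import ActorDiedError, ActorUnavailableError


@ray.remote
class Worker:
    def ping(self):
        return "pong"


worker = Worker.remote()

try:
    print(ray.get(worker.ping.remote()))
except ActorUnavailableError:
    # The actor is temporarily unreachable (e.g. its node is recovering);
    # retrying the call later may succeed.
    print("actor unavailable, retry later")
except ActorDiedError:
    # The actor is permanently gone and must be recreated.
    print("actor died")
```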

πŸ”¨ Fixes:

- Fixed the `ModuleNotFoundError` issue introduced in 2.10 (44435)
- Fixed an issue where the agent process uses too much CPU (44348)
- Fixed race condition in multi-threaded actor creation (44232)
- Fixed several streaming generator bugs (44079, 44257, 44197)
- Fixed an issue where user exceptions raised from tasks cannot be subclassed (44379)

Dashboard <a id="dashboard"></a>

πŸ’« Enhancements:

- Add serve controller metrics to serve system dashboard page (43797)
- Add Serve Application rows to Serve top-level deployments details page (43506)
- [Actor table page enhancements] Include "NodeId", "CPU", "Memory", "GPU", "GRAM" columns in the actor table page. Add sort functionality to resource utilization columns. Enable searching table by "Class" and "Repr". (42588) (42633) (42788)

πŸ”¨ Fixes:

- Fix default sorting of nodes in Cluster table page to first be by "Alive" nodes, then head nodes, then alphabetical by node ID. (42929)
- Fix bug where the Serve Deployment detail page fails to load if the deployment is in "Starting" state (43279)

Docs <a id="docs"></a>

πŸ’« Enhancements:

- Landing page refreshes its look and feel. (44251)

Thanks

Many thanks to all those who contributed to this release!

aslonnie, brycehuang30, MortalHappiness, astron8t-voyagerx, edoakes, sven1977, anyscalesam, scottjlee, hongchaodeng, slfan1989, hebiao064, fishbone, zcin, GeneDer, shrekris-anyscale, kira-lin, chappidim, raulchen, c21, WeichenXu123, marian-code, bveeramani, can-anyscale, mjd3, justinvyu, jackhumphries, Bye-legumes, ashione, alanwguo, Dreamsorcerer, KamenShah, jjyao, omatthew98, autolisis, Superskyyy, stephanie-wang, simonsays1980, davidxia, angelinalg, architkulkarni, chris-ray-zhang, kevin85421, rynewang, peytondmurray, zhangyilun, khluu, matthewdeng, ruisearch42, pcmoritz, mattip, jerome-habana, alexeykudinkin

2.10.0

Release Highlights<a id="release-highlights"></a>
The Ray 2.10 release brings important stability improvements and enhancements to Ray Data, which is now generally available (GA).

- [Data] Ray Data becomes generally available, with stability improvements in streaming execution, reading and writing data, better task concurrency control, and improved debuggability via dashboard, logging, and metrics visualization.
- [RLlib] β€œ**New API Stack**” officially announced as alpha for PPO and SAC.
- [Serve] Added a default autoscaling policy set via `num_replicas="auto"` ([42613](https://github.com/ray-project/ray/issues/42613)).
- [Serve] Added support for active load shedding via `max_queued_requests` ([42950](https://github.com/ray-project/ray/issues/42950)).
- [Serve] Added replica queue length caching to the DeploymentHandle scheduler ([42943](https://github.com/ray-project/ray/pull/42943)).
- This should reduce overhead in the Serve proxy and handles.
- `max_ongoing_requests (max_concurrent_queries)` is also now strictly enforced ([42947](https://github.com/ray-project/ray/issues/42947)).
- If you see any issues, please report them on GitHub and you can disable this behavior by setting: `RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE=0`.
- [Serve] Renamed the following parameters. Each of the old names will be supported for another release before removal.
- `max_concurrent_queries` -> `max_ongoing_requests`
- `target_num_ongoing_requests_per_replica` -> `target_ongoing_requests`
- `downscale_smoothing_factor` -> `downscaling_factor`
- `upscale_smoothing_factor` -> `upscaling_factor`
- [Serve] **WARNING**: the following default values will change in Ray 2.11:
- Default for `max_ongoing_requests` will change from 100 to 5.
- Default for `target_ongoing_requests` will change from 1 to 2.
- [Core] [Autoscaler v2](https://docs.ray.io/en/master/cluster/kubernetes/user-guides/configuring-autoscaling.html#kuberay-autoscaler-v2) is in alpha and can be tried out with Kuberay. It has improved observability and stability compared to v1.
- [Train] Added support for accelerator types via `ScalingConfig(accelerator_type)`.
- [Train] Revamped the `XGBoostTrainer` and `LightGBMTrainer` to no longer depend on `xgboost_ray` and `lightgbm_ray`. A new, more flexible API will be released in a future release.
- [Train/Tune] Refactored local staging directory to remove the need for `local_dir` and `RAY_AIR_LOCAL_CACHE_DIR`.

Ray Libraries<a id="ray-libraries"></a>

Ray Data<a id="ray-data"></a>

πŸŽ‰ New Features:
- Streaming execution stability improvements to avoid memory issues, including per-operator resource reservation, streaming generator output buffer management, and better runtime resource estimation (43026, 43171, 43298, 43299, 42930, 42504)
- Metadata read stability improvements to avoid transient AWS errors, including retry on application-level exceptions, spreading tasks across multiple nodes, and configurable retry intervals (42044, 43216, 42922, 42759).
- Allow task concurrency control for read, map, and write APIs (42849, 43113, 43177, 42637)
- Data dashboard and statistics improvements with more runtime metrics for each component (43790, 43628, 43241, 43477, 43110, 43112)
- Allow specifying application-level errors to retry for actor tasks (42492)
- Add `num_rows_per_file` parameter to file-based writes (42694) (see the sketch after this list)
- Add `DataIterator.materialize` (43210)
- Skip schema call in `DataIterator.to_tf` if `tf.TypeSpec` is provided (42917)
- Add option to append for `Dataset.write_bigquery` (42584)
- Deprecate legacy components and classes (43575, 43178, 43347, 43349, 43342, 43341, 42936, 43144, 43022, 43023)
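
A minimal sketch of the `num_rows_per_file` parameter (42694, referenced a few bullets above), assuming it is accepted by `write_parquet`; the output path is a placeholder:

```python
import ray

ds = ray.data.range(10_000)

# Target roughly 1,000 rows per output Parquet file; exact file sizes can
# still vary with block boundaries.
ds.write_parquet("/tmp/ray_range_output", num_rows_per_file=1000)
```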

πŸ’« Enhancements:

- Restructure stdout logging for better readability (43360)
- Add a more performant way to read large TFRecord datasets (42277)
- Modify `ImageDatasource` to use `Image.BILINEAR` as the default image resampling filter (43484)
- Reduce internal stack trace output by default (43251)
- Perform incremental writes to Parquet files (43563)
- Warn on excessive driver memory usage during shuffle ops (42574)
- Distributed reads for `ray.data.from_huggingface` (42599)
- Remove `Stage` class and related usages (42685)
- Improve stability of reading JSON files to avoid PyArrow errors (42558, 42357)

πŸ”¨ Fixes:

- Turn off actor locality by default (44124)
- Normalize block types before internal multi-block operations (43764)
- Fix memory metrics for `OutputSplitter` (43740)
- Fix race condition issue in `OpBufferQueue` (43015)
- Fix early stop for multiple `Limit` operators. (42958)
- Fix deadlocks caused by `Dataset.streaming_split` that made jobs hang (42601)

πŸ“– Documentation:

- Revamp Ray Data documentation for GA (44006, 44007, 44008, 44098, 44168, 44093, 44105)

Ray Train<a id="ray-train"></a>

πŸŽ‰ New Features:

- Add support for accelerator types via `ScalingConfig(accelerator_type)` for improved worker scheduling (43090)
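
A short sketch of requesting an accelerator type via `ScalingConfig` (43090); the trainer choice, training function, and the `"A10G"` value are illustrative:

```python
import ray.train
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer


def train_func():
    # Placeholder training loop; each worker is scheduled onto a node that
    # provides the requested accelerator type.
    ray.train.report({"loss": 0.0})


trainer = TorchTrainer(
    train_func,
    # accelerator_type adds a per-worker resource requirement for the given
    # accelerator label in addition to the GPU request.
    scaling_config=ScalingConfig(num_workers=2, use_gpu=True, accelerator_type="A10G"),
)
result = trainer.fit()
```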

πŸ’« Enhancements:

- Add a backend-specific context manager for `train_func` for setup/teardown logic (43209)
- Remove `DEFAULT_NCCL_SOCKET_IFNAME` to simplify network configuration (42808)
- Colocate the Trainer with the rank 0 worker to improve scheduling behavior (43115)

πŸ”¨ Fixes:

- Enable scheduling workers with `memory` resource requirements (42999)
- Make path behavior OS-agnostic by using `Path.as_posix` over `os.path.join` (42037)
- [Lightning] Fix resuming from checkpoint when using `RayFSDPStrategy` (43594)
- [Lightning] Fix deadlock in `RayTrainReportCallback` (42751)
- [Transformers] Fix checkpoint reporting behavior when `get_latest_checkpoint` returns None (42953)

πŸ“– Documentation:

- Enhance docstring and user guides for `train_loop_config` (43691)
- Clarify in `ray.train.report` docstring that it is not a barrier (42422)
- Improve documentation for `prepare_data_loader` shuffle behavior and `set_epoch` (41807)

πŸ— Architecture refactoring:

- Simplify XGBoost and LightGBM Trainer integrations. Implemented `XGBoostTrainer` and `LightGBMTrainer` as `DataParallelTrainer`. Removed dependency on `xgboost_ray` and `lightgbm_ray`. (42111, 42767, 43244, 43424)
- Refactor local staging directory to remove the need for `local_dir` and `RAY_AIR_LOCAL_CACHE_DIR`. Add isolation between driver and distributed worker artifacts so that large files written by workers are not uploaded implicitly. Results are now only written to `storage_path`, rather than having another copy in the user’s home directory (`~/ray_results`). (43369, 43403, 43689)
- Split overloaded `ray.train.torch.get_device` into another `get_devices` API for multi-GPU worker setup (42314)
- Refactor restoration configuration to be centered around `storage_path` (42853, 43179)
- Deprecations related to `SyncConfig` (42909)
- Remove deprecated `preprocessor` argument from Trainers (43146, 43234)
- Hard-deprecate `MosaicTrainer` and remove `SklearnTrainer` (42814)


Ray Tune<a id="ray-tune"></a>

πŸ’« Enhancements:

- Increase the minimum number of allowed pending trials for faster auto-scaleup (43455)
- Add support to `TBXLogger` for logging images (37822)
- Improve validation of `Experiment(config)` to handle RLlib `AlgorithmConfig` (42816, 42116)

πŸ”¨ Fixes:

- Fix `reuse_actors` error on actor cleanup for function trainables (42951)
- Make path behavior OS-agnostic by using `Path.as_posix` over `os.path.join` (42037)

πŸ“– Documentation:

- Minor documentation fixes (42118, 41982)

πŸ— Architecture refactoring:

- Refactor local staging directory to remove the need for `local_dir` and `RAY_AIR_LOCAL_CACHE_DIR`. Add isolation between driver and distributed worker artifacts so that large files written by workers are not uploaded implicitly. Results are now only written to `storage_path`, rather than having another copy in the user’s home directory (`~/ray_results`). (43369, 43403, 43689)
- Deprecations related to `SyncConfig` and `chdir_to_trial_dir` (42909)
- Refactor restoration configuration to be centered around `storage_path` (42853, 43179)
- Add back `NevergradSearch` (42305)
- Clean up invalid `checkpoint_dir` and `reporter` deprecation notices (42698)

Ray Serve<a id="ray-serve"></a>

πŸŽ‰ New Features:

- Added support for active load shedding via `max_queued_requests` ([42950](https://github.com/ray-project/ray/issues/42950)).
- Added a default autoscaling policy set via `num_replicas="auto"` ([42613](https://github.com/ray-project/ray/issues/42613)).
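
A hedged sketch combining both features above: the default autoscaling policy via `num_replicas="auto"` and active load shedding via `max_queued_requests`; the deployment body is illustrative:

```python
from ray import serve
from starlette.requests import Request


@serve.deployment(
    num_replicas="auto",       # opt into the default autoscaling policy (42613)
    max_queued_requests=100,   # shed load once 100 requests are queued (42950)
)
class Echo:
    async def __call__(self, request: Request) -> bytes:
        return await request.body()


app = Echo.bind()
# serve.run(app)  # deploy on a running Ray cluster
```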

πŸ— API Changes:

- Renamed the following parameters. Each of the old names will be supported for another release before removal.
- `max_concurrent_queries` to `max_ongoing_requests`
- `target_num_ongoing_requests_per_replica` to `target_ongoing_requests`
- `downscale_smoothing_factor` to `downscaling_factor`
- `upscale_smoothing_factor` to `upscaling_factor`
- **WARNING**: the following default values will change in Ray 2.11:
- Default for `max_ongoing_requests` will change from 100 to 5.
- Default for `target_ongoing_requests` will change from 1 to 2.

πŸ’« Enhancements:

- Add `RAY_SERVE_LOG_ENCODING` env to set the global logging behavior for Serve ([42781](https://github.com/ray-project/ray/pull/42781)).
- Config Serve's gRPC proxy to allow large payload ([43114](https://github.com/ray-project/ray/pull/43114)).
- Add blocking flag to serve.run() ([43227](https://github.com/ray-project/ray/pull/43227)).
- Add actor id and worker id to Serve structured logs ([43725](https://github.com/ray-project/ray/pull/43725)).
- Added replica queue length caching to the DeploymentHandle scheduler ([42943](https://github.com/ray-project/ray/pull/42943)).
- This should reduce overhead in the Serve proxy and handles.
- `max_ongoing_requests` (`max_concurrent_queries`) is also now strictly enforced ([42947](https://github.com/ray-project/ray/issues/42947)).
- If you see any issues, please report them on GitHub and you can disable this behavior by setting: `RAY_SERVE_ENABLE_QUEUE_LENGTH_CACHE=0`.
- Autoscaling metrics (tracking ongoing and queued metrics) are now collected at deployment handles by default instead of at the Serve replicas ([42578](https://github.com/ray-project/ray/pull/42578)).
- This means you can now set `max_ongoing_requests=1` for autoscaling deployments and still upscale properly, because requests queued at handles are properly taken into account for autoscaling.
- You should expect deployments to upscale more aggressively during bursty traffic, because requests will likely queue up at handles during bursts of traffic.
- If you see any issues, please report them on GitHub and you can switch back to the old method of collecting metrics by setting the environment variable `RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE=0`
- Improved the downscaling behavior of `smoothing_factor` for low numbers of replicas ([42612](https://github.com/ray-project/ray/issues/42612)).
- Various logging improvements ([43707](https://github.com/ray-project/ray/pull/43707), [#43708](https://github.com/ray-project/ray/pull/43708), [#43629](https://github.com/ray-project/ray/pull/43629), [#43557](https://github.com/ray-project/ray/pull/43557)).
- During in-place upgrades or when replicas become unhealthy, Serve will no longer wait for old replicas to gracefully terminate before starting new ones ([43187](https://github.com/ray-project/ray/pull/43187)). New replicas will be eagerly started to satisfy the target number of healthy replicas.
- This new behavior is on by default and can be turned off by setting `RAY_SERVE_EAGERLY_START_REPLACEMENT_REPLICAS=0`

πŸ”¨ Fixes:

- Fix the deployment route prefix being overridden by the default route prefix from the `serve run` CLI ([43805](https://github.com/ray-project/ray/pull/43805)).
- Fixed a bug causing batch methods to hang upon cancellation ([42593](https://github.com/ray-project/ray/issues/42593)).
- Unpinned FastAPI dependency version ([42711](https://github.com/ray-project/ray/issues/42711)).
- Delay proxy marking itself as healthy until it has routes from the controller ([43076](https://github.com/ray-project/ray/issues/43076)).
- Fixed an issue where multiplexed deployments could go into infinite backoff ([43965](https://github.com/ray-project/ray/issues/43965)).
- Silence noisy `KeyError` on disconnects ([43713](https://github.com/ray-project/ray/pull/43713)).
- Fixed the bug where Prometheus counter metrics were emitted as gauges ([43795](https://github.com/ray-project/ray/pull/43795), [#43901](https://github.com/ray-project/ray/pull/43901)).
- All the Serve counter metrics are now emitted as counters with a `_total` suffix. The old gauge metrics are still emitted for compatibility.

πŸ“– Documentation:

- Update serve logging config docs ([43483](https://github.com/ray-project/ray/pull/43483)).
- Added documentation for `max_replicas_per_node` ([42743](https://github.com/ray-project/ray/pull/42743)).

RLlib<a id="rllib"></a>

πŸŽ‰ New Features:

- The **β€œnew API stack”** is now in alpha stage and available for **PPO single-** (42272) and **multi-agent** and for **SAC single-agent** ([42571](https://github.com/ray-project/ray/pull/42571), [#42570](https://github.com/ray-project/ray/pull/42570), [#42568](https://github.com/ray-project/ray/pull/42568))
- **ConnectorV2 API** ([43669](https://github.com/ray-project/ray/pull/43669), [#43680](https://github.com/ray-project/ray/pull/43680), [#43040](https://github.com/ray-project/ray/pull/43040), [#41074](https://github.com/ray-project/ray/pull/41074), [#41212](https://github.com/ray-project/ray/pull/41212))
- **Episode APIs** (SingleAgentEpisode and MultiAgentEpisode) ([42009](https://github.com/ray-project/ray/pull/42009), [#43275](https://github.com/ray-project/ray/pull/43275), [#42296](https://github.com/ray-project/ray/pull/42296), [#43818](https://github.com/ray-project/ray/pull/43818), [#41631](https://github.com/ray-project/ray/pull/41631))
- **EnvRunner APIs** (SingleAgentEnvRunner and MultiAgentEnvRunner) ([41558](https://github.com/ray-project/ray/pull/41558), [#41825](https://github.com/ray-project/ray/pull/41825), [#42296](https://github.com/ray-project/ray/pull/42296), [#43779](https://github.com/ray-project/ray/pull/43779))
- In preparation of **DQN** on the new API stack: PrioritizedEpisodeReplayBuffer ([43258](https://github.com/ray-project/ray/pull/43258), [#42832](https://github.com/ray-project/ray/pull/42832))

πŸ’« Enhancements:

- **Old API Stack cleanups:**
- Move `SampleBatch` column names (e.g. `SampleBatch.OBS`) into new class (`Columns`). ([43665](https://github.com/ray-project/ray/pull/43665))
- Remove old exec_plan API code. ([41585](https://github.com/ray-project/ray/pull/41585))
- Introduce `OldAPIStack` decorator ([43657](https://github.com/ray-project/ray/pull/43657))
- **RLModule API:** Add functionality to define kernel and bias initializers via config. ([42137](https://github.com/ray-project/ray/pull/42137))
- **Learner/LearnerGroup APIs**:
- Replace Learner/LearnerGroup specific config classes (e.g. `LearnerHyperparameters`) with `AlgorithmConfig`. ([41296](https://github.com/ray-project/ray/pull/41296))
- Learner/LearnerGroup: Allow updating from Episodes. ([41235](https://github.com/ray-project/ray/pull/41235))
- In preparation of **DQN** on the new API stack: ([43199](https://github.com/ray-project/ray/pull/43199), [#43196](https://github.com/ray-project/ray/pull/43196))

πŸ”¨ Fixes:

- New API Stack bug fixes: Fix `policy_to_train` logic ([41529](https://github.com/ray-project/ray/pull/41529)), fix multi-GPU for PPO on the new API stack ([#44001](https://github.com/ray-project/ray/pull/44001)), Issue 40347: ([#42090](https://github.com/ray-project/ray/pull/42090))
- Other fixes: MultiAgentEnv would NOT call env.close() on a failed sub-env ([43664](https://github.com/ray-project/ray/pull/43664)), Issue 42152 ([#43317](https://github.com/ray-project/ray/pull/43317)), issue 42396: ([#43316](https://github.com/ray-project/ray/pull/43316)), issue 41518 ([#42011](https://github.com/ray-project/ray/pull/42011)), issue 42385 ([#43313](https://github.com/ray-project/ray/pull/43313))

πŸ“– Documentation:

- New API Stack examples: Self-play and league-based self-play ([43276](https://github.com/ray-project/ray/pull/43276)), MeanStdFilter (for both single-agent and multi-agent) ([#43274](https://github.com/ray-project/ray/pull/43274)), Prev-actions/prev-rewards for multi-agent ([#43491](https://github.com/ray-project/ray/pull/43491))
- Other docs fixes and enhancements: ([43438](https://github.com/ray-project/ray/pull/43438), [#41472](https://github.com/ray-project/ray/pull/41472), [#42117](https://github.com/ray-project/ray/pull/42177), [#43458](https://github.com/ray-project/ray/pull/43458))

Ray Core and Ray Clusters<a id="ray-core-and-ray-clusters"></a>

Ray Core<a id="ray-core"></a>

πŸŽ‰ New Features:

- [Autoscaler v2](https://docs.ray.io/en/master/cluster/kubernetes/user-guides/configuring-autoscaling.html#kuberay-autoscaler-v2) is in alpha and can be tried out with Kuberay.
- Introduced [subreaper](https://docs.ray.io/en/master/ray-core/user-spawn-processes.html) to prevent leaks of sub-processes created by user code. (#42992)

πŸ’« Enhancements:

- The Ray State API `get_task()` now accepts an ObjectRef (43507)
- Add an option to disable task tracing for task/actor (42431)
- Improved object transfer throughput. (43434)
- Ray client now compares the Ray and Python version for compatibility with the remote Ray cluster. (42760)

πŸ”¨ Fixes:

- Fixed several bugs for streaming generator (43775, 43772, 43413)
- Fixed Ray counter metrics being emitted as gauges (43795)
- Fixed a bug where an empty-resource task doesn't work with placement groups (43448)
- Fixed a bug where CPU resources are not released for a blocked worker inside a placement group (43270)
- Fixed GCS crashes when the placement group commit phase failed due to node failure (43405)
- Fixed a bug where the Ray memory monitor prematurely kills tasks (43071)
- Fixed placement group resource leak (42942)
- Upgraded cloudpickle to 3.0 which fixes the incompatibility with dataclasses (42730)

πŸ“– Documentation:

- Updated the doc for Ray accelerators support (41849)

Ray Clusters<a id="ray-clusters"></a>

πŸ’« Enhancements:

- [spark] Add `heap_memory` param to the `setup_ray_cluster` API, and change the default per-worker-node and head-node config values for the global Ray cluster (42604)
- [spark] Add global mode for ray on spark cluster (41153)

πŸ”¨ Fixes:

- [vSphere] Only deploy the OVF to the first host of the cluster (42258)

Thanks

Many thanks to all those who contributed to this release!

ronyw7, xsqian, justinvyu, matthewdeng, sven1977, thomasdesr, veryhannibal, klebster2, can-anyscale, simran-2797, stephanie-wang, simonsays1980, kouroshHakha, Zandew, akshay-anyscale, matschaffer-roblox, WeichenXu123, matthew29tang, vitsai, Hank0626, anmyachev, kira-lin, ericl, zcin, sihanwang41, peytondmurray, raulchen, aslonnie, ruisearch42, vszal, pcmoritz, rickyyx, chrislevn, brycehuang30, alexeykudinkin, vonsago, shrekris-anyscale, andrewsykim, c21, mattip, hongchaodeng, dabauxi, fishbone, scottjlee, justina777, surenyufuz, robertnishihara, nikitavemuri, Yard1, huchen2021, shomilj, architkulkarni, liuxsh9, Jocn2020, liuyang-my, rkooo567, alanwguo, KPostOffice, woshiyyya, n30111, edoakes, y-abe, martinbomio, jiwq, arunppsg, ArturNiederfahrenhorst, kevin85421, khluu, JingChen23, masariello, angelinalg, jjyao, omatthew98, jonathan-anyscale, sjoshi6, gaborgsomogyi, rynewang, ratnopamc, chris-ray-zhang, ijrsvt, scottsun94, raychen911, franklsf95, GeneDer, madhuri-rai07, scv119, bveeramani, anyscalesam, zen-xu, npuichigo
