Ray

Latest version: v2.42.1


2.3.1

Not secure
The Ray 2.3.1 patch release contains fixes for multiple components:

Ray Data Processing
* Support different number of blocks/rows per block in `zip()` (https://github.com/ray-project/ray/pull/32795)
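
As a rough illustration of what zipping datasets with different block layouts involves, here is a minimal pure-Python sketch. It is not Ray's implementation: `zip_blocks` and the list-of-lists block representation are hypothetical stand-ins. The idea is to re-cut the second dataset's rows along the first dataset's block boundaries before pairing rows.

```python
from itertools import chain, islice


def zip_blocks(left_blocks, right_blocks):
    """Zip two datasets stored as lists of row blocks (hypothetical helper).

    The two sides may differ in block count and block size, but their
    total row counts must match. The output keeps the left block layout.
    """
    left_rows = sum(len(block) for block in left_blocks)
    right_rows = sum(len(block) for block in right_blocks)
    if left_rows != right_rows:
        raise ValueError(f"row count mismatch: {left_rows} vs {right_rows}")

    # Flatten the right side lazily, then slice it to the left boundaries.
    right_iter = chain.from_iterable(right_blocks)
    zipped = []
    for block in left_blocks:
        aligned = list(islice(right_iter, len(block)))
        zipped.append(list(zip(block, aligned)))
    return zipped


if __name__ == "__main__":
    # Left has blocks of 2 and 1 rows; right has blocks of 1 and 2 rows.
    print(zip_blocks([[1, 2], [3]], [["a"], ["b", "c"]]))
```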

Ray Serve
* Revert `serve run` to use Ray Client instead of Ray Jobs (https://github.com/ray-project/ray/pull/32976)
* Fix issue with `max_concurrent_queries` being ignored when autoscaling (https://github.com/ray-project/ray/pull/32772 and https://github.com/ray-project/ray/pull/33022)
Ray Core
* Write Ray address even if Ray node is started with `--block` (https://github.com/ray-project/ray/pull/32961)
* Fix Ray on Spark running on layered virtualenv python environment (https://github.com/ray-project/ray/pull/32996)
Dashboard
* Fix disk metric showing double the actual value (https://github.com/ray-project/ray/pull/32674)

2.3.0

Not secure
Release Highlights



* The streaming backend for Ray Datasets is in Developer Preview. It is designed to enable terabyte-scale ML inference and training workloads. Please contact us if you'd like to try it out on your workload, or you can find the preview guide here: https://docs.google.com/document/d/1BXd1cGexDnqHAIVoxTnV3BV0sklO9UXqPwSdHukExhY/edit
* New Information Architecture (**Beta**): We’ve restructured the [Ray dashboard](https://docs.ray.io/en/master/ray-core/ray-dashboard.html) to be organized around user personas and workflows instead of entities.
* Ray-on-Spark is now available (Preview): You can launch Ray clusters on Databricks and Spark clusters and run Ray applications. Check out the [documentation](https://docs.ray.io/en/releases-2.3.0/cluster/vms/user-guides/community/spark.html) to learn more.

Ray Libraries


Ray AIR

💫Enhancements:



* Add `set_preprocessor` method to `Checkpoint` (31721)
* Rename Keras callback and its parameters to be more descriptive (31627)
* Deprecate MlflowTrainableMixin in favor of setup_mlflow() function (31295)
* W&B
    * Have train_loop_config logged as a config (31901)
    * Allow users to exclude config values with WandbLoggerCallback (31624)
    * Rename WandB `save_checkpoints` to `upload_checkpoints` (31582)
    * Add hook to get project/group for W&B integration (31035, 31643)
    * Use Ray actors instead of multiprocessing for WandbLoggerCallback (30847)
    * Update `WandbLoggerCallback` example (31625)
* Predictor
    * Place predictor kwargs in object store (30932)
    * Delegate BatchPredictor stage fusion to Datasets (31585)
    * Rename `DLPredictor.call_model` `tensor` parameter to `inputs` (30574)
    * Add `use_gpu` to `HuggingFacePredictor` (30945)
* Checkpoints
    * Various `Checkpoint` improvements (30948)
    * Implement lazy checkpointing for same-node case (29824)
    * Automatically strip "module." from state dict (30705)
    * Allow user to pass model to `TensorflowCheckpoint.get_model` (31203)
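
For context on the "module." item: PyTorch's `DataParallel` and `DistributedDataParallel` wrappers store the wrapped model under a `module` attribute, so saved state dict keys gain a `module.` prefix. The sketch below shows the general idea of stripping that prefix on plain dictionaries. `strip_module_prefix` is a hypothetical helper, not the function Ray AIR uses.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from keys.

    Keys that do not start with the prefix are left untouched.
    """
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


if __name__ == "__main__":
    wrapped = {"module.fc.weight": [0.1, 0.2], "module.fc.bias": [0.0]}
    print(strip_module_prefix(wrapped))
```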

🔨 Fixes:



* Fix and improve support for HDFS remote storage. (31940)
* Use specified Preprocessor configs when using stream API. (31725)
* Support nested Chain in BatchPredictor (31407)

📖Documentation:



* Restructure API References (32535)
* API Deprecations (31777, 31867)
* Various fixes to docstrings, documentation, and examples (30782, 30791)

🏗 Architecture refactoring:



* Use NodeAffinitySchedulingPolicy for scheduling (32016)
* Internal resource management refactor (30777, 30016)


Ray Data Processing

🎉 New Features:



* Lazy execution by default (31286)
* Introduce streaming execution backend (31579)
* Introduce DatasetIterator (31470)
* Add per-epoch preprocessor (31739)
* Add TorchVisionPreprocessor (30578)
* Persist Dataset statistics automatically to log file (30557)
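
To make "lazy execution" concrete, here is a toy sketch in plain Python. It does not use Ray, and `LazyDataset` is a hypothetical class: transformations are only recorded when declared, and nothing runs until a consuming call asks for results.

```python
class LazyDataset:
    """Toy dataset that records transformations and runs them on demand."""

    def __init__(self, source, stages=()):
        self._source = source
        self._stages = tuple(stages)

    def map(self, fn):
        # Only extend the plan; no rows are processed here.
        return LazyDataset(self._source, self._stages + (fn,))

    def take_all(self):
        # Consuming the dataset triggers execution of every recorded stage.
        results = []
        for row in self._source:
            for fn in self._stages:
                row = fn(row)
            results.append(row)
        return results


if __name__ == "__main__":
    ds = LazyDataset(range(3)).map(lambda x: x * 2).map(lambda x: x + 1)
    print(ds.take_all())
```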

💫Enhancements:



* Async batch fetching for map_batches (31576)
* Add informative progress bar names to map_batches (31526)
* Provide a size-in-bytes estimate for MongoDB blocks (31930)
* Add support for dynamic block splitting to actor pool (31715)
* Improve str/repr of Dataset to include execution plan (31604)
* Deal with nested Chain in BatchPredictor (31407)
* Allow MultiHotEncoder to encode arrays (31365)
* Allow specifying `batch_size` when reading Parquet files (31165)
* Add zero-copy batch API for `ds.map_batches()` (30000)
* Text dataset should save texts in ArrowTable format (30963)
* Return ndarray dicts for single-column tabular datasets (30448)
* Execute randomize_block_order eagerly if it's the last stage for ds.schema() (30804)

🔨 Fixes:



* Don't drop first dataset when peeking DatasetPipeline (31513)
* Handle np.array(dtype=object) constructor for ragged ndarrays (31670)
* Emit warning when starting Dataset execution with no CPU resources available (31574)
* Fix the bug of eagerly clearing up input blocks (31459)
* Fix Imputer failing with categorical dtype (31435)
* Fix schema unification for Datasets with ragged Arrow arrays (31076)
* Fix Discretizers transforming ignored cols (31404)
* Fix to_tf when the input feature_columns is a list. (31228)
* Raise error message if user calls Dataset.__iter__ (30575)

📖Documentation:



* Refactor Ray Data API documentation (31204)
* Add seealso to map-related methods (30579)


Ray Train

🎉 New Features:



* Add option for per-epoch preprocessor (31739)

💫Enhancements:



* Change default `NCCL_SOCKET_IFNAME` to blacklist `veth` (31824)
* Introduce DatasetIterator for bulk and streaming ingest (31470)
* Clarify which `RunConfig` is used when there are multiple places to specify it (31959)
* Change `ScalingConfig` to be optional for `DataParallelTrainer`s if already in Tuner `param_space` (30920)

🔨 Fixes:



* Use specified `Preprocessor` configs when using stream API. (31725)
* Fix off-by-one AIR Trainer checkpoint ID indexing on restore (31423)
* Force GBDTTrainer to use distributed loading for Ray Datasets (31079)
* Fix bad case in ScalingConfig->RayParams (30977)
* Don't raise TuneError on `fail_fast="raise"` (30817)
* Report only once in `SklearnTrainer` (30593)
* Ensure GBDT PGFs match passed ScalingConfig (30470)

📖Documentation:



* Restructure API References (32535)
* Remove Ray Client references from Train docs/examples (32321)
* Various fixes to docstrings, documentation, and examples (29463, 30492, 30543, 30571, 30782, 31692, 31735)

🏗 Architecture refactoring:



* API Deprecations (31763)


Ray Tune

💫Enhancements:



* Improve trainable serialization error (31070)
* Add support for Nevergrad optimizer with extra parameters (31015)
* Add timeout for experiment checkpoint syncing to cloud (30855)
* Move `validate_upload_dir` to Syncer (30869)
* Enable experiment restore from moved cloud uri (31669)
* Save and restore stateful callbacks as part of experiment checkpoint (31957)

🔨 Fixes:



* Do not default to reuse_actors=True when mixins are used (31999)
* Only keep cached actors if search has not ended (31974)
* Fix best trial in ProgressReporter with nan (31276)
* Make ResultGrid return cloud checkpoints (31437)
* Wait for final experiment checkpoint sync to finish (31131)
* Fix CheckpointConfig validation for function trainables (31255)
* Fix checkpoint directory assignment for new checkpoints created after restoring a function trainable (31231)
* Fix `AxSearch` save and nan/inf result handling (31147)
* Fix `AxSearch` search space conversion for fixed list hyperparameters (31088)
* Restore searcher and scheduler properly on `Tuner.restore` (30893)
* Fix progress reporter `sort_by_metric` with nested metrics (30906)
* Don't raise TuneError on `fail_fast="raise"` (30817)
* Fix duplicate printing when trial is done (30597)

📖Documentation:



* Restructure API references (32449)
* Remove Ray Client references from Tune docs/examples (32321)
* Various fixes to docstrings, documentation, and examples (29581, 30782, 30571, 31045, 31793, 32505)

🏗 Architecture refactoring:



* Deprecate passing a custom trial executor (31792)
* Move signal handling into separate method (31004)
* Update staged resources in a fixed counter for faster lookup (32087)
* Rename `overwrite_trainable` argument in Tuner restore to `trainable` (32059)


Ray Serve

🎉 New Features:



* Serve python API to support multi application (31589)

💫Enhancements:



* Add exponential backoff when retrying replicas (31436)
* Enable Log Rotation on Serve (31844)
* Use tasks/futures for asyncio.wait (31608)
* Change target_num_ongoing_requests_per_replica to positive float (31378)
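
The exponential backoff item can be pictured with a small generic sketch. The base delay, growth factor, and cap below are assumptions chosen for illustration, not the values Ray Serve uses, and `backoff_delays` is a hypothetical helper.

```python
def backoff_delays(base=1.0, factor=2.0, max_delay=64.0, attempts=5):
    """Yield the wait time before each retry, growing geometrically up to a cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, max_delay)
        delay *= factor


if __name__ == "__main__":
    # Each retry waits twice as long as the previous one until the cap is hit.
    print(list(backoff_delays(attempts=8)))
```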

🔨 Fixes:



* Upgrade deprecated calls (31839)
* Change Gradio integration to take a builder function to avoid serialization issues (31619)
* Add initial health check before marking a replica as RUNNING (31189)

📖Documentation:



* Document end-to-end timeout in Serve (31769)
* Document Gradio visualization (28310)


RLlib

🎉 New Features:



* Gymnasium is now supported. ([Notes](https://docs.google.com/document/u/0/d/1lxYK1dI5s0Wo_jmB6V6XiP-_aEBsXDykXkD1AXRase4/edit))
* Connectors are now activated by default (31693, 30388, 31618, 31444, 31092)
* Contribution of LeelaChessZero algorithm for playing chess in a MultiAgent env. (31480)

💫Enhancements:



* [RLlib] Error out if action_dict is empty in MultiAgentEnv. (32129)
* [RLlib] Upgrade tf eager code to no longer use `experimental_relax_shapes` (but `reduce_retracing` instead). (29214)
* [RLlib] Reduce SampleBatch counting complexity (30936)
* [RLlib] Use PyTorch vectorized max() and sum() in SampleBatch.__init__ when possible (28388)
* [RLlib] Support multi-gpu CQL for torch (tf already supported). (31466)
* [RLlib] Introduce IMPALA off_policyness test with GPU (31485)
* [RLlib] Properly serialize and restore StateBufferConnector states for policy stashing (31372)
* [RLlib] Clean up deprecated concat_samples calls (31391)
* [RLlib] Better support MultiBinary spaces by treating Tuples as superset of them in ComplexInputNet. (28900)
* [RLlib] Add backward compatibility to MeanStdFilter to restore from older checkpoints. (30439)
* [RLlib] Clean up some signatures for compute_actions. (31241)
* [RLlib] Simplify logging configuration. (30863)
* [RLlib] Remove native Keras Models. (30986)
* [RLlib] Convert PolicySpec to a readable format when converting to_dict(). (31146)
* [RLlib] Issue 30394: Add proper `__str__()` method to PolicyMap. (31098)
* [RLlib] Issue 30840: Option to only checkpoint policies that are trainable. (31133)
* [RLlib] Deprecate (delete) `contrib` folder. (30992)
* [RLlib] Better behavior if user does not specify stopping condition in RLlib CLI. (31078)
* [RLlib] PolicyMap LRU cache enhancements: Swap out policies (instead of GC'ing and recreating) + use Ray object store (instead of file system). (29513)
* [RLlib] `AlgorithmConfig.overrides()` to replace `multiagent->policies->config` and `evaluation_config` dicts. (30879)
* [RLlib] `deprecation_warning(.., error=True)` should raise `ValueError`, not `DeprecationWarning`. (30255)
* [RLlib] Add `gym.spaces.Text` serialization. (30794)
* [RLlib] Convert `MultiAgentBatch` to `SampleBatch` in offline_rl.py. (30668)
* [RLlib; Tune] Make `Algorithm.train()` return Tune-style config dict (instead of AlgorithmConfig object). (30591)
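
The PolicyMap item above describes swapping least recently used policies out of memory instead of destroying and recreating them. A minimal sketch of that caching pattern follows. `SwappingLRUCache` is hypothetical, and a plain dict stands in for the Ray object store; this is not RLlib's `PolicyMap` code.

```python
from collections import OrderedDict


class SwappingLRUCache:
    """Keep at most `capacity` items in memory and stash the rest."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._hot = OrderedDict()  # in-memory items, least recently used first
        self._stash = {}           # stand-in for an external object store

    def __setitem__(self, key, value):
        self._stash.pop(key, None)
        self._hot[key] = value
        self._hot.move_to_end(key)
        self._evict()

    def __getitem__(self, key):
        if key not in self._hot:
            # Swap the item back in instead of rebuilding it from scratch.
            self._hot[key] = self._stash.pop(key)  # KeyError if unknown
        self._hot.move_to_end(key)
        self._evict()
        return self._hot[key]

    def _evict(self):
        # Move the least recently used items to the stash when over capacity.
        while len(self._hot) > self.capacity:
            old_key, old_value = self._hot.popitem(last=False)
            self._stash[old_key] = old_value

    def in_memory(self):
        return list(self._hot)


if __name__ == "__main__":
    cache = SwappingLRUCache(capacity=2)
    for name in ("policy_a", "policy_b", "policy_c"):
        cache[name] = {"weights": name}
    print(cache.in_memory())
    print(cache["policy_a"])
```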

🔨 Fixes:



* [RLlib] Fix waterworld example and test (32117)
* [RLlib] Change Waterworld v3 to v4 and reinstate indep. MARL test case w/ pettingzoo. (31820)
* [RLlib] Fix OPE checkpointing. Save method name in configuration dict. (31778)
* [RLlib] Fix worker state restoration. (31644)
* [RLlib] Replace ordinary pygame imports by `try_import_..()`. (31332)
* [RLlib] Remove crude VR checks in agent collector. (31564)
* [RLlib] Fixed the 'RestoreWeightsCallback' example script. (31601)
* [RLlib] Issue 28428: QMix not working w/ GPUs. (31299)
* [RLlib] Fix using yaml files with empty stopping conditions. (31363)
* [RLlib] Issue 31174: Move all checks into AlgorithmConfig.validate() (even simple ones) to avoid errors when using tune hyperopt objects. (31396)
* [RLlib] Fix `tensorflow_probability` imports. (31331)
* [RLlib] Issue 31323: BC/MARWIL/CQL do work with multi-GPU (but config validation prevents them from running in this mode). (31393)
* [RLlib] Issue 28849: DT fails with num_gpus=1. (31297)
* [RLlib] Fix `PolicyMap.__del__()` to also remove a deleted policy ID from the internal deque. (31388)
* [RLlib] Use `get_model_v2()` instead of `get_model()` with MADDPG. (30905)
* [RLlib] Policy mapping fn can not be called with keyword arguments. (31141)
* [RLlib] Issue 30213: Appending RolloutMetrics to sampler outputs should happen after(!) all callbacks (such that custom metrics for last obs are still included). (31102)
* [RLlib] Make convert_to_torch tensor adhere to docstring. (31095)
* [RLlib] Fix convert to torch tensor (31023)
* [RLlib] Issue 30221: random policy does not handle nested spaces. (31025)
* [RLlib] Fix crashing remote envs example (30562)
* [RLlib] Recursively look up the original space from obs_space (30602)

📖Documentation:



* [RLlib; docs] Change links and references in code and docs to "Farama foundation's gymnasium" (from "OpenAI gym"). (32061)


Ray Core and Ray Clusters


Ray Core

🎉 New Features:



* Task Events Backend: Ray aggregates all submitted task information to provide better observability (31840, 31761, 31278, 31247, 31316, 30934, 30979, 31207, 30867, 30829, 31524, 32157). This backend underpins features such as the task state API, the advanced progress bar, and the Ray timeline.

💫Enhancements:



* Remote generators now work with Ray actors and Ray Client (31700, 31710).
* Revamp the default scheduling strategy and improve worker startup performance by up to 8x for embarrassingly parallel workloads (31934, 31868).
* Clean up worker code and allow workers to bind lazily to jobs (31836, 31846, 30349, 31375).
* A single Ray cluster can scale up to 2000 nodes and 20k actors (32131, 30131, 31939, 30166, 30460, 30563).
* [Out-of-memory prevention enhancement](https://docs.ray.io/en/master/ray-core/scheduling/ray-oom-prevention.html) is now GA with more robust worker killing policies and better user experiences (32217, 32361, 32219, 31768, 32107, 31976, 31272, 31509, 31230).

🔨 Fixes:



* Improve garbage collection upon job termination (32127, 31155)
* Fix opencensus protobuf bug (31632)
* Support python 3.10 for runtime_env conda (30970)
* Fix crashes and memory leaks (31640, 30476, 31488, 31917, 30761, 31018)

📖Documentation:



* Deprecation (31845, 31140, 31528)


Ray Clusters

🎉 New Features:
* Ray-on-Spark is now available as Preview! (28771, 31397, 31962)

💫Enhancements:



* [observability] Better memory formatting for `ray status` and autoscaler (32337)
* [autoscaler] Add flag to disable periodic cluster status log. (31869)

🔨 Fixes:



* [observability][autoscaler] Ensure pending nodes is reset to 0 after scaling (32085)
* Make ~/.bashrc optional in cluster launcher commands (32393)

📖Documentation:



* Improvements to job submission
* Remove references to Ray Client


Dashboard

🎉 New Features:



* New Information Architecture (beta): We’ve restructured the Ray dashboard to be organized around user personas and workflows instead of entities. For developers, the jobs and actors tab will be most useful. For infrastructure engineers, the cluster tab may be more valuable.
* Advanced progress bar: A task visualization that lets you see the progress of all your Ray tasks.
* Timeline view: We’ve added a button to download detailed timeline data about your Ray job. You can then click a link and use the open-source Perfetto visualization tool to visualize the timeline data.
* More metadata tables. You can now see placement groups, tasks, actors, and other information related to your jobs.

📖Documentation:



* We’ve restructured the documentation to make the dashboard documentation more prominent
* We’ve improved the documentation around setting up Prometheus and Grafana for metrics.

Many thanks to all those who contributed to this release!

minerharry, scottsun94, iycheng, DmitriGekhtman, jbedorf, krfricke, simonsays1980, eltociear, xwjiang2010, ArturNiederfahrenhorst, richardliaw, avnishn, WeichenXu123, Capiru, davidxia, andreapiso, amogkam, sven1977, scottjlee, kylehh, yhna940, rickyyx, sihanwang41, n30111, Yard1, sriram-anyscale, Emiyalzn, simran-2797, cadedaniel, harelwa, ijrsvt, clarng, pabloem, bveeramani, lukehsiao, angelinalg, dmatrix, sijieamoy, simon-mo, jbesomi, YQ-Wang, larrylian, c21, AndreKuu, maxpumperla, architkulkarni, wuisawesome, justinvyu, zhe-thoughts, matthewdeng, peytondmurray, kevin85421, tianyicui-tsy, cassidylaidlaw, gvspraveen, scv119, kyuyeonpooh, Siraj-Qazi, jovany-wang, ericl, shrekris-anyscale, Catch-Bull, jianoaix, christy, MisterLin1995, kouroshHakha, pcmoritz, csko, gjoliver, clarkzinzow, SongGuyang, ckw017, ddelange, alanwguo, Dhul-Husni, Rohan138, rkooo567, fzyzcjy, chaokunyang, 0x2b3bfa0, zoltan-fedor, Chong-Li, crypdick, jjyao, emmyscode, stephanie-wang, starpit, smorad, nikitavemuri, zcin, tbukic, ayushthe1, mattip

2.2

* [Ray Jobs API](https://docs.ray.io/en/releases-2.2.0/cluster/running-applications/job-submission/index.html#ray-jobs-api) is now GA. The Ray Jobs API allows you to submit locally developed applications to a remote Ray Cluster for execution. It simplifies the experience of packaging, deploying, and managing a Ray application.
* [Ray Dashboard](https://docs.ray.io/en/releases-2.2.0/ray-core/ray-dashboard.html#ray-dashboard) has received a number of improvements, such as the ability to see CPU flame graphs of your Ray workers and new metrics for memory usage.
* The [Out-Of-Memory (OOM) Monitor](https://docs.ray.io/en/releases-2.2.0/ray-core/scheduling/ray-oom-prevention.html) is now enabled by default. This will increase the stability of memory-intensive applications on top of Ray.
* [Ray Data] We’ve heard from numerous users that when files are too large, Ray Data can run out of memory or perform poorly. In this release, we’re enabling [dynamic block splitting](https://docs.ray.io/en/releases-2.2.0/data/dataset-internals.html#execution-memory) by default, which addresses these issues by avoiding holding too much data in memory.
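
The general idea behind block splitting can be sketched without Ray: rather than holding one oversized block, rows are emitted in smaller blocks that each stay under a size target. The helper below is hypothetical and only conceptual. Ray Data's own splitting logic, thresholds, and size accounting are not shown in these notes and will differ.

```python
import sys


def split_into_blocks(rows, target_max_bytes, size_of=sys.getsizeof):
    """Yield lists of rows whose estimated total size stays under a target.

    A single row larger than the target still gets a block of its own.
    """
    block, block_bytes = [], 0
    for row in rows:
        row_bytes = size_of(row)
        if block and block_bytes + row_bytes > target_max_bytes:
            # Close the current block before it grows past the target.
            yield block
            block, block_bytes = [], 0
        block.append(row)
        block_bytes += row_bytes
    if block:
        yield block


if __name__ == "__main__":
    # Use len() as a simple size estimate for short strings.
    rows = ["aaaa", "bb", "ccc", "d"]
    print(list(split_into_blocks(rows, target_max_bytes=5, size_of=len)))
```

Because the function is a generator, only one block needs to be materialized at a time, which is the memory benefit the item above describes.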

Ray Libraries

Ray AIR

🎉 New Features:
* Add a NumPy first path for Torch and TensorFlow Predictors (28917)

💫Enhancements:
* Suppress "NumPy array is not writable" error in torch conversion (29808)
* Add node rank and local world size info to session (29919)

🔨 Fixes:
* Fix MLflow database integrity error (29794)
* Fix ResourceChangingScheduler dropping PlacementGroupFactory args (30304)
* Fix bug passing 'raise' to FailureConfig (30814)
* Fix reserved CPU warning if no CPUs are used (30598)

📖Documentation:
* Fix examples and docs to specify batch_format in BatchMapper (30438)

🏗 Architecture refactoring:
* Deprecate Wandb mixin (29828)
* Deprecate Checkpoint.to_object_ref and Checkpoint.from_object_ref (30365)

Ray Data Processing

🎉 New Features:
* Support all PyArrow versions released by Apache Arrow (29993, 29999)
* Add `select_columns()` to select a subset of columns (29081)
* Add `write_tfrecords()` to write TFRecord files (29448)
* Support MongoDB data source (28550)
* Enable dynamic block splitting by default (30284)
* Add `from_torch()` to create dataset from Torch dataset (29588)
* Add `from_tf()` to create dataset from TensorFlow dataset (29591)
* Allow to set `batch_size` in `BatchMapper` (29193)
* Support read/write from/to local node file system (29565)

💫Enhancements:
* Add `include_paths` in `read_images()` to return image file path (30007)
* Print out Dataset statistics automatically after execution (29876)
* Cast tensor extension type to opaque object dtype in `to_pandas()` and `to_dask()` (29417)
* Encode number of dimensions in variable-shaped tensor extension type (29281)
* Fuse AllToAllStage and OneToOneStage with compatible remote args (29561)
* Change `read_tfrecords()` output from Pandas to Arrow format (30390)
* Handle all Ray errors in task compute strategy (30696)
* Allow nested Chain preprocessors (29706)
* Warn user if missing columns and support `str` exclude in `Concatenator` (29443)
* Raise ValueError if preprocessor column doesn't exist (29643)

🔨 Fixes:
* Support custom resource with remote args for `random_shuffle()` (29276)
* Support custom resource with remote args for `random_shuffle_each_window()` (29482)
* Add PublicAPI annotation to preprocessors (29434)
* Tensor extension column concatenation fixes (29479)
* Fix `iter_batches()` to not return empty batch (29638)
* Change `map_batches()` to fetch input blocks on-demand (29289)
* Change `take_all()` to not accept limit argument (29746)
* Convert between block and batch correctly for `map_groups()` (30172)
* Fix `stats()` call causing Dataset schema to be unset (29635)
* Raise error when `batch_format` is not specified for `BatchMapper` (30366)
* Fix ndarray representation of single-element ragged tensor slices (30514)

📖Documentation:
* Improve `map_batches()` documentation about execution model and UDF pickle-ability requirement (29233)
* Improve `to_tf()` docstring (29464)

Ray Train

🎉 New Features:
* Added MosaicTrainer (29237, 29620, 29919)

💫Enhancements:
* Fast fail upon single worker failure (29927)
* Optimize checkpoint conversion logic (29785)

🔨 Fixes:
* Propagate DatasetContext to training workers (29192)
* Show correct error message on training failure (29908)
* Fix prepare_data_loader with enable_reproducibility (30266)
* Fix usage of NCCL_BLOCKING_WAIT (29562)

📖Documentation:
* Deduplicate Train examples (29667)

🏗 Architecture refactoring:
* Hard deprecate train.report (29613)
* Remove deprecated Train modules (29960)
* Deprecate old prepare_model DDP args (30364)

Ray Tune

🎉 New Features:
* Make `Tuner.restore` work with relative experiment paths (30363)
* `Tuner.restore` from a local directory that has moved (29920)

💫Enhancements:
* `with_resources` takes in a `ScalingConfig` (30259)
* Keep resource specifications when nesting `with_resources` in `with_parameters` (29740)
* Add `trial_name_creator` and `trial_dirname_creator` to `TuneConfig` (30123)
* Add option to not override the working directory (29258)
* Only convert a `BaseTrainer` to `Trainable` once in the Tuner (30355)
* Dynamically identify PyTorch Lightning Callback hooks (30045)
* Make `remote_checkpoint_dir` work with query strings (30125)
* Make cloud checkpointing retry configurable (30111)
* Sync experiment-checkpoints more often (30187)
* Update generate_id algorithm (29900)

🔨 Fixes:
* Catch SyncerCallback failure with dead node (29438)
* Do not warn in BayesOpt w/ Uniform sampler (30350)
* Fix `ResourceChangingScheduler` dropping PGF args (30304)
* Fix Jupyter output with Ray Client and `Tuner` (29956)
* Fix tests related to `TUNE_ORIG_WORKING_DIR` env variable (30134)

📖Documentation:
* Add user guide for analyzing results (using `ResultGrid` and `Result`) (29072)
* Tune checkpointing and Tuner restore docfix (29411)
* Fix and clean up PBT examples (29060)
* Fix TrialTerminationReporter in docs (29254)

🏗 Architecture refactoring:
* Remove hard deprecated SyncClient/Syncer (30253)
* Deprecate Wandb mixin, move to `setup_wandb()` function (29828)

Ray Serve

🎉 New Features:
* Guard for high latency requests (29534)
* Java API Support ([blog](http://anyscale-staging.herokuapp.com/blog/flexible-cross-language-distributed-model-inference-framework-ray-serve-with))

💫Enhancements:
* Serve K8s HA benchmarking (30278)
* Add method info for http metrics (29918)

🔨 Fixes:
* Fix log format error (28760)
* Inherit previous deployment num_replicas (29686)
* Polish serve run deploy message (29897)
* Remove calling of get_event_loop from python 3.10

RLlib

🎉 New Features:
* Fault tolerant, elastic WorkerSets: An asynchronous Ray Actor manager class is now used inside all of RLlib’s Algorithms, adding fully flexible fault tolerance to rollout workers and workers used for evaluation. If one or more workers (which are Ray actors) fail, e.g. due to a spot instance going down, the RLlib Algorithm will now flexibly wait it out and periodically try to recreate the failed workers. In the meantime, only the remaining healthy workers are used for sampling and evaluation. (29938, 30118, 30334, 30252, 29703, 30183, 30327, 29953)
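
The behavior described above (keep sampling from healthy workers, retry the failed ones later) can be sketched with plain Python callables in place of Ray actors. `ElasticWorkerPool` and its methods are hypothetical names for illustration and do not mirror RLlib's actor manager API.

```python
class ElasticWorkerPool:
    """Toy pool that skips failed workers and can try to recreate them."""

    def __init__(self, make_worker, num_workers):
        self._make_worker = make_worker
        self._workers = {i: make_worker(i) for i in range(num_workers)}
        self._failed = set()

    def sample(self):
        """Call every healthy worker and mark the ones that raise as failed."""
        results = {}
        for idx, worker in self._workers.items():
            if idx in self._failed:
                continue
            try:
                results[idx] = worker()
            except Exception:
                self._failed.add(idx)
        return results

    def try_restore(self):
        """Try to recreate failed workers; return the indices still failed."""
        for idx in list(self._failed):
            try:
                self._workers[idx] = self._make_worker(idx)
            except Exception:
                continue
            self._failed.discard(idx)
        return sorted(self._failed)


if __name__ == "__main__":
    pool = ElasticWorkerPool(lambda i: (lambda: i * 10), num_workers=3)
    print(pool.sample())
```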

💫Enhancements:
* RLlib CLI: A new and enhanced RLlib command line interface (CLI) has been added, allowing for automatically downloading example configuration files, python-based config files (defining an AlgorithmConfig object to use), better interoperability between training and evaluation runs, and more. For a detailed overview of what has changed, check out the [new CLI documentation](https://docs.ray.io/en/releases-2.2.0/rllib/rllib-cli.html). (29204, 29459, 30526, 29661, 29972)
* Checkpoint overhaul: Algorithm checkpoints and Policy checkpoints are now more cohesive and transparent. All checkpoints are now characterized by a directory (with files and maybe sub-directories), rather than a single pickle file; both Algorithm and Policy classes now have a utility static method (`from_checkpoint()`) for directly instantiating instances from a checkpoint directory w/o knowing the original configuration used or any other information (having the checkpoint is sufficient). [For a detailed overview, see here](https://docs.ray.io/en/releases-2.2.0/rllib/rllib-saving-and-loading-algos-and-policies.html). (28812, 29772, 29370, 29520, 29328)
* A new metric for APPO/IMPALA/PPO has been added that measures off-policy’ness: The difference in number of grad-updates the sampler policy has received thus far vs the trained policy’s number of grad-updates thus far. (29983)
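
Read literally, the off-policy'ness metric is a difference between two gradient-update counters. The sketch below averages that difference over the batches in a training step. The averaging and the function name are assumptions for illustration, since the notes only define the per-sample difference.

```python
def mean_off_policyness(trained_updates, sampler_updates_per_batch):
    """Average gap between the trained policy's update count and the update
    count of the policy version that sampled each batch.

    A value of 0.0 means every batch was collected fully on-policy.
    """
    if not sampler_updates_per_batch:
        raise ValueError("need at least one batch")
    gaps = [trained_updates - n for n in sampler_updates_per_batch]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    # The learner has done 10 updates; the batches were sampled by policy
    # versions that had received 10, 8, and 9 updates.
    print(mean_off_policyness(10, [10, 8, 9]))
```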

🏗 Architecture refactoring:
* AlgorithmConfig classes: All of RLlib’s Algorithms, RolloutWorkers, and other important classes now use AlgorithmConfig objects under the hood, instead of python config dicts. It is no longer recommended (however, still supported) to create a new algorithm (or a Tune+RLlib experiment) using a python dict as configuration. For more details on how to convert your scripts to the [new AlgorithmConfig design, see here](https://docs.ray.io/en/releases-2.2.0/rllib/rllib-training.html#configuring-rllib-algorithms). (29796, 30020, 29700, 29799, 30096, 29395, 29755, 30053, 29974, 29854, 29546, 30042, 29544, 30079, 30486, 30361)
* Major progress was made on the new Connector API and on making sure it can be used (tentatively) with the `config.rollouts(enable_connectors=True)` flag. It will be fully supported, across all of RLlib’s algorithms, in Ray 2.3. (30307, 30434, 30459, 30308, 30332, 30320, 30383, 30457, 30446, 30024, 29064, 30398, 29385, 30481, 30241, 30285, 30423, 30288, 30313, 30220, 30159)
* Progress was made on the upcoming RLModule/RLTrainer/RLOptimizer APIs. (30135, 29600, 29599, 29449, 29642)

🔨 Fixes:
* Various bug fixes: 25925, 30279, 30478, 30461, 29867, 30099, 30185, 29222, 29227, 29494, 30257, 29798, 30176, 29648, 30331

📖Documentation:
* [RLlib CLI](https://docs.ray.io/en/releases-2.2.0/rllib/rllib-cli.html), [Checkpoint overhaul](https://docs.ray.io/en/releases-2.2.0/rllib/rllib-saving-and-loading-algos-and-policies.html), [AlgorithmConfigs](https://docs.ray.io/en/releases-2.2.0/rllib/rllib-training.html#configuring-rllib-algorithms)
* Minor fixes: 29261, 29752

Ray Core and Ray Clusters

Ray Core

🎉 New Features:
* [Out-of-memory monitor](https://docs.ray.io/en/releases-2.2.0/ray-core/scheduling/ray-oom-prevention.html) is now Beta and is enabled by default.

💫Enhancements:
* The [Ray Jobs API](https://docs.ray.io/en/releases-2.2.0/cluster/running-applications/job-submission/index.html#ray-jobs-api) has graduated from Beta to GA. This means Ray Jobs will maintain API backward compatibility.
* Run Ray job entrypoint commands (“driver scripts”) on worker nodes by specifying `entrypoint_num_cpus`, `entrypoint_num_gpus`, or `entrypoint_resources`. (28564, 28203)
* (Beta) OpenAPI spec for Ray Jobs REST API (30417)
* Improved Ray health checking mechanism. The fix reduces how often the GCS mistakenly marks raylets as failed when it is overloaded. (29346, 29442, 29389, 29924)

🔨 Fixes:
* Various fixes for hanging / deadlocking (29491, 29763, 30371, 30425)
* Set OMP_NUM_THREADS to `num_cpus` required by task/actors by default (30496)
* Set workers as non-recyclable by default if a GPU is involved (30061)

📖Documentation:
* General improvements of Ray Core docs, including design patterns and tasks.

Ray Clusters

💫Enhancements:
* Stability improvements for Ray Autoscaler / KubeRay Operator integration. (29933, 30281, 30502)

Dashboard

🎉 New Features:
* Additional improvements to the default metrics dashboard. We now have actor, placement group, and per-component memory usage breakdowns. [You can see details in the docs](https://docs.ray.io/en/releases-2.2.0/ray-observability/ray-metrics.html#system-metrics).
* New profiling feature using py-spy under the hood. You can click buttons to see stack traces or CPU flame graphs of your workers.
* Autoscaler and job events are available from the dashboard. You can also access the same data using `ray list cluster-events`.

🔨 Fixes:
* Stability improvements to the dashboard
* The dashboard now works on large-scale clusters! It is tested with 250 nodes and 10K+ actors (which matches the [Ray scalability envelope](https://github.com/ray-project/ray/tree/master/release/benchmarks)).
* Smarter API fetching logic. We now wait for the previous API request to finish before sending a new one when polling for new data.
* Fix [agent memory leak](https://github.com/ray-project/ray/issues/29199) and high CPU usage.

💫Enhancements:
* General improvements to the progress bar. You can now see progress bars for each task name if you drill into the job details.
* More metadata is available in the jobs and actors tables.
* There is now a feedback button embedded into the dashboard. Please submit any bug reports or suggestions!

Many thanks to all those who contributed to this release!

shrekris-anyscale, rickyyx, scottjlee, shogohida, liuyang-my, matthewdeng, wjrforcyber, linusbiostat, clarkzinzow, justinvyu, zygi, christy, amogkam, cool-RR, jiaodong, EvgeniiTitov, jjyao, ilee300a, jianoaix, rkooo567, mattip, maxpumperla, ericl, cadedaniel, bveeramani, rueian, stephanie-wang, lcipolina, bparaj, JoonHong-Kim, avnishn, tomsunelite, larrylian, alanwguo, VishDev12, c21, dmatrix, xwjiang2010, thomasdesr, tiangolo, sokratisvas, heyitsmui, scv119, pcmoritz, bhavika, yzs981130, andraxin, Chong-Li, clarng, acxz, ckw017, krfricke, kouroshHakha, sijieamoy, iycheng, gjoliver, peytondmurray, xcharleslin, DmitriGekhtman, andreichalapco, vitrioil, architkulkarni, simon-mo, ArturNiederfahrenhorst, sihanwang41, pabloem, sven1977, avivhaber, wuisawesome, jovany-wang, Yard1

2.2.0

Not secure
Release Highlights

2.1.0

Not secure
Release Highlights

* Ray AI Runtime (AIR)
    * Better support for Image-based workloads.
        * Ray Datasets `read_images()` API for loading data.
        * Numpy-based API for user-defined functions in Preprocessor.
    * Ability to read TFRecord input.
        * Ray Datasets `read_tfrecords()` API to read TFRecord files.
* Ray Serve:
    * Add support for gRPC endpoint (alpha release). Instead of using an HTTP server, Ray Serve supports gRPC protocol and users can bring their own schema for their use case.
* RLlib:
    * Introduce decision transformer (DT) algorithm.
    * New hook for callbacks with `on_episode_created()`.
    * Learning rate schedule to SimpleQ and PG.
* Ray Core:
    * Ray [OOM prevention](https://docs.ray.io/en/master/ray-core/scheduling/ray-oom-prevention.html) (alpha release).
    * Support dynamic generators as task return values.
* Dashboard:
    * Time series metrics support.
    * Export configuration files can be used in Prometheus or Grafana instances.
    * New progress bar in job detail view.

Ray Libraries

Ray AIR

💫Enhancements:
* Improve readability of training failure output (27946, 28333, 29143)
* Auto-enable GPU for Predictors (26549)
* Add ability to create TorchCheckpoint from state dict (27970)
* Add ability to create TensorflowCheckpoint from saved model/h5 format (28474)
* Add attribute to retrieve URI from Checkpoint (28731)
* Add all allowable types to WandB Callback (28888)

🔨 Fixes:
* Handle nested metrics properly as scoring attribute (27715)
* Fix serializability of Checkpoints (28387, 28895, 28935)

📖Documentation:
* Miscellaneous updates to documentation and examples (28067, 28002, 28189, 28306, 28361, 28364, 28631, 28800)

🏗 Architecture refactoring:
* Deprecate Checkpoint.to_object_ref and Checkpoint.from_object_ref (28318)
* Deprecate legacy train/tune functions in favor of Session (28856)

Ray Data Processing

🎉 New Features:
* Add read_images (29177)
* Add read_tfrecords (28430)
* Add NumPy batch format to Preprocessor and `BatchMapper` (28418)
* Ragged tensor extension type (27625)
* Add KBinsDiscretizer Preprocessor (28389)

💫Enhancements:
* Simplify to_tf interface (29028)
* Add metadata override and inference in `Dataset.to_dask()` (28625)
* Prune unused columns before aggregate (28556)
* Add Dataset.default_batch_format (28434)
* Add partitioning parameter to read_ functions (28413)
* Deprecate "native" batch format in favor of "default" (28489)
* Support None partition field name (28417)
* Re-enable Parquet sampling and add progress bar (28021)
* Cap the number of stats kept in StatsActor and purge in FIFO order if the limit is exceeded (27964)
* Customized serializer for Arrow JSON ParseOptions in read_json (27911)
* Optimize groupby/mapgroups performance (27805)
* Improve size estimation of image folder data source (27219)
* Use detached lifetime for stats actor (25271)
* Pin _StatsActor to the driver node (27765)
* Better error message for partition filtering if no file found (27353)
* Make Concatenator deterministic (27575)
* Change FeatureHasher input schema to expect token counts (27523)
* Avoid unnecessary reads when truncating a dataset with `ds.limit()` (27343)
* Hide tensor extension from UDFs (27019)
* Add __repr__ to AIR classes (27006)
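
The FIFO purge mentioned above for the stats actor can be pictured with a small capped store. This is only a minimal sketch under assumptions: the class name `CappedStatsStore`, its methods, and the `max_stats` parameter are hypothetical and are not Ray's `_StatsActor` API.

```python
from collections import OrderedDict


class CappedStatsStore:
    """Hypothetical illustration of a FIFO-capped store; not Ray's _StatsActor."""

    def __init__(self, max_stats=1000):
        if max_stats < 1:
            raise ValueError("max_stats must be at least 1")
        self.max_stats = max_stats
        # Insertion order of an OrderedDict doubles as FIFO order.
        self._stats = OrderedDict()

    def record(self, dataset_id, stats):
        # Evict the oldest entries before a new key would exceed the cap.
        if dataset_id not in self._stats:
            while len(self._stats) >= self.max_stats:
                self._stats.popitem(last=False)
        self._stats[dataset_id] = stats

    def get(self, dataset_id):
        return self._stats.get(dataset_id)

    def keys(self):
        return list(self._stats)


store = CappedStatsStore(max_stats=2)
for dataset_id in ("a", "b", "c"):
    store.record(dataset_id, {"rows": 10})
print(store.keys())  # the oldest key, "a", has been purged
```

Updating an existing key does not evict anything in this sketch, since the number of stored entries stays the same.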

🔨 Fixes:
* Add upper bound to pyarrow version check (29674) (29744)
* Fix map_groups to work with different output type (29184)
* `read_csv` no longer filters out files by default (29032)
* Check columns when adding rows to TableBlockBuilder (29020)
* Fix the peak memory usage calculation (28419)
* Change sampling to use same API as read Parquet (28258)
* Fix column assignment in Concatenator for Pandas 1.2. (27531)
* Do partition filtering in reader constructor (27156)
* Fix split ownership (27149)

📖Documentation:
* Clarify dataset transformation. (28482)
* Update map_batches documentation (28435)
* Improve docstring and doctest for read_parquet (28488)
* Activate dataset doctests (28395)
* Document using a different separator for read_csv (27850)
* Convert custom datetime column when reading a CSV file (27854)
* Improve preprocessor documentation (27215)
* Improve `limit()` and `take()` docstrings (27367)
* Reorganize the tensor data support docs (26952)
* Fix nyc_taxi_basic_processing notebook (26983)

Ray Train

🎉 New Features:
* Add FullyShardedDataParallel support to TorchTrainer (28096)

💫Enhancements:
* Add rich notebook repr for DataParallelTrainer (26335)
* Fast fail if training loop raises an error on any worker (28314)
* Use torch.encode_data with HorovodTrainer when torch is imported (28440)
* Automatically set NCCL_SOCKET_IFNAME to use ethernet (28633)
* Don't add Trainer resources when running on Colab (28822)
* Support large checkpoints and other arguments (28826)

🔨 Fixes:
* Fix and improve HuggingFaceTrainer (27875, 28154, 28170, 28308, 28052)
* Maintain dtype info in LightGBMPredictor (28673)
* Fix prepare_model (29104)
* Fix `train.torch.get_device()` (28659)

📖Documentation:
* Clarify LGBM/XGB Trainer documentation (28122)
* Improve Hugging Face notebook example (28121)
* Update Train API reference and docs (28192)
* Mention FSDP in HuggingFaceTrainer docs (28217)

🏗 Architecture refactoring:
* Improve Trainer modularity for extensibility (28650)

Ray Tune

🎉 New Features:
* Add `Tuner.get_results()` to retrieve results after restore (29083)

💫Enhancements:
* Exclude files in sync_dir_between_nodes, exclude temporary checkpoints (27174)
* Add rich notebook output for Tune progress updates (26263)
* Add logdir to W&B run config (28454)
* Improve readability for long column names in table output (28764)
* Add functionality to recover from latest available checkpoint (29099)
* Add retry logic for restoring trials (29086)

🔨 Fixes:
* Re-enable progress metric detection (28130)
* Add timeout to retry_fn to catch hanging syncs (28155)
* Correct PB2’s beta_t parameter implementation (28342)
* Ignore directory exists errors to tackle race conditions (28401)
* Correctly overwrite files on restore (28404)
* Disable pytorch-lightning multiprocessing per default (28335)
* Raise error if scheduling an empty PlacementGroupFactory (28445)
* Fix trial cleanup after x seconds, set default to 600 (28449)
* Fix trial checkpoint syncing after recovery from other node (28470)
* Catch empty hyperopt search space, raise better Tuner error message (28503)
* Fix and optimize sample search algorithm quantization logic (28187)
* Support tune.with_resources for class methods (28596)
* Maintain consistent Trial/TrialRunner state when pausing and resuming trial with PBT (28511)
* Raise better error for incompatible gcsfs version (28772)
* Ensure that exploited in-memory checkpoint is used by trial with PBT (28509)
* Fix Tune checkpoint tracking for minimizing metrics (29145)
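
The checkpoint-tracking fix above concerns metrics that should be minimized rather than maximized. The sketch below shows one way to keep the best `k` checkpoints for either mode. The class `TopKCheckpoints` and its methods are hypothetical and are not Tune's actual checkpoint manager.

```python
import heapq
import itertools


class TopKCheckpoints:
    """Hypothetical tracker that keeps the k best checkpoints by a metric."""

    def __init__(self, k, mode="max"):
        if k < 1:
            raise ValueError("k must be at least 1")
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.k = k
        # Flip the sign for "min" so that a higher score always means better.
        self._sign = 1.0 if mode == "max" else -1.0
        self._counter = itertools.count()  # tie-breaker for equal scores
        self._heap = []  # min-heap: the root is the worst checkpoint kept

    def add(self, path, metric):
        """Track a checkpoint; return the path that was dropped, if any."""
        entry = (self._sign * metric, next(self._counter), path)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
            return None
        # Push the new entry, then pop the worst one (possibly the new entry).
        return heapq.heappushpop(self._heap, entry)[2]

    def kept(self):
        """Paths of the kept checkpoints, best first."""
        return [path for _, _, path in sorted(self._heap, reverse=True)]


tracker = TopKCheckpoints(k=2, mode="min")
tracker.add("ckpt_1", 0.9)
tracker.add("ckpt_2", 0.3)
evicted = tracker.add("ckpt_3", 0.5)
print(tracker.kept(), evicted)
```

With `mode="min"`, the checkpoint with the highest loss is the one evicted, which is the behavior a minimizing metric needs.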

📖Documentation:
* Miscellaneous documentation fixes (27117, 28131, 28210, 28400, 28068, 28809)
* Add documentation around trial/experiment checkpoint (28303)
* Add basic parallel execution guide for Tune (28677)
* Add example PBT notebook (28519)

🏗 Architecture refactoring:
* Store SyncConfig and CheckpointConfig in Experiment and Trial (29019)

Ray Serve

🎉 New Features:
* Added gRPC direct ingress support [alpha version] (28175)
* Serve CLI can provide Kubernetes-formatted output (28918)
* Serve CLI can provide user config output without default values (28313)

💫Enhancements:
* Add more benchmarks:
  * image objection with resnet50 model with image preprocessing (29096)
  * gRPC vs HTTP inference performance (28175)
* Add health check metrics to reflect the replica health status (29154)

🔨 Fixes:
* Fix memory leak issues during inference (29187)
* Fix unexpected warning about omitted HTTP options when using the Serve CLI to start Ray Serve (28257)
* Fix unexpected long poll exceptions (28612)

📖Documentation:
* Add e2e fault tolerance instructions (28721)
* Add Direct Ingress instructions (29149)
* Various doc improvements on “dev workflow”, “custom resources”, “serve cli”, etc. (29147, 28708, 28529, 28527)

RLlib

🎉 New Features:
* Decision Transformer (DT) Algorithm added (27890, 27889, 27872, 27829).
* Callbacks now have a new hook `on_episode_created()`. (28600)
* Added learning rate schedule to SimpleQ and PG. (28381)

💫Enhancements:
* Soft target network update is now supported by all off-policy algorithms (e.g., DQN, DDPG) (28135)
* Stop RLlib from "silently" selecting atari preprocessors. (29011)
* Improved offline RL and off-policy evaluation performance (28837, 28834, 28593, 28420, 28136, 28013, 27356, 27161, 27451).
* Escalated old deprecation warnings to errors (28807, 28795, 28733, 28697).
* Others: 27619, 27087.
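
For readers unfamiliar with the term, a soft target network update usually means Polyak averaging: the target weights move a small fraction `tau` toward the online weights on each update. The sketch below illustrates that general technique on plain Python lists. The function `soft_update` is a hypothetical name and not RLlib's implementation.

```python
def soft_update(target_weights, online_weights, tau):
    """Return new target weights: tau * online + (1 - tau) * target."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must be in [0, 1]")
    if len(target_weights) != len(online_weights):
        raise ValueError("weight lists must have the same length")
    return [
        tau * online + (1.0 - tau) * target
        for target, online in zip(target_weights, online_weights)
    ]


target = [0.0, 2.0]
online = [1.0, 2.0]
target = soft_update(target, online, tau=0.1)
print(target)
```

Setting `tau=1.0` reduces this to a hard update, where the target weights are simply replaced by the online weights.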

🔨 Fixes:
* Various bug fixes: 29077, 28811, 28637, 27785, 28703, 28422, 28405, 28358, 27540, 28325, 28357, 28334, 27090, 28133, 27981, 27980, 26666, 27390, 27791, 27741, 27424, 27544, 27459, 27572, 27255, 27304, 26629, 28166, 27864, 28938, 28845, 28588, 28202, 28201, 27806

📖Documentation:
* Connectors. (27528)
* Training step API. (27344)
* Others: 28299, 27460

Ray Workflows

🔨 Fixes:
* Fixed the object loss due to driver exit (29092)
* Change the name in step to task_id (28151)

Ray Core and Ray Clusters

Ray Core

🎉 New Features:
* Ray [OOM prevention](https://docs.ray.io/en/master/ray-core/scheduling/ray-oom-prevention.html) feature alpha release! If your Ray jobs suffer from OOM issues, please give it a try.
* Support dynamic generators as task return values. (29082 28864 28291)

💫Enhancements:
* Fix spread scheduling imbalance issues (28804 28551)
* Widening range of grpcio versions allowed (28623)
* Support encrypted redis connection. (29109)
* Upgrade redis from 6.x to 7.0.5. (28936)
* Batch ScheduleAndDispatchTasks calls (28740)

🔨 Fixes:
* More robust spilled object deletion (29014)
* Fix the initialization/destruction order between reference_counter_ and node change subscription (29108)
* Suppress the logging error when python exits and actor not deleted (27300)
* Mark `run_function_on_all_workers` as deprecated until we get rid of this (29062)
* Remove unused args for default_worker.py (28177)
* Don't include script directory in sys.path if it's started via python -m (28140)
* Handling edge cases of max_cpu_fraction argument (27035)
* Fix out-of-band deserialization of actor handle (27700)
* Allow reuse of cluster address if Ray is not running (27666)
* Fix an uncaught exception upon deallocation for actors (27637)
* Support placement_group=None in PlacementGroupSchedulingStrategy (27370)

📖Documentation:
* [Ray 2.0 white paper](https://docs.google.com/document/d/1tBw9A4j62ruI5omIJbMxly-la5w4q_TjyJgJL_jN2fI/preview) is published.
* Revamp ray core docs (29124 29046 28953 28840 28784 28644 28345 28113 27323 27303)
* Fix cluster docs (28056 27062)
* CLI Reference Documentation Revamp (27862)

Ray Clusters

💫Enhancements:
* Distinguish Kubernetes deployment stacks (28490)

📖Documentation:
* State intent to remove legacy Ray Operator (29178)
* Improve KubeRay migration notes (28672)
* Add FAQ for cluster multi-tenancy support (29279)

Dashboard

🎉 New Features:
* Time series metrics are now built into the dashboard
* Ray now exports some default configuration files which can be used for your Prometheus or Grafana instances. This includes default metrics which show common information important to your Ray application.
* A new progress bar is shown in the job detail view. You can see how far along your Ray job is.

🔨 Fixes:
* Fix the Prometheus exporter producing a slightly incorrect format.
* Fix several performance issues and memory leaks

📖Documentation:
* Added additional documentation on the new time series and the metrics page

Many thanks to all those who contributed to this release!

sihanwang41, simon-mo, avnishn, MyeongKim, markrogersjr, christy, xwjiang2010, kouroshHakha, zoltan-fedor, wumuzi520, alanwguo, Yard1, liuyang-my, charlesjsun, DevJake, matteobettini, jonathan-conder-sm, mgerstgrasser, guidj, JiahaoYao, Zyiqin-Miranda, jvanheugten, aallahyar, SongGuyang, clarng, architkulkarni, Rohan138, heyitsmui, mattip, ArturNiederfahrenhorst, maxpumperla, vale981, krfricke, DmitriGekhtman, amogkam, richardliaw, maldil, zcin, jianoaix, cool-RR, kira-lin, gramhagen, c21, jiaodong, sijieamoy, tupui, ericl, anabranch, se4ml, suquark, dmatrix, jjyao, clarkzinzow, smorad, rkooo567, jovany-wang, edoakes, XiaodongLv, klieret, rozsasarpi, scottsun94, ijrsvt, bveeramani, chengscott, jbedorf, kevin85421, nikitavemuri, sven1977, acxz, stephanie-wang, PaulFenton, WangTaoTheTonic, cadedaniel, nthai, wuisawesome, rickyyx, artemisart, peytondmurray, pingsutw, olipinski, davidxia, stestagg, yaxife, scv119, mwtian, yuanchi2807, ntlm1686, shrekris-anyscale, cassidylaidlaw, gjoliver, ckw017, hakeemta, ilee300a, avivhaber, matthewdeng, afarid, pcmoritz, Chong-Li, Catch-Bull, justinvyu, iycheng

2.0.1

Not secure
The Ray 2.0.1 patch release contains dependency upgrades and fixes for multiple components:

- Upgrade grpcio version to 1.32 ([28025](https://github.com/ray-project/ray/pull/28025))
- Upgrade redis version to 7.0.5 ([28936](https://github.com/ray-project/ray/pull/28936))
- Fix segfault when using runtime environments ([28409](https://github.com/ray-project/ray/pull/28409))
- Increase RPC timeout for dashboard ([28330](https://github.com/ray-project/ray/pull/28330))
- Set correct path when using `python -m` ([28140](https://github.com/ray-project/ray/pull/28140))
- [Autoscaler] Fix autoscaling for 0 CPU head node ([26813](https://github.com/ray-project/ray/pull/26813))
- [Serve] Allow code in private remote Git URIs to be imported ([28250](https://github.com/ray-project/ray/pull/28250))
- [Serve] Allow `host` and `port` in Serve config ([27026](https://github.com/ray-project/ray/pull/27026))
- [RLlib] Evaluation supports asynchronous rollout (single slow eval worker will not block the overall evaluation progress). ([27390](https://github.com/ray-project/ray/pull/27390))
- [Tune] Fix hang during checkpoint synchronization ([28155](https://github.com/ray-project/ray/pull/28155))
- [Tune] Fix trial restoration from different IP ([28470](https://github.com/ray-project/ray/pull/28470))
- [Tune] Fix custom synchronizer serialization ([28699](https://github.com/ray-project/ray/pull/28699))
- [Workflows] Replace deprecated `name` option with `task_id` ([28151](https://github.com/ray-project/ray/pull/28151))
