Highlights
- Ray Datasets is now in alpha (https://docs.ray.io/en/master/data/dataset.html).
- LightGBM on Ray is now in beta (https://github.com/ray-project/lightgbm_ray). It:
  - enables multi-node and multi-GPU training,
  - integrates seamlessly with the distributed hyperparameter optimization library Ray Tune,
  - comes with fault tolerance handling mechanisms, and
  - supports distributed dataframes and distributed data loading.
Ray Autoscaler
🎉 New Features:
- Aliyun support (15712)
💫 Enhancements:
- [Kubernetes] Operator refactored to use Kopf package (15787)
- Flag to control config bootstrap for rsync (16667)
- Prometheus metrics for Autoscaler (16066, 16198)
- Allow launching in subnets where public IP assignment is off by default (16816)
🔨 Fixes:
- [Kubernetes] Fix GPU=0 resource handling (16887)
- [Kubernetes] Release docs updated with K8s test instructions (16662)
- [Kubernetes] Documentation update (16570)
- [Kubernetes] All official images set to rayproject/ray:latest (15988 16205)
- [Local] Fix bootstrapping Ray at a given static set of IPs (16202, 16281)
- [Azure] Fix Azure Autoscaling Failures (16640)
- Handle node type key change / deletion (16691)
- [GCP] Retry GCP BrokenPipeError (16952)
Ray Client
🎉 New Features:
- Client integrations with major Ray Libraries (15932, 15996, 16103, 16034, 16029, 16111, 16301)
- Client Connect now returns a context that has `disconnect` and can be used as a context manager (16021)
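To illustrate the context-manager pattern described above, here is a minimal pure-Python sketch. The names (`ClientContext`, `connect`) are hypothetical stand-ins, not the real Ray Client API; the point is only that the returned object exposes `disconnect` and tears the connection down automatically when used with `with`.

```python
# Hypothetical sketch of a connect() call that returns a context object,
# mimicking the pattern Ray Client now follows. Not the real ray API.
class ClientContext:
    def __init__(self, address):
        self.address = address
        self.connected = True

    def disconnect(self):
        # Explicit teardown, also callable manually.
        self.connected = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving the `with` block disconnects automatically.
        self.disconnect()
        return False


def connect(address):
    return ClientContext(address)


# Usage: the connection is torn down automatically on exit.
with connect("ray://localhost:10001") as ctx:
    assert ctx.connected
assert not ctx.connected
```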
💫 Enhancements:
- Better support for multi-threaded client-side applications (16731, 16732)
- Improved error messages and warnings when misusing Ray Client (16454, 16508, 16588, 16163)
- Made Client Object & Actor refs a subclass of their non-client counterparts (16110)
🔨 Fixes:
- `dir()` works for client-side Actor Handles (16157)
- Avoid server-side time-outs (16554)
- Various fixes to the client-server proxy (16040, 16038, 16057, 16180)
Ray Core
🎉 New Features:
- Ray dataset alpha is available!
🔨 Fixes:
- Fix various Ray IO-layer issues that caused hanging and high memory usage (16408, 16422, 16620, 16824, 16791, 16487, 16407, 16334, 16167, 16153, 16314, 15955, 15775)
- Namespaces now properly isolate placement groups (16000)
- More efficient object transfer for spilled objects (16364, 16352)
🏗 Architecture refactoring:
- Starting with Ray 1.5.0, the liveness of Ray jobs is guaranteed as long as the machines have enough disk space, thanks to the “fallback allocator” mechanism, which allocates plasma objects directly on disk when they cannot be created in memory or spilled to disk.
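The fallback-allocator behavior described above can be sketched in a few lines of plain Python: try to place an object in a bounded in-memory store, and fall back to writing it directly to disk when the store is full. This is an illustrative toy (`FallbackStore` is a made-up name), not Ray's actual plasma-store implementation.

```python
import os
import tempfile


# Toy sketch of a "fallback allocator": objects go to a fixed-size
# in-memory store when possible, and directly to disk otherwise,
# so a put never fails just because memory is exhausted.
class FallbackStore:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.memory = {}  # key -> bytes held in memory
        self.disk = {}    # key -> path of the on-disk copy

    def put(self, key, data):
        if self.used + len(data) <= self.capacity:
            self.memory[key] = data
            self.used += len(data)
            return "memory"
        # Fallback path: allocate the object on disk instead of failing.
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        self.disk[key] = path
        return "disk"

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        with open(self.disk[key], "rb") as f:
            return f.read()


store = FallbackStore(capacity_bytes=8)
assert store.put("a", b"1234") == "memory"   # fits in memory
assert store.put("b", b"123456789") == "disk"  # falls back to disk
assert store.get("b") == b"123456789"          # still retrievable
```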
RLlib
🎉 New Features:
- Support for adding/deleting Policies to/from a Trainer on-the-fly (16359, 16569, 16927).
- Added new “input API” for customizing offline datasets (shoutout to Julius F.). (16957)
- Allow an external-env PolicyServer to listen on n different ports (given n rollout workers); no longer requires creating an env on the server side to get the env’s spaces. (16583).
🔨 Fixes:
- CQL: Bug fixes and clean-ups (fixed iteration count). (16531, 16332)
- D4RL: 16721
- Ensure curiosity exploration actions are passed in as tf tensors (shoutout to Manny V.). (15704)
- Other bug fixes and cleanups: 16162 and 16309 (shoutout to Chris B.), 15634, 16133, 16860, 16813, 16428, 16867, 16354, 16218, 16118, 16429, 16427, 16774, 16734, 16019, 16171, 16830, 16722
📖 Documentation and testing:
- 16311, 15908, 16271, 16080, 16740, 16843
🏗 Architecture refactoring:
- All RLlib algos operating on Box action spaces now operate on normalized actions by default (ranging from -1.0 to 1.0). This enables PG-style algos to learn in skewed action spaces. (16531)
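The normalized-action change above amounts to a linear rescaling: the algorithm emits actions in [-1.0, 1.0], which are mapped into the environment's Box bounds before being applied. A minimal sketch of that mapping (the helper name `unsquash_action` is hypothetical, not RLlib's internal function):

```python
# Map a normalized action in [-1, 1] linearly onto a Box space's
# [low, high] bounds, as implied by the refactoring note above.
def unsquash_action(normalized, low, high):
    return low + (normalized + 1.0) * 0.5 * (high - low)


assert unsquash_action(-1.0, 0.0, 10.0) == 0.0   # lower bound
assert unsquash_action(1.0, 0.0, 10.0) == 10.0   # upper bound
assert unsquash_action(0.0, -2.0, 2.0) == 0.0    # midpoint
```

Because the learned policy only ever sees the symmetric [-1, 1] range, heavily skewed bounds (e.g. [0, 1000]) no longer distort PG-style updates.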
Tune
🎉 New Features:
- New integration with LightGBM via Tune callbacks (16713).
- New cost-efficient HPO searchers (BlendSearch and CFO) available from the FLAML library (https://github.com/microsoft/FLAML). (16329)
💫 Enhancements:
- Searchers can now be passed configurations that have already been evaluated separately; this is useful, for example, for warm-starting or for meta-searchers (16485)
- Sort trials in reporter table by metric (16576)
- Add option to keep random values constant over grid search (16501)
- Read trial results from json file (15915)
🔨 Fixes:
- Fix infinite loop when using ``Searcher`` that limits concurrency internally in conjunction with a ``ConcurrencyLimiter`` (16416)
- Allow custom sync configuration with ``DurableTrainable`` (16739)
- Logger fixes. W&B: 16806, 16674, 16839. MLflow: 16840
- Various bug fixes: 16844, 16017, 16575, 16675, 16504, 15811, 15899, 16128, 16396, 16695, 16611
📖 Documentation and testing:
- Use BayesOpt for quick start example (16997)
- 16793, 16029, 15932, 16980, 16450, 16709, 15913, 16754, 16619
SGD
🎉 New Features:
- Torch native mixed precision is now supported! (16382)
🔨 Fixes:
- Use target label count for training batch size (16400)
📖 Documentation and testing:
- 15999, 16111, 16301, 16046
Serve
💫 Enhancements:
- UX improvements (16227, 15909)
- Improved logging (16468)
🔨 Fixes:
- Fix shutdown logic (16524)
- Assorted bug fixes (16647, 16760, 16783)
📖 Documentation and testing:
- 16042, 16631, 16759, 16786
Thanks
Many thanks to all who contributed to this release:
Tonyhao96, simon-mo, scv119, Yard1, llan-ml, xcharleslin, jovany-wang, ijrsvt, max0x7ba, annaluo676, rajagurunath, zuston, amogkam, yorickvanzweeden, mxz96102, chenk008, Bam4d, mGalarnyk, kfstorm, crdnb, suquark, ericl, marload, jiaodong, thexiang, ellimac54, qicosmos, mwtian, jkterry1, sven1977, howardlau1999, mvindiola1, stefanbschneider, juliusfrost, krfricke, matthewdeng, zhuangzhuang131419, brandonJY, Eleven1Liu, nikitavemuri, richardliaw, iycheng, stephanie-wang, HuangLED, clarkzinzow, fyrestone, asm582, qingyun-wu, ckw017, yncxcw, DmitriGekhtman, benjamindkilleen, Chong-Li, kathryn-zhou, pcmoritz, rodrigodelazcano, edoakes, dependabot[bot], pdames, frenkowski, loicsacre, gabrieleoliaro, achals, thomasjpfan, rkooo567, dibgerge, clay4444, architkulkarni, lixin-wei, ConeyLiu, WangTaoTheTonic, AnnaKosiorek, wuisawesome, gramhagen, zhisbug, franklsf95, vakker, jenhaoyang, liuyang-my, chaokunyang, SongGuyang, tgaddair