## New features
* 🚀 [DeepSpeed Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-chat) 🚀
## What's Changed
* [docs] add MCR-DL paper to readme/docs by Quentin-Anthony in https://github.com/microsoft/DeepSpeed/pull/3066
* Several fixes to unblock CI by loadams in https://github.com/microsoft/DeepSpeed/pull/3047
* Assert mp_size is factor of model dimensions by molly-smith in https://github.com/microsoft/DeepSpeed/pull/2891
* [CI] follow-up fixes by jeffra in https://github.com/microsoft/DeepSpeed/pull/3072
* fix return prev key and value, added strides to from_blob by mzusman in https://github.com/microsoft/DeepSpeed/pull/2828
* Remove bf16 from inference config dtype enum by molly-smith in https://github.com/microsoft/DeepSpeed/pull/3010
* Softmax Scheduling Cleanup by cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/3046
* Fix nebula in save_16bit_model issue by FreyaRao in https://github.com/microsoft/DeepSpeed/pull/3023
* Allow lists by satpalsr in https://github.com/microsoft/DeepSpeed/pull/3042
* Goodbye Torch 1.8 by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3082
* Empty ZeRO3 partition cache by tjruwase in https://github.com/microsoft/DeepSpeed/pull/3060
* pre-commit check for torch.cuda in code by delock in https://github.com/microsoft/DeepSpeed/pull/2981
* Move cuda check into utils by loadams in https://github.com/microsoft/DeepSpeed/pull/3074
* update yapf version and style settings by jeffra in https://github.com/microsoft/DeepSpeed/pull/3098
* Fix comms benchmark import issues and support MPI/slurm launching by Quentin-Anthony in https://github.com/microsoft/DeepSpeed/pull/2932
* Disable Stage 1&2 CPUAdam pathways by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3097
* ♻️ replace deprecated functions for communication by mayank31398 in https://github.com/microsoft/DeepSpeed/pull/2995
* Make fp32 default communication data type by tjruwase in https://github.com/microsoft/DeepSpeed/pull/2970
* Update DeepSpeed copyright license to Apache 2.0 by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3111
* Add Full Apache License by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3119
* VL MoE Blog by yaozhewei in https://github.com/microsoft/DeepSpeed/pull/3120
* Update SD triton version in requirements-sd.txt by lekurile in https://github.com/microsoft/DeepSpeed/pull/3135
* Fix launch issue by tjruwase in https://github.com/microsoft/DeepSpeed/pull/3137
* Fix CI badges by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3138
* Optimize Softmax Kernel by molly-smith in https://github.com/microsoft/DeepSpeed/pull/3112
* Use generic O_DIRECT by tjruwase in https://github.com/microsoft/DeepSpeed/pull/3115
* Enable autoTP for bloom by sywangyi in https://github.com/microsoft/DeepSpeed/pull/3035
* [cleanup] remove `pass` calls where they aren't needed by stas00 in https://github.com/microsoft/DeepSpeed/pull/2826
* [ci] `nv-transformers-v100` - use the same torch version as transformers CI by stas00 in https://github.com/microsoft/DeepSpeed/pull/3096
* Fixes code and tests skipping/asserting incorrectly on torch 2+. by loadams in https://github.com/microsoft/DeepSpeed/pull/3136
* fix example symlink about DeepSpeed+AzureML by EeyoreLee in https://github.com/microsoft/DeepSpeed/pull/3127
* Remove Extra Bracket by VHellendoorn in https://github.com/microsoft/DeepSpeed/pull/3101
* Recover shared parameters by ShijieZZZZ in https://github.com/microsoft/DeepSpeed/pull/3033
* Fix for Diffusers 0.14.0 by molly-smith in https://github.com/microsoft/DeepSpeed/pull/3142
* Fix copyright check, add copyright replace script by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3141
* Update curriculum-learning.md by goodship1 in https://github.com/microsoft/DeepSpeed/pull/3031
* Remove benchmark code by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3157
* fixing a bug in CPU Adam and Adagrad by xiexbing in https://github.com/microsoft/DeepSpeed/pull/3109
* op_builder: conditionally compute relative path for hip compiled files by adammoody in https://github.com/microsoft/DeepSpeed/pull/3095
* zero.Init() should pin params in GPU memory as requested by tjruwase in https://github.com/microsoft/DeepSpeed/pull/2953
* deepspeed/runtime/utils.py: reset_peak_memory_stats when empty cache by guoyejun in https://github.com/microsoft/DeepSpeed/pull/2803
* Add DeepSpeed-Chat Blogpost by awan-10 in https://github.com/microsoft/DeepSpeed/pull/3185
* [docs] add run command for 13b by awan-10 in https://github.com/microsoft/DeepSpeed/pull/3187
* add news item. by awan-10 in https://github.com/microsoft/DeepSpeed/pull/3188
* DeepSpeed Chat by tjruwase in https://github.com/microsoft/DeepSpeed/pull/3186
* Fix references to figures by tohtana in https://github.com/microsoft/DeepSpeed/pull/3189
* Fix typo by zhouzaida in https://github.com/microsoft/DeepSpeed/pull/3183
* Fix typo by dawei-wang in https://github.com/microsoft/DeepSpeed/pull/3164
* ChatGPT Chinese blog by yaozhewei in https://github.com/microsoft/DeepSpeed/pull/3193
* Add Japanese version of ChatGPT-like pipeline blog by tohtana in https://github.com/microsoft/DeepSpeed/pull/3194
* fix hero figure by conglongli in https://github.com/microsoft/DeepSpeed/pull/3199
* feat: Add support for `NamedTuple` when sharding parameters [3029] by AlexanderVanEck in https://github.com/microsoft/DeepSpeed/pull/3037
* fix license badge by conglongli in https://github.com/microsoft/DeepSpeed/pull/3200
* Update AMD workflows by loadams in https://github.com/microsoft/DeepSpeed/pull/3179
* [CPU support] Optionally bind each rank to different cores on host by delock in https://github.com/microsoft/DeepSpeed/pull/2881
## New Contributors
* mzusman made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2828
* FreyaRao made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3023
* sywangyi made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3035
* EeyoreLee made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3127
* VHellendoorn made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3101
* goodship1 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3031
* zhouzaida made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3183
* dawei-wang made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3164
* AlexanderVanEck made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3037

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.8.3...v0.9.0