New Bug Fixes
* Stable Diffusion now supported with latest Torch, diffusers, and Triton versions.
What's Changed
* Update version.txt after 0.12.2 release by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4617
* Fix figure in FlexGen blog by tohtana in https://github.com/microsoft/DeepSpeed/pull/4624
* Fix figure of llama2 13B in DS-FlexGen blog by tohtana in https://github.com/microsoft/DeepSpeed/pull/4625
* Fix config format by xu-song in https://github.com/microsoft/DeepSpeed/pull/4594
* Guanhua/partial offload rebase v2 (590) by GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/4636
* offload++ blog (623) by GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/4637
* Update README in offloadpp blog by GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/4641
* [docs] update news items by jeffra in https://github.com/microsoft/DeepSpeed/pull/4640
* DeepSpeed-FastGen Chinese Blog by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/4642
* Fix issues with torch cpu builds by loadams in https://github.com/microsoft/DeepSpeed/pull/4639
* Isolate src code and testing for DeepSpeed-FastGen by cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/4610
* Add Japanese blog for DeepSpeed-FastGen by tohtana in https://github.com/microsoft/DeepSpeed/pull/4651
* Fix for MII unit tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4652
* Enhance the robustness of `module_state_dict` by LZHgrla in https://github.com/microsoft/DeepSpeed/pull/4587
* Enable ZeRO3 allgather for multiple dtypes by tohtana in https://github.com/microsoft/DeepSpeed/pull/4647
* add option to disable pipeline partitioning by nelyahu in https://github.com/microsoft/DeepSpeed/pull/4322
* Added `__HIP_PLATFORM_AMD__=1` for non-JIT build by rraminen in https://github.com/microsoft/DeepSpeed/pull/4585
* Fix rope_theta arg for diffusers_attention by lekurile in https://github.com/microsoft/DeepSpeed/pull/4656
* `tl.dot(a, b, trans_b=True)` is not supported by Triton 2.0+, updating this API by bmedishe in https://github.com/microsoft/DeepSpeed/pull/4541
* Update ds-chat workflow to work w/ deepspeed-chat install by lekurile in https://github.com/microsoft/DeepSpeed/pull/4598
* Diffusers attention script update triton2.1 by bmedishe in https://github.com/microsoft/DeepSpeed/pull/4573
* Fix the openfold training. by cctry in https://github.com/microsoft/DeepSpeed/pull/4657
* Universal ckp fixes by mosheisland in https://github.com/microsoft/DeepSpeed/pull/4588
* Update .gitignore [Adding comments, Improved documentation] by Nadav23AnT in https://github.com/microsoft/DeepSpeed/pull/4631
* Update lr_schedules.py by CoinCheung in https://github.com/microsoft/DeepSpeed/pull/4563
* Fix UNET and VAE implementations for new diffusers version by lekurile in https://github.com/microsoft/DeepSpeed/pull/4663
* fix num_kv_heads sharding in autoTP for the new in-repo Falcon-40B by dc3671 in https://github.com/microsoft/DeepSpeed/pull/4654
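The ZeRO-3 change in #4647 lets the partitioned-parameter allgather handle models whose parameters use more than one dtype. The sketch below covers only the bookkeeping step. It assumes that gathered parameters are coalesced into one flat buffer per dtype, so each dtype gets its own collective call. The `Param` tuple, `group_by_dtype`, and `flat_buffer_sizes` are hypothetical illustrations, not DeepSpeed's internal API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical parameter record: (name, dtype, number of elements).
# This is NOT DeepSpeed's data structure, just a stand-in for illustration.
Param = Tuple[str, str, int]


def group_by_dtype(params: List[Param]) -> Dict[str, List[Param]]:
    """Bucket parameters by dtype, preserving first-seen dtype order.

    Assumption: each bucket would be flattened into one contiguous buffer
    and gathered with a single collective, since one flat buffer cannot
    mix dtypes.
    """
    buckets: Dict[str, List[Param]] = defaultdict(list)
    for param in params:
        buckets[param[1]].append(param)
    return dict(buckets)


def flat_buffer_sizes(params: List[Param]) -> Dict[str, int]:
    """Total element count of the flat gather buffer needed for each dtype."""
    return {
        dtype: sum(numel for _, _, numel in group)
        for dtype, group in group_by_dtype(params).items()
    }


if __name__ == "__main__":
    example = [
        ("embed.weight", "bf16", 1024),
        ("layer0.weight", "bf16", 4096),
        ("layer0.scale", "fp32", 64),
    ]
    print(flat_buffer_sizes(example))  # {'bf16': 5120, 'fp32': 64}
```

Grouping before flattening keeps the number of collectives equal to the number of distinct dtypes rather than the number of parameters.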
New Contributors
* xu-song made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4594
* LZHgrla made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4587
* mosheisland made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4588
* Nadav23AnT made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4631
* CoinCheung made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4563
**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.12.2...v0.12.3