New features
* [DeepSpeed Data Efficiency: A composable library that makes better use of data, increases training efficiency, and improves model quality](https://www.deepspeed.ai/2022/12/11/data-efficiency.html)
* DeepSpeed Data Efficiency Library by conglongli in https://github.com/microsoft/DeepSpeed/pull/2585
What's Changed
* Fix blog link by conglongli in https://github.com/microsoft/DeepSpeed/pull/2600
* Migrate ops tests to new inference_ops marker by cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/2599
* Move layer norm to new schedule by lokoppakmsft in https://github.com/microsoft/DeepSpeed/pull/2590
* [deepspeed/autotuner] Bug fix for binary search for batch size by rahilbathwal5 in https://github.com/microsoft/DeepSpeed/pull/2162
* Fix for older versions of pydantic by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2611
* Use rocm/pytorch:latest for ROCm Dockerfile by jithunnair-amd in https://github.com/microsoft/DeepSpeed/pull/2613
* Skip `torch.zeros` and `tensor.copy_` when model parallelism is not used by guoyejun in https://github.com/microsoft/DeepSpeed/pull/2479
* Call `empty_cache` to actually free GPU memory, as described in the comment by guoyejun in https://github.com/microsoft/DeepSpeed/pull/2620
* Remove GatheredParameters context from replace_with_policy by lekurile in https://github.com/microsoft/DeepSpeed/pull/2591
* Fix 2498 by clumsy in https://github.com/microsoft/DeepSpeed/pull/2603
* Update AVX512 Detection by cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/2621
* Add Megatron CI workflow by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2614
* [inference] check for unsupported model generate args by jeffra in https://github.com/microsoft/DeepSpeed/pull/2627
* [launcher] parse hostfile via regex and added error checks by jeffra in https://github.com/microsoft/DeepSpeed/pull/2626
* Unit tests set up their own venv by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2628
* Fix 2409: add enable_each_rank_log to deepspeed/launcher/runner.py by inkcherry in https://github.com/microsoft/DeepSpeed/pull/2571
* Fix typo in autotuner.py by eltociear in https://github.com/microsoft/DeepSpeed/pull/2639
* [zero-3] Handle forward parameter return correctly in nested cases by samyam in https://github.com/microsoft/DeepSpeed/pull/2642
* [inference] ds-attention refactor w.r.t. ops by jeffra in https://github.com/microsoft/DeepSpeed/pull/2623
* Fix issue with BLOOM int8 when changing tp size by jeffra in https://github.com/microsoft/DeepSpeed/pull/2645
* Fix assertion error in ZeRO stage 3 by GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/2647
* Tweaks to ds-attn, distilbert policy, and mup by jeffra in https://github.com/microsoft/DeepSpeed/pull/2649
* [doc] fix `min_loss_scale` default by stas00 in https://github.com/microsoft/DeepSpeed/pull/2660
* [launcher] fail gracefully if hostname -i doesn't work as expected by jeffra in https://github.com/microsoft/DeepSpeed/pull/2631
* Fix OPT injection by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/2541
* Abstract accelerator (step 2) by delock in https://github.com/microsoft/DeepSpeed/pull/2560
* Remove unnecessary device synchronization for stage 2 by li-yi-dong in https://github.com/microsoft/DeepSpeed/pull/2500
* [Bug Fixed] `torch.cuda.is_available` -> `torch.cuda.is_available()` by wkcn in https://github.com/microsoft/DeepSpeed/pull/2661
* [fp16] lower `initial_scale_power` to `16` by stas00 in https://github.com/microsoft/DeepSpeed/pull/2663
* Fix tensor contiguity bug in model_compression by xiaoxiawu-microsoft in https://github.com/microsoft/DeepSpeed/pull/2671
* [inference] ds-mlp refactor w.r.t. ops by jeffra in https://github.com/microsoft/DeepSpeed/pull/2668
* real_accelerator validation check for both accelerator and deepspeed accelerator path by delock in https://github.com/microsoft/DeepSpeed/pull/2685
* Fix typo and remove duplicated code in ZeRO stage 1 and 2 by wkcn in https://github.com/microsoft/DeepSpeed/pull/2655
* Add MLflow logging for AML by cassieesvelt in https://github.com/microsoft/DeepSpeed/pull/2495
* Fix import error of op_builder by tohtana in https://github.com/microsoft/DeepSpeed/pull/2687
* Pass training flag to forward call from module config by lokoppakmsft in https://github.com/microsoft/DeepSpeed/pull/2604
* Extend quantization utils features by lokoppakmsft in https://github.com/microsoft/DeepSpeed/pull/2683
* [GatheredParameters] add support for any iterable by stas00 in https://github.com/microsoft/DeepSpeed/pull/2664
* Fix for latest diffusers by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2699
* Exclude benchmarks during install by jeffra in https://github.com/microsoft/DeepSpeed/pull/2698
* Correct loss scale in ZeRO step by jomayeri in https://github.com/microsoft/DeepSpeed/pull/2695
* [ZeRO] non-MoE stage 1 requires CG disabled by jeffra in https://github.com/microsoft/DeepSpeed/pull/2703
* Remove print side effect from importing deepspeed by jeffra in https://github.com/microsoft/DeepSpeed/pull/2704
* ZeRO-3: handle frozen weights by tjruwase in https://github.com/microsoft/DeepSpeed/pull/2653
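
One entry in the list above replaces `torch.cuda.is_available` with `torch.cuda.is_available()` (pull/2661). The sketch below illustrates that general class of bug: a function object tested in a condition is always truthy, so the check passes no matter what the function would return. The names `is_available`, `pick_device_buggy`, and `pick_device_fixed` are hypothetical stand-ins for illustration. They are not DeepSpeed or PyTorch code, and the changed lines in the PR may look different.

```python
def is_available():
    """Hypothetical stand-in for a capability probe such as torch.cuda.is_available."""
    return False  # pretend no GPU is present


def pick_device_buggy():
    # Bug: the function object itself is tested. Function objects are always
    # truthy, so this branch is taken even though no GPU is available.
    return "cuda" if is_available else "cpu"


def pick_device_fixed():
    # Fix: call the function and test its return value.
    return "cuda" if is_available() else "cpu"


if __name__ == "__main__":
    print(pick_device_buggy())
    print(pick_device_fixed())
```

With the stand-in probe returning `False`, only the fixed variant falls back to the CPU path.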
New Contributors
* eltociear made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2639
* li-yi-dong made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2500
* wkcn made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2661
* xiaoxiawu-microsoft made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2671
* cassieesvelt made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2495
* tohtana made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2687
**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.7.7...v0.8.0