What's Changed
* [docs] add 530b paper by jeffra in https://github.com/microsoft/DeepSpeed/pull/1979
* small fix for the HF Bert models by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/1984
* Add unit test for various model families and inference tasks by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/1981
* Fix for lightning tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/1988
* fix typo when getting kernel dim in conv calculation by cli99 in https://github.com/microsoft/DeepSpeed/pull/1989
* Add torch-latest and torch-nightly CI workflows by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/1990
* [bug] Add user-defined launcher args for MPI launcher by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/1933
* Propagate max errorcode to deepspeed when using PDSH launcher by jerrymannil in https://github.com/microsoft/DeepSpeed/pull/1994
* [docs] add new build badges to landing page by jeffra in https://github.com/microsoft/DeepSpeed/pull/1998
* DeepSpeed Comm. Backend v1 by awan-10 in https://github.com/microsoft/DeepSpeed/pull/1985
* Relax DeepSpeed MoE ZeRO-1 Assertion by Quentin-Anthony in https://github.com/microsoft/DeepSpeed/pull/2007
* update CODEOWNERS by conglongli in https://github.com/microsoft/DeepSpeed/pull/2017
* [CI] force upgrade HF dependencies & output py env by jeffra in https://github.com/microsoft/DeepSpeed/pull/2015
* [inference] test suite for ds-kernels (bert, roberta, gpt2, gpt-neo, gpt-j) by jeffra in https://github.com/microsoft/DeepSpeed/pull/1992
* DeepSpeed examples refresh by jeffra in https://github.com/microsoft/DeepSpeed/pull/2021
* Fix transformer API for training-evaluation pipeline by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/2018
* DataLoader Length Fix by Sanger2000 in https://github.com/microsoft/DeepSpeed/pull/1718
* DeepSpeed Monitor Module (Master) by Quentin-Anthony in https://github.com/microsoft/DeepSpeed/pull/2013
* Use partition numel by tjruwase in https://github.com/microsoft/DeepSpeed/pull/2011
* fix import errors by KMFODA in https://github.com/microsoft/DeepSpeed/pull/2026
* Fix inference unit test import error catching by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2024
* Retain available params until last use by tjruwase in https://github.com/microsoft/DeepSpeed/pull/2016
* Split parameter offload from z3 by tjruwase in https://github.com/microsoft/DeepSpeed/pull/2009
* Fix flops profiler print statements by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2038
* Add compression papers by conglongli in https://github.com/microsoft/DeepSpeed/pull/2042
* Fix the half-precision version of CPU-Adam by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/2032
* Fix for AMD unit tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2047
* Wrong partition_id while copying fp32_params -> fp16 params in Z2 for MoE by siddharth9820 in https://github.com/microsoft/DeepSpeed/pull/2058
* Fix missing import in replace_module.py by aphedges in https://github.com/microsoft/DeepSpeed/pull/2050
* Comms Benchmarks by Quentin-Anthony in https://github.com/microsoft/DeepSpeed/pull/2040
* add ds inference paper by jeffra in https://github.com/microsoft/DeepSpeed/pull/2072
* Comments for better understanding of zero stage1_2 by kisseternity in https://github.com/microsoft/DeepSpeed/pull/2027
* [docs] fix broken read-the-docs build by jeffra in https://github.com/microsoft/DeepSpeed/pull/2075
* Fix building package without a GPU by aphedges in https://github.com/microsoft/DeepSpeed/pull/2049
* Fix partition id in the fp32->fp16 param copying step for z2+cpu-offload by siddharth9820 in https://github.com/microsoft/DeepSpeed/pull/2059
* Codeowner addendum and fix to small model debugging script by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/2076
* remove require grad in params count by cli99 in https://github.com/microsoft/DeepSpeed/pull/2065
* Add missing newline for ZeroOneAdam parameter table by manuelciosici in https://github.com/microsoft/DeepSpeed/pull/2088
* fixed "None type has no len()" by xiazeyu in https://github.com/microsoft/DeepSpeed/pull/2091
* Improving memory utilization of Z2+MoE by siddharth9820 in https://github.com/microsoft/DeepSpeed/pull/2079

New Contributors
* jerrymannil made their first contribution in https://github.com/microsoft/DeepSpeed/pull/1994
* Sanger2000 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/1718
* KMFODA made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2026
* siddharth9820 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2058
* samadejacobs made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2076
* xiazeyu made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2091

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.6.5...v0.6.6