What's Changed
* Update version.txt after 0.16.4 release by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7063
* fix an outdated doc wrt CUDA_VISIBLE_DEVICES by stas00 in https://github.com/deepspeedai/DeepSpeed/pull/7058
* Tecorigin sdaa accelerator by siqi654321 in https://github.com/deepspeedai/DeepSpeed/pull/6903
* Handle special case of libuv for Windows by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7064
* Bug Fix for offload_states API by U-rara in https://github.com/deepspeedai/DeepSpeed/pull/7050
* Update README with info on newest accelerator by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7065
* Fix TOCTOU issues, switch to fstat by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7067
* config torch to avoid graph breaks caused by logger by ShellyNR in https://github.com/deepspeedai/DeepSpeed/pull/6999
* Fix meta load tensor incompatible issue by Yejing-Lai in https://github.com/deepspeedai/DeepSpeed/pull/7073
* Replace calls to `python setup.py sdist` with `python -m build --sdist` by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7069
* Revert "Handle special case of libuv for Windows (#7064)" by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7076
* Add DeepseekV3 AutoTP. by Yejing-Lai in https://github.com/deepspeedai/DeepSpeed/pull/7045
* Improve inference tutorial docs by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7083
* Pin transformers version on tests that use latest. by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7085
* Update README.md with ICS '23 MoE paper link by siddharth9820 in https://github.com/deepspeedai/DeepSpeed/pull/7087
* Update parallelism for nv-torch-latest/nightly tests due to more GPUs/runner by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7086
* Remove workflows for very old torch versions by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7090
* Use new dlpack api; Formatting fixes by tjruwase in https://github.com/deepspeedai/DeepSpeed/pull/7101
* Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx by deepcharm in https://github.com/deepspeedai/DeepSpeed/pull/7081
* Avoid graph breaks in torch.compile caused by inner classes in the backward hooks by deepcharm in https://github.com/deepspeedai/DeepSpeed/pull/7062
* Only run pre-commit on the changes by hwchen2017 in https://github.com/deepspeedai/DeepSpeed/pull/7106
* Avoid graph break due to unsupported frozenset by deepcharm in https://github.com/deepspeedai/DeepSpeed/pull/7105
* Fix fused_qkv print model ValueError by Yejing-Lai in https://github.com/deepspeedai/DeepSpeed/pull/7109
* Update references to new X/Twitter handle by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7110
* Update Gaudi2 nightly and CI to latest 1.20.0 build by raza-sikander in https://github.com/deepspeedai/DeepSpeed/pull/7093
* fix keep_module_on_host by inkcherry in https://github.com/deepspeedai/DeepSpeed/pull/7112
* Add sequential pytest mark to TestNVMeCheckpointing to resolve pytest forked hangs by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7131
* Training multiple models by tjruwase in https://github.com/deepspeedai/DeepSpeed/pull/7018
* Update CONTRIBUTING.md to reflect changes from CLA to DCO by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7135
* Avoid missing attr error by tjruwase in https://github.com/deepspeedai/DeepSpeed/pull/7133
* Add conditional expression by A-transformer in https://github.com/deepspeedai/DeepSpeed/pull/7119
* Unpin transformers version for most workflows by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7139
* Conditionally quote env vars by saurabhkoshatwar in https://github.com/deepspeedai/DeepSpeed/pull/7071
* Correct the BACKWARD_PREFETCH_SUBMIT mismatch by A-transformer in https://github.com/deepspeedai/DeepSpeed/pull/7120
* Enhance Gaudi2 CI/Nightly Coverage with Model Parallelism and Linear Tests by raza-sikander in https://github.com/deepspeedai/DeepSpeed/pull/7146
* Update container version that runs on A6000 tests. by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7153
* HF TP+ZeRO training doc. by inkcherry in https://github.com/deepspeedai/DeepSpeed/pull/7151
* Avoid graph break by removing redundant requires_grad attr change by deepcharm in https://github.com/deepspeedai/DeepSpeed/pull/7158
* Add destroy to tests to free memory by tohtana in https://github.com/deepspeedai/DeepSpeed/pull/7160
* [NFC] Typo fix in SP layer. by c8ef in https://github.com/deepspeedai/DeepSpeed/pull/7152
* Link AutoTP blog in the front page by hwchen2017 in https://github.com/deepspeedai/DeepSpeed/pull/7167
* fix `seq_parallel_communication_data_type` constant. by stas00 in https://github.com/deepspeedai/DeepSpeed/pull/7175
* Fix typos in GDS blog by loadams in https://github.com/deepspeedai/DeepSpeed/pull/7177
* Variable batch size and LR scheduler by bm-synth in https://github.com/deepspeedai/DeepSpeed/pull/7104

New Contributors
* siqi654321 made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/6903
* A-transformer made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7119
* saurabhkoshatwar made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7071
* c8ef made their first contribution in https://github.com/deepspeedai/DeepSpeed/pull/7152

**Full Changelog**: https://github.com/deepspeedai/DeepSpeed/compare/v0.16.4...v0.16.5