What's Changed
* Update version.txt after 0.15.1 release by loadams in https://github.com/microsoft/DeepSpeed/pull/6493
* HPU: add required ENV vars to accelerator init by nelyahu in https://github.com/microsoft/DeepSpeed/pull/6495
* Op_builder->is_compatible quiet warning by terry-for-github in https://github.com/microsoft/DeepSpeed/pull/6093
* fix pipeline eval_batch micro_batches argument for schedule by nelyahu in https://github.com/microsoft/DeepSpeed/pull/6484
* Fix the broken URL link by rogerxfeng8 in https://github.com/microsoft/DeepSpeed/pull/6500
* fix environment variable export bug for MultiNodeRunner by TideDra in https://github.com/microsoft/DeepSpeed/pull/5878
* Revert "BF16 optimizer: Clear lp grads after updating hp grads in hook" by nelyahu in https://github.com/microsoft/DeepSpeed/pull/6508
* wrap include cuda_bf16.h with ifdef BF16_AVAILABLE by oelayan7 in https://github.com/microsoft/DeepSpeed/pull/6520
* Avoid security issues of subprocess shell by tjruwase in https://github.com/microsoft/DeepSpeed/pull/6498
* Add conditional on torch version for scaled_dot_product_attention by loadams in https://github.com/microsoft/DeepSpeed/pull/6517
* Added Intel Gaudi to Accelerator Setup Guide by ShifaAbu in https://github.com/microsoft/DeepSpeed/pull/6543
* Skip failing newly added tests in accelerate by loadams in https://github.com/microsoft/DeepSpeed/pull/6574
* Use msgpack for p2p comm by tohtana in https://github.com/microsoft/DeepSpeed/pull/6547
* DeepNVMe perf tuning by tjruwase in https://github.com/microsoft/DeepSpeed/pull/6560
* [Accelerator] Cambricon MLU support by Andy666G in https://github.com/microsoft/DeepSpeed/pull/6472
* Fix gradient accumulation for Z2+offload by tohtana in https://github.com/microsoft/DeepSpeed/pull/6550
* fix errors when setting zero3 leaf modules with torch.compile by NirSonnenschein in https://github.com/microsoft/DeepSpeed/pull/6564
* [XPU] Support DeepNVMe new code structure by Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/6532
* Add APIs to offload states of model, optimizer, and engine by tohtana in https://github.com/microsoft/DeepSpeed/pull/6011
* add bfloat16 to inference support dtypes by nelyahu in https://github.com/microsoft/DeepSpeed/pull/6528
* [COMPILE] workflow for deepspeed + torch.compile by YizhouZ in https://github.com/microsoft/DeepSpeed/pull/6570
* Fixes on the accelerate side mean we do not need to skip this test by loadams in https://github.com/microsoft/DeepSpeed/pull/6583
* Fix torch include in `op_builder/mlu/fused_adam.py` and update no-torch workflow triggers by loadams in https://github.com/microsoft/DeepSpeed/pull/6584
* [ROCm] Fix subprocess error by jagadish-amd in https://github.com/microsoft/DeepSpeed/pull/6587
* Cleanup CODEOWNERS file to be valid by loadams in https://github.com/microsoft/DeepSpeed/pull/6603
* Add SSF Best practices badge by loadams in https://github.com/microsoft/DeepSpeed/pull/6604
* Move V100 workflows from cuda 11.1/11.7 to 12.1 by loadams in https://github.com/microsoft/DeepSpeed/pull/6607
* Fix SD workflow by loadams in https://github.com/microsoft/DeepSpeed/pull/6609
* Pin accelerate to fix CI failures/issues by loadams in https://github.com/microsoft/DeepSpeed/pull/6610
* Add llama3.2 vision autotp by Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/6577
* Improve DS logging control by tjruwase in https://github.com/microsoft/DeepSpeed/pull/6602
* Fix device selection using CUDA_VISIBLE_DEVICES by tohtana in https://github.com/microsoft/DeepSpeed/pull/6530
* Handle when `backend` is also in compile_kwargs by oraluben in https://github.com/microsoft/DeepSpeed/pull/6502
* Rearrange inference OPS and stop using builder.load by oelayan7 in https://github.com/microsoft/DeepSpeed/pull/5490
* Unpin accelerate tests, update lightning with node16 removal. by loadams in https://github.com/microsoft/DeepSpeed/pull/6611
* Enabled Qwen2-MoE Tensor Parallelism (TP) inference by gyou2021 in https://github.com/microsoft/DeepSpeed/pull/6551
New Contributors
* TideDra made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5878
* ShifaAbu made their first contribution in https://github.com/microsoft/DeepSpeed/pull/6543
* jagadish-amd made their first contribution in https://github.com/microsoft/DeepSpeed/pull/6587
* gyou2021 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/6551
**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.15.1...v0.15.2