DeepSpeed

Latest version: v0.14.2

0.14.2

What's Changed
* Update version.txt after 0.14.1 release by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5413
* Remove dtype(fp16) condition check for residual_add unit test by raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5329
* [XPU] Use non_daemonic_proc by default on XPU device by ys950902 in https://github.com/microsoft/DeepSpeed/pull/5412
* Fix a convergence issue in TP topology caused by incorrect grad_norm. by inkcherry in https://github.com/microsoft/DeepSpeed/pull/5411
* Update 'create-pr' action in release workflow to latest by loadams in https://github.com/microsoft/DeepSpeed/pull/5415
* Update engine.py to avoid torch warning by etiennebonnafoux in https://github.com/microsoft/DeepSpeed/pull/5408
* Update _sidebar.scss by fasterinnerlooper in https://github.com/microsoft/DeepSpeed/pull/5293
* Add more tests into XPU CI by Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5427
* [CPU] Support SHM based inference_all_reduce in TorchBackend by delock in https://github.com/microsoft/DeepSpeed/pull/5391
* Add required paths to trigger AMD tests on PRs by loadams in https://github.com/microsoft/DeepSpeed/pull/5406
* Bug fix in `split_index` method by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5292
* Parallel map step for `DistributedDataAnalyzer` map-reduce by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5291
* Selective dequantization by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/5375
* Fix sorting of shard optimizer states files for universal checkpoint by tohtana in https://github.com/microsoft/DeepSpeed/pull/5395
* add device config env for the accelerator by shiyuan680 in https://github.com/microsoft/DeepSpeed/pull/5396
* 64bit indexing fused adam by garrett4wade in https://github.com/microsoft/DeepSpeed/pull/5187
* Improve parallel process of universal checkpoint conversion by tohtana in https://github.com/microsoft/DeepSpeed/pull/5343
* set the default to use set_to_none for clearing gradients in BF16 optimizer. by inkcherry in https://github.com/microsoft/DeepSpeed/pull/5434
* OptimizedLinear implementation by jeffra in https://github.com/microsoft/DeepSpeed/pull/5355
* Update README.md by Jhonso7393 in https://github.com/microsoft/DeepSpeed/pull/5453
* Update PyTest torch version to match PyTorch latest official (2.3.0) by loadams in https://github.com/microsoft/DeepSpeed/pull/5454

New Contributors
* etiennebonnafoux made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5408
* fasterinnerlooper made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5293
* shiyuan680 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5396
* garrett4wade made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5187
* Jhonso7393 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5453

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.14.1...v0.14.2

0.14.1

What's Changed
* Update version.txt after 0.14.0 release by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5238
* Fp6 blog chinese by xiaoxiawu-microsoft in https://github.com/microsoft/DeepSpeed/pull/5239
* Add contributed HW support into README by delock in https://github.com/microsoft/DeepSpeed/pull/5240
* Set tp world size to 1 in ckpt load, if MPU is not provided by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/5243
* Make op builder detection adapt to accelerator change by delock in https://github.com/microsoft/DeepSpeed/pull/5206
* Replace HIP_PLATFORM_HCC with HIP_PLATFORM_AMD by rraminen in https://github.com/microsoft/DeepSpeed/pull/5264
* Add CI for Habana Labs HPU/Gaudi2 by loadams in https://github.com/microsoft/DeepSpeed/pull/5244
* Fix attention mask handling in the Hybrid Engine Bloom flow by deepcharm in https://github.com/microsoft/DeepSpeed/pull/5101
* Skip 1Bit Compression and sparsegrad tests for HPU. by vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5270
* Enabled LMCorrectness inference tests on HPU. by vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5271
* Added HPU backend support for torch.compile tests. by vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5269
* Average only valid part of the ipg buffer. by BacharL in https://github.com/microsoft/DeepSpeed/pull/5268
* Add HPU accelerator support in unit tests. by vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5162
* Fix loading a universal checkpoint by tohtana in https://github.com/microsoft/DeepSpeed/pull/5263
* Add Habana Gaudi2 CI badge to the README by loadams in https://github.com/microsoft/DeepSpeed/pull/5286
* Add intel gaudi to contributed HW in README by BacharL in https://github.com/microsoft/DeepSpeed/pull/5300
* Fixed Accelerate Link by wkaisertexas in https://github.com/microsoft/DeepSpeed/pull/5314
* Enable mixtral 8x7b autotp by Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5257
* support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix by inkcherry in https://github.com/microsoft/DeepSpeed/pull/5259
* fix comms dtype by mayank31398 in https://github.com/microsoft/DeepSpeed/pull/5297
* Modified regular expression by igeni in https://github.com/microsoft/DeepSpeed/pull/5306
* Docs typos fix and grammar suggestions by Gr0g0 in https://github.com/microsoft/DeepSpeed/pull/5322
* Added Gaudi2 CI tests. by vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5275
* Improve universal checkpoint by tohtana in https://github.com/microsoft/DeepSpeed/pull/5289
* Increase coverage for HPU by loadams in https://github.com/microsoft/DeepSpeed/pull/5324
* Add NFS path check for default deepspeed triton cache directory by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5323
* Correct typo in checking on bf16 unit test support by loadams in https://github.com/microsoft/DeepSpeed/pull/5317
* Make NFS warning print only once by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/5345
* resolve KeyError: 'PDSH_SSH_ARGS_APPEND' by Lzhang-hub in https://github.com/microsoft/DeepSpeed/pull/5318
* BF16 optimizer: Clear lp grads after updating hp grads in hook by YangQun1 in https://github.com/microsoft/DeepSpeed/pull/5328
* Fix sort of zero checkpoint files by tohtana in https://github.com/microsoft/DeepSpeed/pull/5342
* Add `distributed_port` for `deepspeed.initialize` by LZHgrla in https://github.com/microsoft/DeepSpeed/pull/5260
* [fix] fix typo s/simultanenously /simultaneously by digger-yu in https://github.com/microsoft/DeepSpeed/pull/5359
* Update container version for Gaudi2 CI by raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5360
* compute global norm on device by BacharL in https://github.com/microsoft/DeepSpeed/pull/5125
* logger update with torch master changes by rogerxfeng8 in https://github.com/microsoft/DeepSpeed/pull/5346
* Ensure capacity does not exceed number of tokens by jeffra in https://github.com/microsoft/DeepSpeed/pull/5353
* Update workflows that use cu116 to cu117 by loadams in https://github.com/microsoft/DeepSpeed/pull/5361
* FP [6,8,12] quantizer op by jeffra in https://github.com/microsoft/DeepSpeed/pull/5336
* CPU SHM based inference_all_reduce improve by delock in https://github.com/microsoft/DeepSpeed/pull/5320
* Auto convert moe param groups by jeffra in https://github.com/microsoft/DeepSpeed/pull/5354
* Support MoE for pipeline models by mosheisland in https://github.com/microsoft/DeepSpeed/pull/5338
* Update pytest and transformers with fixes for pytest>= 8.0.0 by loadams in https://github.com/microsoft/DeepSpeed/pull/5164
* Increase CI coverage for Gaudi2 accelerator. by vshekhawat-hlab in https://github.com/microsoft/DeepSpeed/pull/5358
* Add CI for Intel XPU/Max1100 by Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5376
* Update path name on xpu-max1100.yml, add badge in README by loadams in https://github.com/microsoft/DeepSpeed/pull/5386
* Update checkout action on workflows on ubuntu 20.04 by loadams in https://github.com/microsoft/DeepSpeed/pull/5387
* Cleanup required_torch_version code and references. by loadams in https://github.com/microsoft/DeepSpeed/pull/5370
* Update README.md for intel XPU support by Liangliang-Ma in https://github.com/microsoft/DeepSpeed/pull/5389
* Optimize the fp-dequantizer to get high memory-BW utilization by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/5373
* Removal of cuda hardcoded string with get_device function by raza-sikander in https://github.com/microsoft/DeepSpeed/pull/5351
* Add custom reshaping for universal checkpoint by tohtana in https://github.com/microsoft/DeepSpeed/pull/5390
* fix pageable h2d memcpy by GuanhuaWang in https://github.com/microsoft/DeepSpeed/pull/5301
* stage3: efficient compute of scaled_global_grad_norm by nelyahu in https://github.com/microsoft/DeepSpeed/pull/5256
* Fix the FP6 kernels compilation problem on non-Ampere GPUs. by JamesTheZ in https://github.com/microsoft/DeepSpeed/pull/5333

New Contributors
* vshekhawat-hlab made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5270
* wkaisertexas made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5314
* igeni made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5306
* Gr0g0 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5322
* Lzhang-hub made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5318
* YangQun1 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5328
* raza-sikander made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5360
* rogerxfeng8 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5346
* JamesTheZ made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5333

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.14.0...v0.14.1

0.14.0

New Features
* [DeepSpeed-FP6: The Power of FP6-Centric Serving for Large Language Models.](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fp6/03-05-2024)

What's Changed
* Update version.txt after 0.13.5 release by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5229
* MOE gate fixes and enhancements by mosheisland in https://github.com/microsoft/DeepSpeed/pull/5156
* FP6 quantization end-to-end. by loadams in https://github.com/microsoft/DeepSpeed/pull/5234
* FP6 blog by loadams in https://github.com/microsoft/DeepSpeed/pull/5235

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.13.5...v0.14.0

0.13.5

What's Changed
* Update version.txt after 0.13.4 release by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5196
* Fix assertion to run pipeline engine with a compiled module by tohtana in https://github.com/microsoft/DeepSpeed/pull/5197
* Allow specifying MII branch on MII CI by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5208
* [zero++] Synchronize at the end of secondary partitioning and simplify the logic by ByronHsu in https://github.com/microsoft/DeepSpeed/pull/5216
* Add fp16 support of Qwen1.5 models (0.5B to 72B) to DeepSpeed-FastGen by ZonePG in https://github.com/microsoft/DeepSpeed/pull/5219
* Rename nv-torch-latest-cpu workflow to cpu-torch-latest by loadams in https://github.com/microsoft/DeepSpeed/pull/5226
* Fix moe cpu offload by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/5220
* Use `deepspeed.comm` instead of `torch.distributed` by jinyouzhi in https://github.com/microsoft/DeepSpeed/pull/5225
* fix fused_qkv model accuracy issue by Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/5217

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.13.4...v0.13.5

0.13.4

What's Changed
* Update version.txt after v0.13.3 release by loadams in https://github.com/microsoft/DeepSpeed/pull/5185
* Fixes for `--extra-index-url` by loadams in https://github.com/microsoft/DeepSpeed/pull/5183
* allow debug/experimental compiler backends by tohtana in https://github.com/microsoft/DeepSpeed/pull/5191
* Disable ninja by default by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5194
* [CPUAdam] Update full_precision_optimizer_states in docstring by rohan-varma in https://github.com/microsoft/DeepSpeed/pull/5181
* Add script to check for `--extra-index-url` by loadams in https://github.com/microsoft/DeepSpeed/pull/5184

New Contributors
* rohan-varma made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5181

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.13.3...v0.13.4

0.13.3

What's Changed
* Update version.txt after 0.13.2 release by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5119
* Stop tracking backward chain of broadcast (ZeRO3) by tohtana in https://github.com/microsoft/DeepSpeed/pull/5113
* [NPU]ZeRO-Infinity feature compatibility by misstek in https://github.com/microsoft/DeepSpeed/pull/5077
* BF16 optimizer: Improve device utilization by immediate grad update by deepcharm in https://github.com/microsoft/DeepSpeed/pull/4975
* removed if condition in `if collate_fn is None` by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5107
* disable compile tests for torch<2.1 by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5121
* Update inference test model names by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5127
* Fix issue with zero-sized file after merging file on curriculum `map_reduce` by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5106
* Update return codes in PyTest to properly error out if tests fail by loadams in https://github.com/microsoft/DeepSpeed/pull/5122
* add missing methods to MPS_Accelerator by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5134
* Solve tensor vs numpy dtype conflicts in data efficiency map-reduce. by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5108
* Fix broadcast deadlock for incomplete batches in data sample for data analysis by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5117
* Avoid zero-sized microbatches for incomplete minibatches when doing curriculum learning by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5118
* remove mandatory `index` key from output of `metric_function` in `DataAnalysis` map operation by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5112
* tensorboard logging: avoid item() outside gas to improve performance by nelyahu in https://github.com/microsoft/DeepSpeed/pull/5135
* Check overflow on device without host synchronization for each tensor by BacharL in https://github.com/microsoft/DeepSpeed/pull/5115
* Update nv-inference torch version by loadams in https://github.com/microsoft/DeepSpeed/pull/5128
* Method `run_map_reduce` to fix errors when running `run_map` followed by `run_reduce` by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5131
* Added missing `isinstance` check in PR 5112 by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5142
* Fix UserWarning: The torch.cuda.*DtypeTensor constructors are no long… by ShukantPal in https://github.com/microsoft/DeepSpeed/pull/5018
* TestEmptyParameterGroup: replace fusedAdam with torch.optim.AdamW by nelyahu in https://github.com/microsoft/DeepSpeed/pull/5139
* Update deprecated HuggingFace function by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/5144
* Pin to PyTest 8.0.0 by loadams in https://github.com/microsoft/DeepSpeed/pull/5163
* get_grad_norm_direct: fix a case of empty norm group by nelyahu in https://github.com/microsoft/DeepSpeed/pull/5148
* Distributed in-memory map-reduce for data analyzer by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5129
* DeepSpeedZeroOptimizer_Stage3: remove cuda specific optimizer by nelyahu in https://github.com/microsoft/DeepSpeed/pull/5138
* MOE: Fix save checkpoint when TP > 1 by mosheisland in https://github.com/microsoft/DeepSpeed/pull/5157
* Fix gradient clipping by tohtana in https://github.com/microsoft/DeepSpeed/pull/5150
* Use ninja to speed up build by jinzhen-lin in https://github.com/microsoft/DeepSpeed/pull/5088
* Update flops profiler to handle attn and __matmul__ by KimmiShi in https://github.com/microsoft/DeepSpeed/pull/4724
* Fix allreduce for BF16 and ZeRO0 by tohtana in https://github.com/microsoft/DeepSpeed/pull/5170
* Write multiple items to output file at once, in distributed data analyzer. by bm-synth in https://github.com/microsoft/DeepSpeed/pull/5169
* Fix typos in blogs/ by jinyouzhi in https://github.com/microsoft/DeepSpeed/pull/5172
* Inference V2 Human Eval by lekurile in https://github.com/microsoft/DeepSpeed/pull/4804
* Reduce ds_id name length by jomayeri in https://github.com/microsoft/DeepSpeed/pull/5176
* Switch cpu-inference workflow from --extra-index-url to --index-url by loadams in https://github.com/microsoft/DeepSpeed/pull/5182

New Contributors
* bm-synth made their first contribution in https://github.com/microsoft/DeepSpeed/pull/5107
* KimmiShi made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4724

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.13.2...v0.13.3
