DeepSpeed

Latest version: v0.16.2

0.10.3

Not secure
New Features
* [ZeRO-Inference: 20X faster inference through weight quantization and KV cache offloading](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/huggingface/zero_inference)

What's Changed
* Add Mixed Precision ZeRO++ tutorial by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/4241
* DeepSpeed-Chat Llama2/stability release by awan-10 in https://github.com/microsoft/DeepSpeed/pull/4240
* Update README.md by awan-10 in https://github.com/microsoft/DeepSpeed/pull/4244
* Pin Triton version to >=2.0.0 and <2.1.0 by lekurile in https://github.com/microsoft/DeepSpeed/pull/4251
* Allow modification of zero partitioned parameters by tjruwase in https://github.com/microsoft/DeepSpeed/pull/4192
* Checks for user injection policy by satpalsr in https://github.com/microsoft/DeepSpeed/pull/3052
* Add check that opening issues on CI failure requires schedule by loadams in https://github.com/microsoft/DeepSpeed/pull/4242
* Code Refactoring by tosemml in https://github.com/microsoft/DeepSpeed/pull/4262
* tolerating missing optimizer states for MoE [2nd attempt] by clumsy in https://github.com/microsoft/DeepSpeed/pull/4120
* Fix nv-inference/un-pin transformers by loadams in https://github.com/microsoft/DeepSpeed/pull/4269
* check for zero (empty) param groups in llama + hf/accelerate. by awan-10 in https://github.com/microsoft/DeepSpeed/pull/4270
* use non_reentrant_checkpoint fix requires_grad of input must be true for activation checkpoint layer in pipeline train. by inkcherry in https://github.com/microsoft/DeepSpeed/pull/4224
* The PostBackwardFunction class should be more clearly named to distinguish it from the PreBackwardFunction class. by Crispig in https://github.com/microsoft/DeepSpeed/pull/2548
* fix iteration timing used in autotuning when gradient_accumulation_steps > 1 by cli99 in https://github.com/microsoft/DeepSpeed/pull/2888
* Update README.md by NinoRisteski in https://github.com/microsoft/DeepSpeed/pull/4284
* update deepspeed to run with the most recent triton 2.1.0 by stephen-youn in https://github.com/microsoft/DeepSpeed/pull/4278
* Keep hpz secondary tensor in forward pass by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/4288
* Support iterators with incompletely defined __len__ functions by codedecde in https://github.com/microsoft/DeepSpeed/pull/2445
* AMD Kernel Compatibility Fixes by cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/3180
* ZeRO-Inference refresh by tjruwase in https://github.com/microsoft/DeepSpeed/pull/4197
* fix user args parsing of string with spaces on runner by YudiZh in https://github.com/microsoft/DeepSpeed/pull/4265
* Update index.md by NinoRisteski in https://github.com/microsoft/DeepSpeed/pull/4297
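
The Triton pin listed above (`>=2.0.0 and <2.1.0`) is a half-open version range. As a rough illustration only, the sketch below shows which version strings fall inside such a range. `parse_version` and `in_pinned_range` are hypothetical helpers, not part of DeepSpeed, and they assume plain numeric `X.Y.Z` version strings.

```python
def parse_version(version: str) -> tuple:
    """Turn a plain numeric version such as '2.0.0' into (2, 0, 0)."""
    return tuple(int(part) for part in version.split("."))


def in_pinned_range(version: str, lower: str = "2.0.0", upper: str = "2.1.0") -> bool:
    """Return True when lower <= version < upper (a half-open range)."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    # Print whether each candidate version satisfies the pin.
    for candidate in ("1.9.9", "2.0.0", "2.0.5", "2.1.0"):
        print(candidate, in_pinned_range(candidate))
```

Under this pin, 2.1.0 itself is excluded, which is consistent with the later entry in the same list that updates DeepSpeed to run with Triton 2.1.0.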

New Contributors
* tosemml made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4262
* Crispig made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2548
* NinoRisteski made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4284
* codedecde made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2445
* YudiZh made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4265

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.10.2...v0.10.3

0.10.2

Not secure
What's Changed
* MP ZeRO++ by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/3954
* do allgather only in shared optimizer states groups by inkcherry in https://github.com/microsoft/DeepSpeed/pull/4167
* Permit empty environment variables as unset in `setup.py` by loadams in https://github.com/microsoft/DeepSpeed/pull/4185
* enable autoTP for mpt in huggingface model hub without trust_remote_c… by sywangyi in https://github.com/microsoft/DeepSpeed/pull/4062
* Fix nv-nightly workflow by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4163
* Fix the path in tutorial by kytimmylai in https://github.com/microsoft/DeepSpeed/pull/4193
* Add unit test to check HF low_cpu_mem_usage_flag by loadams in https://github.com/microsoft/DeepSpeed/pull/4184
* Fix ZeRO parameter initialization for tensors with `requires_grad=True` by XuehaiPan in https://github.com/microsoft/DeepSpeed/pull/4138
* DeepSpeed Ulysses tutorial by minjiaz in https://github.com/microsoft/DeepSpeed/pull/4200
* Load z3 checkpoints for inference by tjruwase in https://github.com/microsoft/DeepSpeed/pull/4171
* DeepSpeed Ulysses release by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/4198
* Deepspeed-Ulysses blog by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/4201
* Ds ulysses news by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/4202
* DS-Ulysses formating by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/4204
* Update Ulyssess by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/4205
* Update README.md by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/4211
* Add Japanese blog of DS-Ulysses by tohtana in https://github.com/microsoft/DeepSpeed/pull/4209
* DeepSpeed Ulysses Chinese blog translation by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/4210
* add ulysses blog index by conglongli in https://github.com/microsoft/DeepSpeed/pull/4215
* Add MuP optimizers by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2043
* Simplify Gradient Attribute Names by jomayeri in https://github.com/microsoft/DeepSpeed/pull/4214
* add meta onDevice support for LLAMA2 by dc3671 in https://github.com/microsoft/DeepSpeed/pull/4147
* Fixes timer error referenced in 4212 by bjoernpl in https://github.com/microsoft/DeepSpeed/pull/4213
* Fix pipline dataloader when batch elements contain tuple by ghosthamlet in https://github.com/microsoft/DeepSpeed/pull/565
* feat(activation_checkpointing): add `non_reentrant_checkpoint` to support inputs require no grad by hughpu in https://github.com/microsoft/DeepSpeed/pull/4118
* add npu support dtypes by CurryRice233 in https://github.com/microsoft/DeepSpeed/pull/4223
* Fix fused qkv sizing for bloom by molly-smith in https://github.com/microsoft/DeepSpeed/pull/4161
* added port argument for ssh by Hiromasa-H in https://github.com/microsoft/DeepSpeed/pull/4117
* Empty tensor size check by jomayeri in https://github.com/microsoft/DeepSpeed/pull/4186
* fix: linker issues in conda environments 3929 by maximegmd in https://github.com/microsoft/DeepSpeed/pull/4235
* Enable AMD MI200 and H100 to run on branches for testing by loadams in https://github.com/microsoft/DeepSpeed/pull/4238
* fix MegatronLayerPolicy to be compatible with the newest ParallelTransformerLayer by dc3671 in https://github.com/microsoft/DeepSpeed/pull/4236
* Enable hpz when running with torch.no_grad by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/4232
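
One entry above permits empty environment variables to count as unset in `setup.py`. The sketch below shows the general idea under stated assumptions: `env_flag_is_set` and the variable name `EXAMPLE_BUILD_FLAG` are hypothetical and are not taken from DeepSpeed's build script.

```python
import os


def env_flag_is_set(name: str) -> bool:
    """Treat a missing variable and an empty-string variable the same way: unset."""
    return os.environ.get(name, "") != ""


if __name__ == "__main__":
    os.environ["EXAMPLE_BUILD_FLAG"] = ""  # exported but empty
    print(env_flag_is_set("EXAMPLE_BUILD_FLAG"))
```

This matters for shells and CI systems that export a variable with an empty value rather than leaving it undefined.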

New Contributors
* kytimmylai made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4193
* bjoernpl made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4213
* Hiromasa-H made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4117
* maximegmd made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4235

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.10.1...v0.10.2

0.10.1

Not secure
What's Changed
* [docs] add zero++ paper link by jeffra in https://github.com/microsoft/DeepSpeed/pull/3974
* Avoid race condition with port selection in unit tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3975
* Remove duplicated inference unit tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3951
* Switch to torch.linalg.norm by loadams in https://github.com/microsoft/DeepSpeed/pull/3984
* Simplify chain comparisons, remove redundant parentheses by digger-yu in https://github.com/microsoft/DeepSpeed/pull/3912
* [CPU] Support HBM flatmode and fakenuma mode by delock in https://github.com/microsoft/DeepSpeed/pull/3918
* Fix checkpoint conversion when model layers share weights by awaelchli in https://github.com/microsoft/DeepSpeed/pull/3825
* fixing flops profiler formatting, units and precision by clumsy in https://github.com/microsoft/DeepSpeed/pull/3927
* Specify language=python in pre-commit hook by wangruohui in https://github.com/microsoft/DeepSpeed/pull/3994
* [CPU] Skip CPU support unimplemented error by Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/3633
* ZeRO Gradient Accumulation Dtype. by jomayeri in https://github.com/microsoft/DeepSpeed/pull/2847
* [CPU] Use allreduce_low_latency for AutoTP and implement low latency allreduce for CPU backend (single node) by delock in https://github.com/microsoft/DeepSpeed/pull/3919
* Re-enable skipped unit tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3939
* Make AMD/ROCm apex install to /blob to save test/compile time. by loadams in https://github.com/microsoft/DeepSpeed/pull/3997
* Option to exclude frozen weights for checkpoint save by tjruwase in https://github.com/microsoft/DeepSpeed/pull/3953
* Allow user to select name of .deepspeed_env by loadams in https://github.com/microsoft/DeepSpeed/pull/4006
* Silence backend warning by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4009
* Fix user arg parsing in single node deployment by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4007
* Specify triton 2.0.0 requirement by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4008
* Re-enable elastic training for torch 2+ by loadams in https://github.com/microsoft/DeepSpeed/pull/4010
* add /dev/shm size to ds_report by jeffra in https://github.com/microsoft/DeepSpeed/pull/4015
* Make Ascend NPU available by hipudding in https://github.com/microsoft/DeepSpeed/pull/3831
* RNNprofiler: fix gates size retrieval logic in _rnn_flops by pinstripe-potoroo in https://github.com/microsoft/DeepSpeed/pull/3921
* fix typo in SECURITY.md by jstan327 in https://github.com/microsoft/DeepSpeed/pull/4019
* add llama2 autoTP support in replace_module by dc3671 in https://github.com/microsoft/DeepSpeed/pull/4022
* [zero_to_fp32] 3x less cpu memory requirements by stas00 in https://github.com/microsoft/DeepSpeed/pull/4025
* [CPU] FusedAdam and CPU training support by delock in https://github.com/microsoft/DeepSpeed/pull/3991
* remove duplicate check for pp and zero stage by inkcherry in https://github.com/microsoft/DeepSpeed/pull/4033
* Pass missing positional arguments in `DeepSpeedHybridEngine.generate()` by XuehaiPan in https://github.com/microsoft/DeepSpeed/pull/4026
* Remove print of weight parameter in RMS norm by puneeshkhanna in https://github.com/microsoft/DeepSpeed/pull/4031
* Monitored Loss Calculations by jomayeri in https://github.com/microsoft/DeepSpeed/pull/4030
* fix(pipe): make pipe module `load_state_dir` non-strict-mode work by hughpu in https://github.com/microsoft/DeepSpeed/pull/4020
* polishing timers and log_dist by clumsy in https://github.com/microsoft/DeepSpeed/pull/3996
* Engine side fix for loading llama checkpoint fine-tuned with zero3 by minjiaz in https://github.com/microsoft/DeepSpeed/pull/3981
* fix: Remove duplicate word the by digger-yu in https://github.com/microsoft/DeepSpeed/pull/4051
* [Bug Fix] Fix comm logging for inference by delock in https://github.com/microsoft/DeepSpeed/pull/4043
* fix opt-350m shard loading issue in AutoTP by sywangyi in https://github.com/microsoft/DeepSpeed/pull/3600
* enable autoTP for MPT by sywangyi in https://github.com/microsoft/DeepSpeed/pull/3861
* autoTP for fused qkv weight by inkcherry in https://github.com/microsoft/DeepSpeed/pull/3844
* [CPU] Faster reduce kernel for SHM allreduce by delock in https://github.com/microsoft/DeepSpeed/pull/4049
* Multiple zero stage 3 related fixes by tjruwase in https://github.com/microsoft/DeepSpeed/pull/3886
* Fix deadlock when SHM based allreduce spin too fast by delock in https://github.com/microsoft/DeepSpeed/pull/4048
* [MiCS] [Bugfix] set self.save_non_zero_checkpoint=True only for first partition group by zarzen in https://github.com/microsoft/DeepSpeed/pull/3787
* add reproducible compilation environment by fecet in https://github.com/microsoft/DeepSpeed/pull/3943
* fix: remove unnessary `` punct in the second `sed` command by hughpu in https://github.com/microsoft/DeepSpeed/pull/4061
* Refactor autoTP inference for HE by molly-smith in https://github.com/microsoft/DeepSpeed/pull/4040
* Fix transformers unit tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4079
* Fix Stable Diffusion Injection by lekurile in https://github.com/microsoft/DeepSpeed/pull/4078
* Spread layers more uniformly when using partition_uniform by marcobellagente93 in https://github.com/microsoft/DeepSpeed/pull/4053
* fix typo: change polciies to policies by digger-yu in https://github.com/microsoft/DeepSpeed/pull/4090
* update ut/doc for glm/codegen by inkcherry in https://github.com/microsoft/DeepSpeed/pull/4057
* zero_to_fp32 script adds support for tag argument by EeyoreLee in https://github.com/microsoft/DeepSpeed/pull/4089
* add type checker ignore by EeyoreLee in https://github.com/microsoft/DeepSpeed/pull/4102
* Fix generate config validation error on inference unit tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4107
* use correct ckpt path when base_dir not available by polisettyvarma in https://github.com/microsoft/DeepSpeed/pull/4101
* Disable z3 tracing profiler by tjruwase in https://github.com/microsoft/DeepSpeed/pull/4106
* Pass correct node size for ZeRO++ by cmikeh2 in https://github.com/microsoft/DeepSpeed/pull/4085
* add deepspeed chat arxiv report by conglongli in https://github.com/microsoft/DeepSpeed/pull/4110
* enable pipeline checkpoint loading mode by leiwen83 in https://github.com/microsoft/DeepSpeed/pull/3629
* Fix Issue 4083 by jomayeri in https://github.com/microsoft/DeepSpeed/pull/4084
* Add full list of DS_BUILD_* by loadams in https://github.com/microsoft/DeepSpeed/pull/4119
* Update nightly workflows to open an issue if CI fails by loadams in https://github.com/microsoft/DeepSpeed/pull/3952
* Update torch1.9 tests to 1.10 to match latest accelerate. by loadams in https://github.com/microsoft/DeepSpeed/pull/4126
* Handle PermissionError in os.chmod Call - Update engine.py by M-Chris in https://github.com/microsoft/DeepSpeed/pull/4139
* Generalize frozen weights unit test by tjruwase in https://github.com/microsoft/DeepSpeed/pull/4140
* Respect memory pinning config by tjruwase in https://github.com/microsoft/DeepSpeed/pull/4131
* Remove incorrect async-io library checking code. by loadams in https://github.com/microsoft/DeepSpeed/pull/4150
* Return nn.parameter type for weights and biases by molly-smith in https://github.com/microsoft/DeepSpeed/pull/4146
* Fixes 4151 by saforem2 in https://github.com/microsoft/DeepSpeed/pull/4152
* Handling for SIGTERM as well by loadams in https://github.com/microsoft/DeepSpeed/pull/4160
* Fix CI Badges by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4162
* Add DS-Chat CI workflow by lekurile in https://github.com/microsoft/DeepSpeed/pull/4127
* [CPU][Bugfix] Make uid and addr_port part of SHM name in CCL backend by delock in https://github.com/microsoft/DeepSpeed/pull/4115
* Add DSE branch input to nv-ds-chat by lekurile in https://github.com/microsoft/DeepSpeed/pull/4173
* Pin transformers by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/4174
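
One entry above concerns spreading layers more uniformly when using `partition_uniform`. The sketch below is not DeepSpeed's implementation. It is a hypothetical `spread_uniform` helper showing one way to split `num_items` layers into `num_parts` contiguous parts whose sizes differ by at most one.

```python
def spread_uniform(num_items: int, num_parts: int) -> list:
    """Return num_parts + 1 boundaries; part p covers [bounds[p], bounds[p + 1])."""
    if num_parts <= 0:
        raise ValueError("num_parts must be positive")
    base, remainder = divmod(num_items, num_parts)
    bounds = [0]
    for part in range(num_parts):
        # The first `remainder` parts each take one extra item.
        size = base + (1 if part < remainder else 0)
        bounds.append(bounds[-1] + size)
    return bounds


if __name__ == "__main__":
    # 10 layers over 4 parts gives part sizes 3, 3, 2, 2.
    print(spread_uniform(10, 4))
```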

New Contributors
* awaelchli made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3825
* wangruohui made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3994
* jstan327 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4019
* XuehaiPan made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4026
* puneeshkhanna made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4031
* hughpu made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4020
* fecet made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3943
* marcobellagente93 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4053
* polisettyvarma made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4101
* leiwen83 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3629
* M-Chris made their first contribution in https://github.com/microsoft/DeepSpeed/pull/4139

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.10.0...v0.10.1

0.10.0

Not secure
New Features
* [ZeRO++: A leap in speed for LLM and chat model training with 4X less communication](https://www.microsoft.com/en-us/research/blog/deepspeed-zero-a-leap-in-speed-for-llm-and-chat-model-training-with-4x-less-communication/) [[English](https://www.microsoft.com/en-us/research/blog/deepspeed-zero-a-leap-in-speed-for-llm-and-chat-model-training-with-4x-less-communication/)] [[中文](https://github.com/microsoft/DeepSpeed/blob/master/blogs/zeropp/chinese/README.md)] [[日本語](https://github.com/microsoft/DeepSpeed/blob/master/blogs/zeropp/japanese/README.md)]
* H100 support and testing with FP8 using [NVIDIA's TransformerEngine](https://github.com/NVIDIA/TransformerEngine)

What's Changed
* Documentation for DeepSpeed Accelerator Abstraction Interface by delock in https://github.com/microsoft/DeepSpeed/pull/3184
* FP8 unittest for H100 by jomayeri in https://github.com/microsoft/DeepSpeed/pull/3731
* Fix apex install bugs by loadams in https://github.com/microsoft/DeepSpeed/pull/3741
* Fix Autotuner get_gas_from_user_config by straywarrior in https://github.com/microsoft/DeepSpeed/pull/3664
* Include cublas error details when getting cublas handle fails by jli in https://github.com/microsoft/DeepSpeed/pull/3695
* fix hybrid engine mlp module by tensor-tang in https://github.com/microsoft/DeepSpeed/pull/3736
* Fix output transpose dimension bugs by loadams in https://github.com/microsoft/DeepSpeed/pull/3747
* remove UtilsBuilder load, use torch (un)flatten ops by inkcherry in https://github.com/microsoft/DeepSpeed/pull/3728
* add Chinese Zhihu social account by conglongli in https://github.com/microsoft/DeepSpeed/pull/3755
* Account for expert parameters when calculating the total number of pa… by alito in https://github.com/microsoft/DeepSpeed/pull/3720
* fix ccl_backend and residual_add problems by dc3671 in https://github.com/microsoft/DeepSpeed/pull/3642
* Fix url in getting-started guide (docs) by acforvs in https://github.com/microsoft/DeepSpeed/pull/3768
* Update deepspeed-chat/japanese/README.md by eltociear in https://github.com/microsoft/DeepSpeed/pull/3765
* Add H100 workflow and status badge. by loadams in https://github.com/microsoft/DeepSpeed/pull/3754
* Add an api in deepspeed engine for adjusting micro batch size during training by kisseternity in https://github.com/microsoft/DeepSpeed/pull/3773
* Prevent hangs in CI during parallel run compilation by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/2844
* Revert "Prevent hangs in CI during parallel run compilation" by jeffra in https://github.com/microsoft/DeepSpeed/pull/3817
* [Docs] `chrome://tracing` is deprecated by keyboardAnt in https://github.com/microsoft/DeepSpeed/pull/3805
* Support model declaration in zero.Init context by tohtana in https://github.com/microsoft/DeepSpeed/pull/3592
* Update zeropp.md by samadejacobs in https://github.com/microsoft/DeepSpeed/pull/3821
* Reduce Unit Test Times (Part 1) by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3829
* Re-enable GPT-J unit tests and refactor inference tests by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3618
* Fix racing condition in GatheredParameters by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/3819
* zero/mics.py: use on_accelerator instead of cuda only by guoyejun in https://github.com/microsoft/DeepSpeed/pull/3806
* Disable AMD test flows in YML by loadams in https://github.com/microsoft/DeepSpeed/pull/3847
* Reduce Unit Test Time (Part 2) by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3838
* [profiling]add show_straggler argument to log_summary() by delock in https://github.com/microsoft/DeepSpeed/pull/3579
* checking process_group before merging bucket ranges (3521) by clumsy in https://github.com/microsoft/DeepSpeed/pull/3577
* scripts/check-torchcuda.py: add checking for tensor.is_cuda by guoyejun in https://github.com/microsoft/DeepSpeed/pull/3843
* Zero3 Fix allreduce optimization for extra large tensor by hablb in https://github.com/microsoft/DeepSpeed/pull/3832
* [zero] revert PR 3166, it disabled grad clip for bf16 by jeffra in https://github.com/microsoft/DeepSpeed/pull/3790
* Fix transpose convolution FLOPS profiler (retrieval of out_channels) by pinstripe-potoroo in https://github.com/microsoft/DeepSpeed/pull/3834
* Fix LoRA Fuse/Unfuse in Hybrid Engine by sxjscience in https://github.com/microsoft/DeepSpeed/pull/3563
* Update pytorch-lightning version in CI by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3882
* [Docs] MMEngine has integrated deepspeed. by HAOCHENYE in https://github.com/microsoft/DeepSpeed/pull/3879
* Add FALCON Auto-TP Support by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/3640
* Update apex installation to resolve apex's pyproject.toml issues. by loadams in https://github.com/microsoft/DeepSpeed/pull/3745
* Extend HE-Lora test with Z3 support + Fix/add guard in HE for Z3 by awan-10 in https://github.com/microsoft/DeepSpeed/pull/3883
* Separate ZeRO3 InflightParamRegistry for train and eval by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/3884
* Add GPTNeoX AutoTP support by Yejing-Lai in https://github.com/microsoft/DeepSpeed/pull/3778
* Fix Meta Tensor checkpoint load for BLOOM models by lekurile in https://github.com/microsoft/DeepSpeed/pull/3885
* fix error :Dictionary expression not allowed in type annotation Pylance by digger-yu in https://github.com/microsoft/DeepSpeed/pull/3708
* Fix rnn flop profiler to compute flops instead of macs by pinstripe-potoroo in https://github.com/microsoft/DeepSpeed/pull/3833
* Update workflows for merge queue by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3892
* Avoid deprecation warnings in `CHECK_CUDA` by Flamefire in https://github.com/microsoft/DeepSpeed/pull/3854
* Silence comm.py warning by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3893
* Fix a typo of global variable in comm.py by hipudding in https://github.com/microsoft/DeepSpeed/pull/3852
* [ROCm] Enable TestCUDABackward::test_backward unit tests by rraminen in https://github.com/microsoft/DeepSpeed/pull/3849
* [profiling][mics]Fix some issues for log_summary(). by ys950902 in https://github.com/microsoft/DeepSpeed/pull/3899
* fix "undefined symbol: curandCreateGenerator" for quantizer op by jinzhen-lin in https://github.com/microsoft/DeepSpeed/pull/3846
* fix memory leak with zero-3 by jeffra in https://github.com/microsoft/DeepSpeed/pull/3903
* fix some typo docs/ by digger-yu in https://github.com/microsoft/DeepSpeed/pull/3917
* fix: change ==NONE to is under deepspeed/ by digger-yu in https://github.com/microsoft/DeepSpeed/pull/3923
* Del comment deepspeed.zero.Init() can be used as a decorator by hipudding in https://github.com/microsoft/DeepSpeed/pull/3894
* Remove the param.ds_tensor from print by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/3928
* Reduce Unit Test Times (Part 3) by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3850
* Update zero_to_fp32.py - to support deepspeed_stage_1 by PicoCreator in https://github.com/microsoft/DeepSpeed/pull/3936
* [docs] add xTrimoPGLM by jeffra in https://github.com/microsoft/DeepSpeed/pull/3940
* Update Nvidia docker base image by KaiChen1008 in https://github.com/microsoft/DeepSpeed/pull/3930
* Fix inference tutorial docs for checkpoints by loadams in https://github.com/microsoft/DeepSpeed/pull/3955
* fix Megatron-DeepSpeed links by conglongli in https://github.com/microsoft/DeepSpeed/pull/3956
* skip bcast when enable pp but pp_group_size=1 by inkcherry in https://github.com/microsoft/DeepSpeed/pull/3915
* Use device_name instead of device index to support other device by hipudding in https://github.com/microsoft/DeepSpeed/pull/3933
* Create accelerator for apple silicon GPU Acceleration by NripeshN in https://github.com/microsoft/DeepSpeed/pull/3907
* fix(cpu_accelerator): :bug: Convert LOCAL_SIZE to integer by javsalgar in https://github.com/microsoft/DeepSpeed/pull/3971
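
One entry above fixes the RNN profiler to report FLOPs instead of MACs. By the usual convention, one multiply-accumulate (MAC) counts as two floating-point operations, a multiply and an add. The sketch below applies that convention to a plain dense layer. The helper names are hypothetical and the code is not taken from the DeepSpeed flops profiler.

```python
def linear_macs(batch: int, in_features: int, out_features: int) -> int:
    """MACs for a dense layer without bias: one per (sample, input, output) triple."""
    return batch * in_features * out_features


def macs_to_flops(macs: int) -> int:
    """Convert MACs to FLOPs, counting each multiply-accumulate as two operations."""
    return 2 * macs


if __name__ == "__main__":
    macs = linear_macs(batch=4, in_features=8, out_features=16)
    print(macs, macs_to_flops(macs))
```

Reporting MACs under a FLOPs label therefore understates the count by a factor of two under this convention.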

New Contributors
* straywarrior made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3664
* alito made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3720
* acforvs made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3768
* keyboardAnt made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3805
* pinstripe-potoroo made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3834
* HAOCHENYE made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3879
* Yejing-Lai made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3778
* Flamefire made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3854
* hipudding made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3852
* PicoCreator made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3936
* KaiChen1008 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3930
* NripeshN made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3907
* javsalgar made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3971

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.9.4...v0.10.0

0.9.5

Not secure
What's Changed
* Documentation for DeepSpeed Accelerator Abstraction Interface by delock in https://github.com/microsoft/DeepSpeed/pull/3184
* FP8 unittest for H100 by jomayeri in https://github.com/microsoft/DeepSpeed/pull/3731
* Fix apex install bugs by loadams in https://github.com/microsoft/DeepSpeed/pull/3741
* Fix Autotuner get_gas_from_user_config by straywarrior in https://github.com/microsoft/DeepSpeed/pull/3664
* Include cublas error details when getting cublas handle fails by jli in https://github.com/microsoft/DeepSpeed/pull/3695
* fix hybrid engine mlp module by tensor-tang in https://github.com/microsoft/DeepSpeed/pull/3736
* Fix output transpose dimension bugs by loadams in https://github.com/microsoft/DeepSpeed/pull/3747
* remove UtilsBuilder load, use torch (un)flatten ops by inkcherry in https://github.com/microsoft/DeepSpeed/pull/3728
* add Chinese Zhihu social account by conglongli in https://github.com/microsoft/DeepSpeed/pull/3755
* Account for expert parameters when calculating the total number of pa… by alito in https://github.com/microsoft/DeepSpeed/pull/3720
* fix ccl_backend and residual_add problems by dc3671 in https://github.com/microsoft/DeepSpeed/pull/3642
* Fix url in getting-started guide (docs) by acforvs in https://github.com/microsoft/DeepSpeed/pull/3768
* Update deepspeed-chat/japanese/README.md by eltociear in https://github.com/microsoft/DeepSpeed/pull/3765
* Add H100 workflow and status badge. by loadams in https://github.com/microsoft/DeepSpeed/pull/3754
* Zero++ tutorial PR by HeyangQin in https://github.com/microsoft/DeepSpeed/pull/3783
* [Fix] _conv_flops_compute when padding is a str and stride=1 by zhiruiluo in https://github.com/microsoft/DeepSpeed/pull/3169
* fix interpolate flops compute by cli99 in https://github.com/microsoft/DeepSpeed/pull/3782
* use `Flops Profiler` to test `model.generate()` by CaffreyR in https://github.com/microsoft/DeepSpeed/pull/2515
* [zero] revert PR 3611 by jeffra in https://github.com/microsoft/DeepSpeed/pull/3786

New Contributors
* straywarrior made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3664
* alito made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3720
* acforvs made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3768
* zhiruiluo made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3169
* CaffreyR made their first contribution in https://github.com/microsoft/DeepSpeed/pull/2515

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.9.4...v0.9.5

0.9.4

Not secure
What's Changed
* [MiCS] [Fix] saving and loading model checkpoint logic for MiCS sharding by zarzen in https://github.com/microsoft/DeepSpeed/pull/3440
* fix some typo by digger-yu in https://github.com/microsoft/DeepSpeed/pull/3675
* Use logger in accelerator by tjruwase in https://github.com/microsoft/DeepSpeed/pull/3682
* Update README to add ICS'23 paper on Tensor Parallel MoEs by siddharth9820 in https://github.com/microsoft/DeepSpeed/pull/3687
* non-JIT build fix on ROCm by rraminen in https://github.com/microsoft/DeepSpeed/pull/3638
* Fix local rank mismatch error when training on nodes with different number of GPUs by byungsoo-oh in https://github.com/microsoft/DeepSpeed/pull/3409
* Correct world_size/backend for mpi by abhilash1910 in https://github.com/microsoft/DeepSpeed/pull/3694
* Fix incorrectly formatted f string in hostfile checking by loadams in https://github.com/microsoft/DeepSpeed/pull/3698
* fix typo name of hybrid engine func by tensor-tang in https://github.com/microsoft/DeepSpeed/pull/3689
* Revert "fix typo name (3689)" by loadams in https://github.com/microsoft/DeepSpeed/pull/3702
* Fix gpt-j inference issue by RezaYazdaniAminabadi in https://github.com/microsoft/DeepSpeed/pull/3639
* change partititon_name to partition_name by digger-yu in https://github.com/microsoft/DeepSpeed/pull/3700
* Fix unit test typo in tests/unit/ops/transformer/inference by mrwyattii in https://github.com/microsoft/DeepSpeed/pull/3697
* Small tweak on cuda version mismatch documentation by jli in https://github.com/microsoft/DeepSpeed/pull/3706
* DeepSpeed overview in Japanese by conglongli in https://github.com/microsoft/DeepSpeed/pull/3709
* zero3 performance optimizations by hablb in https://github.com/microsoft/DeepSpeed/pull/3622
* Fix typo in name of hybrid engine function by loadams in https://github.com/microsoft/DeepSpeed/pull/3704
* Increase tensor creator coverage by tjruwase in https://github.com/microsoft/DeepSpeed/pull/3684
* [Bugfix][CPU] Remove C++ version in CPU OpBuilder by delock in https://github.com/microsoft/DeepSpeed/pull/3643
* Single Node is using unreferenced pdsh kill cmd while terminating by abhilash1910 in https://github.com/microsoft/DeepSpeed/pull/3730
* Update Dockerfile with newer cuda and torch. by loadams in https://github.com/microsoft/DeepSpeed/pull/3716
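
One entry above fixes an incorrectly formatted f-string in hostfile checking. The exact bug is in the linked PR. As a general illustration, a common way for an f-string to go wrong is a missing `f` prefix, which leaves the braces in the message as literal text. The path and message below are hypothetical.

```python
hostfile = "/job/hostfile"  # hypothetical path, for illustration only

# Missing f prefix: the placeholder is never interpolated.
broken = "Unable to find hostfile at {hostfile}"

# With the f prefix the variable's value is substituted.
fixed = f"Unable to find hostfile at {hostfile}"

if __name__ == "__main__":
    print(broken)
    print(fixed)
```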

New Contributors
* byungsoo-oh made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3409
* abhilash1910 made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3694
* tensor-tang made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3689
* jli made their first contribution in https://github.com/microsoft/DeepSpeed/pull/3706

**Full Changelog**: https://github.com/microsoft/DeepSpeed/compare/v0.9.3...v0.9.4
