Accelerate





0.26.0

Support for MS-AMP

This release adds support for [MS-AMP](https://github.com/Azure/MS-AMP) (Microsoft Automatic Mixed Precision Library) to Accelerate as an alternative backend for FP8 training on appropriate hardware. It is the default backend of choice. Read more in the docs [here](https://huggingface.co/docs/accelerate/concept_guides/low_precision_training). Introduced in https://github.com/huggingface/accelerate/pull/2232 by muellerzr
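
As a rough illustration, here is a minimal sketch of opting into FP8 training with this backend; the `FP8RecipeKwargs` arguments shown (`backend`, `opt_level`) are assumptions based on the docs linked above and may differ by version:

```python
from accelerate import Accelerator
from accelerate.utils import FP8RecipeKwargs

# Request FP8 mixed precision; per this release, MS-AMP is the default FP8 backend.
# The explicit kwargs handler below just makes the choice visible (argument names
# are assumptions taken from the linked docs).
kwargs = [FP8RecipeKwargs(backend="msamp", opt_level="O2")]
accelerator = Accelerator(mixed_precision="fp8", kwargs_handlers=kwargs)
```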

Core

The prior release introduced a new sampler for the `DataLoader` that, while showing no statistical differences in results across seeds, would produce a different end accuracy when repeating the same seed, which alarmed some users. We have now disabled this behavior by default, as it required some additional setup, and brought back the original implementation. To use the new sampling technique (which can provide more accurate repeated results), pass `use_seedable_sampler=True` to the `Accelerator`. We will be propagating this up to the `Trainer` soon.
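
A minimal sketch of opting back into the seedable sampler described above:

```python
from accelerate import Accelerator

# Default in this release: the original (non-seedable) sampler behavior.
accelerator = Accelerator()

# Opt in to the seedable sampler for more reproducible repeated runs.
accelerator = Accelerator(use_seedable_sampler=True)
```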

Big Model Inference

* NPU support was added thanks to statelesshz in https://github.com/huggingface/accelerate/pull/2222
* When generating an automatic `device_map` we've made it possible to skip returning grouped key results if desired (see the sketch after this list) in https://github.com/huggingface/accelerate/pull/2233
* We now handle corner cases better when users pass `device_map="cuda"` etc thanks to younesbelkada in https://github.com/huggingface/accelerate/pull/2254
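
For reference, a minimal sketch of generating an automatic `device_map` as mentioned in the list above; the model choice (`gpt2` via transformers) and the memory budget are arbitrary illustrations:

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM  # illustrative model source

# Build the model skeleton without allocating real weights, then infer a device map
# that splits it across the given memory budget.
config = AutoConfig.from_pretrained("gpt2")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

device_map = infer_auto_device_map(model, max_memory={0: "10GiB", "cpu": "30GiB"})
print(device_map)
```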

FSDP and DeepSpeed

* Many improvements to the docs have been made thanks to stas00. Along with this, we've made it easier to adjust the config for the sharding strategy and other config values, thanks to pacman100 in https://github.com/huggingface/accelerate/pull/2288

* A regression introduced in Accelerate 0.23.0 made learning much slower on multi-GPU setups compared to a single GPU. This has now been fixed in https://github.com/huggingface/accelerate/pull/2304 thanks to pacman100

* The DeepSpeed integration now also handles `auto` values better when making a configuration in https://github.com/huggingface/accelerate/pull/2313

Bits and Bytes
* `Params4bit` added to bnb classes in set_module_tensor_to_device() by poedator in https://github.com/huggingface/accelerate/pull/2315

Device Agnostic Testing

For developers, we've made it much easier to run the *tests* on different devices with no change to the code thanks to statelesshz in https://github.com/huggingface/accelerate/pull/2123 and https://github.com/huggingface/accelerate/pull/2235

Bug Fixes
* Check notebook launcher for 3090+ by muellerzr in https://github.com/huggingface/accelerate/pull/2212
* Fix dtype bug when `offload_state_dict=True` and `dtype` is specified by fxmarty in https://github.com/huggingface/accelerate/pull/2116
* fix tqdm wrapper to print when process id ==0 by kashif in https://github.com/huggingface/accelerate/pull/2223
* fix BFloat16 is not supported on MPS (2226) by jxysoft in https://github.com/huggingface/accelerate/pull/2227
* Fix MpDeviceLoaderWrapper not having attribute batch_sampler by vanbasten23 in https://github.com/huggingface/accelerate/pull/2242
* [deepspeed] fix setting `auto` values for comm buffers by stas00 in https://github.com/huggingface/accelerate/pull/2295
* Fix infer_auto_device_map when tied weights share the same prefix name by fxmarty in https://github.com/huggingface/accelerate/pull/2324
* Fixes bug in swapping weights when replacing with Transformer-Engine layers by sudhakarsingh27 in https://github.com/huggingface/accelerate/pull/2305
* Fix breakpoint API in test_script.py on TPU. by vanbasten23 in https://github.com/huggingface/accelerate/pull/2263
* Bring old seed technique back by muellerzr in https://github.com/huggingface/accelerate/pull/2319


Major Contributors

* statelesshz for their work on device-agnostic testing and NPU support
* stas00 for many docfixes when it comes to DeepSpeed and FSDP

General Changelog
* add missing whitespace by stas00 in https://github.com/huggingface/accelerate/pull/2206
* MNT Delete the delete doc workflows by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2217
* Update docker images by muellerzr in https://github.com/huggingface/accelerate/pull/2213
* Add allgather check for xpu by abhilash1910 in https://github.com/huggingface/accelerate/pull/2199
* Check notebook launcher for 3090+ by muellerzr in https://github.com/huggingface/accelerate/pull/2212
* Fix dtype bug when `offload_state_dict=True` and `dtype` is specified by fxmarty in https://github.com/huggingface/accelerate/pull/2116
* fix tqdm wrapper to print when process id ==0 by kashif in https://github.com/huggingface/accelerate/pull/2223
* [data_loader] expand the error message by stas00 in https://github.com/huggingface/accelerate/pull/2221
* Update the 'Frameworks using Accelerate' section to include Amphion by RMSnow in https://github.com/huggingface/accelerate/pull/2225
* [Docs] Add doc for cpu/disk offload by SunMarc in https://github.com/huggingface/accelerate/pull/2231
* device agnostic testing by statelesshz in https://github.com/huggingface/accelerate/pull/2123
* Make cleaning optional for device map by muellerzr in https://github.com/huggingface/accelerate/pull/2233
* Add npu support to big model inference by statelesshz in https://github.com/huggingface/accelerate/pull/2222
* fix the DS failing test by pacman100 in https://github.com/huggingface/accelerate/pull/2237
* Fix nb tests by muellerzr in https://github.com/huggingface/accelerate/pull/2230
* fix BFloat16 is not supported on MPS (2226) by jxysoft in https://github.com/huggingface/accelerate/pull/2227
* Fix MpDeviceLoaderWrapper not having attribute batch_sampler by vanbasten23 in https://github.com/huggingface/accelerate/pull/2242
* [`Big-Modeling`] Harmonize device check to handle corner cases by younesbelkada in https://github.com/huggingface/accelerate/pull/2254
* Support `log_images` for aim tracker by Justin900429 in https://github.com/huggingface/accelerate/pull/2257
* Integrate MS-AMP Support for FP8 as a seperate backend by muellerzr in https://github.com/huggingface/accelerate/pull/2232
* refactor deepspeed dataloader prepare logic by pacman100 in https://github.com/huggingface/accelerate/pull/2238
* device agnostic deepspeed&fsdp testing by statelesshz in https://github.com/huggingface/accelerate/pull/2235
* Solve CUDA issues by muellerzr in https://github.com/huggingface/accelerate/pull/2272
* Uninstall DVC in the Trainer tests by muellerzr in https://github.com/huggingface/accelerate/pull/2271
* Rm DVCLive from test reqs as latest version causes failures by muellerzr in https://github.com/huggingface/accelerate/pull/2279
* typo fix by stas00 in https://github.com/huggingface/accelerate/pull/2276
* Add condition before using `check_tied_parameters_on_same_device` by SunMarc in https://github.com/huggingface/accelerate/pull/2218
* [doc] FSDP improvements by stas00 in https://github.com/huggingface/accelerate/pull/2274
* [deepspeed docs] auto-values aren't being covered by stas00 in https://github.com/huggingface/accelerate/pull/2286
* Improve FSDP config usability by pacman100 in https://github.com/huggingface/accelerate/pull/2288
* [doc] language fixes by stas00 in https://github.com/huggingface/accelerate/pull/2292
* Bump tj-actions/changed-files from 22.2 to 41 in /.github/workflows by dependabot in https://github.com/huggingface/accelerate/pull/2300
* add back dvclive to tests by dberenbaum in https://github.com/huggingface/accelerate/pull/2280
* Fixes bug in swapping weights when replacing with Transformer-Engine layers by sudhakarsingh27 in https://github.com/huggingface/accelerate/pull/2305
* Fix breakpoint API in test_script.py on TPU. by vanbasten23 in https://github.com/huggingface/accelerate/pull/2263
* make test_state_checkpointing device agnostic by statelesshz in https://github.com/huggingface/accelerate/pull/2290
* [deepspeed] documentation by stas00 in https://github.com/huggingface/accelerate/pull/2296
* Add more missing items by muellerzr in https://github.com/huggingface/accelerate/pull/2309
* Update docs: Add warning for device_map=None for load_checkpoint_and_dispatch by PhilJd in https://github.com/huggingface/accelerate/pull/2308
* [deepspeed] fix setting `auto` values for comm buffers by stas00 in https://github.com/huggingface/accelerate/pull/2295
* DeepSpeed refactoring by pacman100 in https://github.com/huggingface/accelerate/pull/2313
* Fix DeepSpeed related regression by pacman100 in https://github.com/huggingface/accelerate/pull/2304
* Update test_deepspeed.py by pacman100 in https://github.com/huggingface/accelerate/pull/2323
* Bring old seed technique back by muellerzr in https://github.com/huggingface/accelerate/pull/2319
* Fix batch_size sanity check in `prepare_data_loader` by izhx in https://github.com/huggingface/accelerate/pull/2310
* `Params4bit` added to bnb classes in set_module_tensor_to_device() by poedator in https://github.com/huggingface/accelerate/pull/2315
* Fix infer_auto_device_map when tied weights share the same prefix name by fxmarty in https://github.com/huggingface/accelerate/pull/2324

New Contributors
* fxmarty made their first contribution in https://github.com/huggingface/accelerate/pull/2116
* RMSnow made their first contribution in https://github.com/huggingface/accelerate/pull/2225
* jxysoft made their first contribution in https://github.com/huggingface/accelerate/pull/2227
* vanbasten23 made their first contribution in https://github.com/huggingface/accelerate/pull/2242
* Justin900429 made their first contribution in https://github.com/huggingface/accelerate/pull/2257
* dependabot made their first contribution in https://github.com/huggingface/accelerate/pull/2300
* sudhakarsingh27 made their first contribution in https://github.com/huggingface/accelerate/pull/2305
* PhilJd made their first contribution in https://github.com/huggingface/accelerate/pull/2308
* izhx made their first contribution in https://github.com/huggingface/accelerate/pull/2310
* poedator made their first contribution in https://github.com/huggingface/accelerate/pull/2315

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.25.0...v0.26.0

0.25.0

Safetensors default

As of this release, `safetensors` will be the default format saved when applicable! To read more about safetensors and why it's best to use it for safety (and not pickle/torch.save), check it out [here](https://github.com/huggingface/safetensors)
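
A minimal sketch of what this means in practice (model, optimizer, and checkpoint path are arbitrary):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

# As of this release, the weights written here use the safetensors format by default.
accelerator.save_state("checkpoints/step_1000")
```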

New Experiment Trackers

This release has two new experiment trackers, ClearML and DVCLive!

To use them, just pass `clear_ml` or `dvclive` to `log_with` in the `Accelerator` init. h/t to eugen-ajechiloae-clearml and dberenbaum
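
A minimal sketch using one of the new trackers (DVCLive here; the project name and metrics are arbitrary, and the tracker package must be installed):

```python
from accelerate import Accelerator

accelerator = Accelerator(log_with="dvclive")
accelerator.init_trackers("my-project", config={"lr": 1e-3, "epochs": 3})

for step in range(10):
    accelerator.log({"train_loss": 1.0 / (step + 1)}, step=step)

accelerator.end_training()
```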

DeepSpeed

* Accelerate's DeepSpeed integration now supports NPU devices, h/t to statelesshz
* DeepSpeed can now be launched via accelerate on single GPU setups

FSDP

FSDP underwent a major refactoring so that its interface is now exactly the same as every other scenario when using `accelerate`. No more needing to call `accelerator.prepare()` twice!
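
A minimal sketch, assuming FSDP itself is configured via `accelerate config`; the point is that a single `prepare()` call now covers model, optimizer, and dataloader together:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # FSDP settings assumed to come from `accelerate config`
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataloader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 4)), batch_size=8)

# One prepare() call; no second pass needed for FSDP anymore.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)
```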

Other useful enhancements

* We now try to disable P2P communication on consumer GPUs from the 3090 series onward, since NVIDIA dropped P2P support there and users were seeing timeout issues and the like. When using `accelerate launch` we disable it automatically, and if we detect that it is still enabled on a distributed setup using 3090s or newer, we raise an error.

* When doing `.gather()`, if tensors are on different devices we will now explicitly raise an error (for now only valid on CUDA)

Bug fixes

* Fixed a bug that caused dataloaders to not shuffle despite `shuffle=True` when using multiple GPUs and the new `SeedableRandomSampler`.

General Changelog
* Add logs offloading by SunMarc in https://github.com/huggingface/accelerate/pull/2075
* Add ClearML tracker by eugen-ajechiloae-clearml in https://github.com/huggingface/accelerate/pull/2034
* CRITICAL: fix failing ci by muellerzr in https://github.com/huggingface/accelerate/pull/2088
* Fix flag typo by kuza55 in https://github.com/huggingface/accelerate/pull/2090
* Fix batch sampler by muellerzr in https://github.com/huggingface/accelerate/pull/2097
* fixed ip address typo by Fluder-Paradyne in https://github.com/huggingface/accelerate/pull/2099
* Fix memory leak in fp8 causing OOM (and potentially 3x vRAM usage) by muellerzr in https://github.com/huggingface/accelerate/pull/2089
* fix warning when offload by SunMarc in https://github.com/huggingface/accelerate/pull/2105
* Always use SeedableRandomSampler by muellerzr in https://github.com/huggingface/accelerate/pull/2110
* Fix issue with tests by muellerzr in https://github.com/huggingface/accelerate/pull/2111
* Make SeedableRandomSampler the default always by muellerzr in https://github.com/huggingface/accelerate/pull/2117
* Use "and" instead of comma in Bibtex citation by qgallouedec in https://github.com/huggingface/accelerate/pull/2119
* Add explicit error if empty batch received by YuryYakhno in https://github.com/huggingface/accelerate/pull/2115
* Allow for ACCELERATE_SEED env var by muellerzr in https://github.com/huggingface/accelerate/pull/2126
* add DeepSpeed support for NPU by statelesshz in https://github.com/huggingface/accelerate/pull/2054
* Sync states for npu fsdp by jq460494839 in https://github.com/huggingface/accelerate/pull/2113
* Fix import error when torch>=2.0.1 and torch.distributed is disabled by natsukium in https://github.com/huggingface/accelerate/pull/2121
* Make safetensors the default by muellerzr in https://github.com/huggingface/accelerate/pull/2120
* Raise error when saving with param on meta device by SunMarc in https://github.com/huggingface/accelerate/pull/2132
* Leave native `save` as `False` by muellerzr in https://github.com/huggingface/accelerate/pull/2138
* fix retie_parameters by SunMarc in https://github.com/huggingface/accelerate/pull/2137
* Deal with shared memory scenarios by muellerzr in https://github.com/huggingface/accelerate/pull/2136
* specify config file path on README by kwonmha in https://github.com/huggingface/accelerate/pull/2140
* Fix safetensors contiguous by SunMarc in https://github.com/huggingface/accelerate/pull/2145
* Fix more tests by muellerzr in https://github.com/huggingface/accelerate/pull/2146
* [docs] fixed a couple of broken links by MKhalusova in https://github.com/huggingface/accelerate/pull/2147
* [docs] troubleshooting guide by MKhalusova in https://github.com/huggingface/accelerate/pull/2133
* [Docs] fix doc typos by kashif in https://github.com/huggingface/accelerate/pull/2150
* Add note about GradientState being in-sync with the dataloader by default by muellerzr in https://github.com/huggingface/accelerate/pull/2134
* Deprecated runner stuff by muellerzr in https://github.com/huggingface/accelerate/pull/2152
* Add examples to tests by muellerzr in https://github.com/huggingface/accelerate/pull/2131
* Disable pypi for merge workflows + fix trainer tests by muellerzr in https://github.com/huggingface/accelerate/pull/2153
* Adds dvclive tracker by dberenbaum in https://github.com/huggingface/accelerate/pull/2139
* check port availability only in main deepspeed/torchrun launcher by Jingru in https://github.com/huggingface/accelerate/pull/2078
* Do not attempt to pad nested tensors by frankier in https://github.com/huggingface/accelerate/pull/2041
* Add warning for problematic libraries by muellerzr in https://github.com/huggingface/accelerate/pull/2151
* Add ZeRO++ to DeepSpeed usage docs by SumanthRH in https://github.com/huggingface/accelerate/pull/2166
* Fix Megatron-LM Arguments Bug by yuanenming in https://github.com/huggingface/accelerate/pull/2168
* Fix non persistant buffer dispatch by SunMarc in https://github.com/huggingface/accelerate/pull/1941
* Updated torchrun instructions by TJ-Solergibert in https://github.com/huggingface/accelerate/pull/2096
* New CI Runners by muellerzr in https://github.com/huggingface/accelerate/pull/2087
* Revert "New CI Runners" by muellerzr in https://github.com/huggingface/accelerate/pull/2172
* [Working again] New CI by muellerzr in https://github.com/huggingface/accelerate/pull/2173
* fsdp refactoring by pacman100 in https://github.com/huggingface/accelerate/pull/2177
* Pin DVC by muellerzr in https://github.com/huggingface/accelerate/pull/2196
* Apply DVC warning to Accelerate by muellerzr in https://github.com/huggingface/accelerate/pull/2197
* Explicitly disable P2P using `launch`, and pick up in `state` if a user will face issues. by muellerzr in https://github.com/huggingface/accelerate/pull/2195
* Better error when device mismatches when calling gather() on CUDA by muellerzr in https://github.com/huggingface/accelerate/pull/2180
* unpins dvc by dberenbaum in https://github.com/huggingface/accelerate/pull/2200
* Assemble state dictionary for offloaded models by blbadger in https://github.com/huggingface/accelerate/pull/2156
* Allow deepspeed without distributed launcher by pacman100 in https://github.com/huggingface/accelerate/pull/2204

New Contributors
* eugen-ajechiloae-clearml made their first contribution in https://github.com/huggingface/accelerate/pull/2034
* kuza55 made their first contribution in https://github.com/huggingface/accelerate/pull/2090
* Fluder-Paradyne made their first contribution in https://github.com/huggingface/accelerate/pull/2099
* YuryYakhno made their first contribution in https://github.com/huggingface/accelerate/pull/2115
* jq460494839 made their first contribution in https://github.com/huggingface/accelerate/pull/2113
* kwonmha made their first contribution in https://github.com/huggingface/accelerate/pull/2140
* dberenbaum made their first contribution in https://github.com/huggingface/accelerate/pull/2139
* Jingru made their first contribution in https://github.com/huggingface/accelerate/pull/2078
* frankier made their first contribution in https://github.com/huggingface/accelerate/pull/2041
* yuanenming made their first contribution in https://github.com/huggingface/accelerate/pull/2168
* TJ-Solergibert made their first contribution in https://github.com/huggingface/accelerate/pull/2096
* blbadger made their first contribution in https://github.com/huggingface/accelerate/pull/2156

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.24.1...v0.25.0

0.24.1

- Fixes https://github.com/huggingface/accelerate/issues/2091 by changing how checking for custom samplers is done

0.24.0

Improved Reproducibility

One critical issue with Accelerate was that training runs differed when using an iterable dataset, no matter what seeds were set. v0.24.0 introduces the `dataloader.set_epoch()` function on all `Accelerate` `DataLoaders`: if the underlying dataset (or sampler) has the ability to set the epoch for reproducibility, it will do so. This is similar to the implementation already existing in transformers. To use:

```python
dataloader = accelerator.prepare(dataloader)
# Say we want to resume at epoch/iteration 2
dataloader.set_epoch(2)
```


For more information see this [PR](https://github.com/huggingface/accelerate/pull/2057); we will update the docs in a subsequent release with more information on this API.

Documentation

* The quick tour docs have gotten a complete makeover thanks to MKhalusova. Take a look [here](https://hf.co/docs/accelerate/quicktour)
* We also now have documentation on how to perform multinode training, see the [launch docs](https://hf.co/docs/accelerate/basic_tutorials/launch)

Internal structure
* Shared file systems are now supported under `save` and `save_state` via the `ProjectConfiguration` dataclass. See 1953 for more info.
* FSDP can now be used for `bfloat16` mixed precision via `torch.autocast`
* `all_gather_into_tensor` is now used as the main gather operation, reducing memory in the cases of big tensors
* Specifying `drop_last=True` will now properly have the desired effect when performing `Accelerator().gather_for_metrics()` (see the sketch after this list)
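
A minimal sketch of the `drop_last` interaction mentioned above (sizes are arbitrary):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()
# 10 samples with batch_size=4 and drop_last=True -> only 8 samples are iterated,
# and gather_for_metrics() now accounts for the dropped samples correctly.
dataloader = DataLoader(TensorDataset(torch.arange(10)), batch_size=4, drop_last=True)
dataloader = accelerator.prepare(dataloader)

seen = []
for (batch,) in dataloader:
    seen.append(accelerator.gather_for_metrics(batch))
print(torch.cat(seen).shape)  # torch.Size([8]) on a single process
```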


What's Changed
* Update big_modeling.md by kli-casia in https://github.com/huggingface/accelerate/pull/1976
* Fix model copy after `dispatch_model` by austinapatel in https://github.com/huggingface/accelerate/pull/1971
* FIX: Automatic checkpoint path inference issue by BenjaminBossan in https://github.com/huggingface/accelerate/pull/1989
* Fix skip first batch for deepspeed example by SumanthRH in https://github.com/huggingface/accelerate/pull/2001
* [docs] Quick tour refactor by MKhalusova in https://github.com/huggingface/accelerate/pull/2008
* Add basic documentation for multi node training by SumanthRH in https://github.com/huggingface/accelerate/pull/1988
* update torch_dynamo backends by SunMarc in https://github.com/huggingface/accelerate/pull/1992
* Sync states for xpu fsdp by abhilash1910 in https://github.com/huggingface/accelerate/pull/2005
* update fsdp docs by pacman100 in https://github.com/huggingface/accelerate/pull/2026
* Enable shared file system with `save` and `save_state` via ProjectConfiguration by muellerzr in https://github.com/huggingface/accelerate/pull/1953
* Fix save on each node by muellerzr in https://github.com/huggingface/accelerate/pull/2036
* Allow FSDP to use with `torch.autocast` for bfloat16 mixed precision by brcps12 in https://github.com/huggingface/accelerate/pull/2033
* Fix DeepSpeed version to <0.11 by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2043
* Unpin deepspeed by muellerzr in https://github.com/huggingface/accelerate/pull/2044
* Reduce memory by using `all_gather_into_tensor` by muellerzr in https://github.com/huggingface/accelerate/pull/1968
* Safely end training even if trackers weren't initialized by Ben-Epstein in https://github.com/huggingface/accelerate/pull/1994
* Fix integration CI by muellerzr in https://github.com/huggingface/accelerate/pull/2047
* Make fsdp ram efficient loading optional by pacman100 in https://github.com/huggingface/accelerate/pull/2037
* Let drop_last modify `gather_for_metrics` by muellerzr in https://github.com/huggingface/accelerate/pull/2048
* fix docstring by zhangsibo1129 in https://github.com/huggingface/accelerate/pull/2053
* Fix stalebot by muellerzr in https://github.com/huggingface/accelerate/pull/2052
* Add space to docs by muellerzr in https://github.com/huggingface/accelerate/pull/2055
* Fix the error when the "train_batch_size" is absent in DeepSpeed config by LZHgrla in https://github.com/huggingface/accelerate/pull/2060
* remove unused constants by statelesshz in https://github.com/huggingface/accelerate/pull/2045
* fix: remove useless token by rtrompier in https://github.com/huggingface/accelerate/pull/2069
* DOC: Fix broken link to designing a device map by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2073
* Let iterable dataset shard have a length if implemented by muellerzr in https://github.com/huggingface/accelerate/pull/2066
* Allow for samplers to be seedable and reproducable by muellerzr in https://github.com/huggingface/accelerate/pull/2057
* Fix docstring typo by qgallouedec in https://github.com/huggingface/accelerate/pull/2072
* Warn when kernel version is too low on Linux by BenjaminBossan in https://github.com/huggingface/accelerate/pull/2077

New Contributors
* kli-casia made their first contribution in https://github.com/huggingface/accelerate/pull/1976
* MKhalusova made their first contribution in https://github.com/huggingface/accelerate/pull/2008
* brcps12 made their first contribution in https://github.com/huggingface/accelerate/pull/2033
* Ben-Epstein made their first contribution in https://github.com/huggingface/accelerate/pull/1994
* zhangsibo1129 made their first contribution in https://github.com/huggingface/accelerate/pull/2053
* LZHgrla made their first contribution in https://github.com/huggingface/accelerate/pull/2060
* rtrompier made their first contribution in https://github.com/huggingface/accelerate/pull/2069
* qgallouedec made their first contribution in https://github.com/huggingface/accelerate/pull/2072

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.23.0...v0.24.0

0.23.0

Model Memory Estimator

A new model estimation tool to help calculate how much memory is needed for inference has been added. This does not download the pretrained weights, and utilizes `init_empty_weights` to stay memory efficient during the calculation.

Usage directions:

```bash
accelerate estimate-memory {model_name} --library {library_name} --dtypes fp16 int8
```

Or:
```python
from accelerate.commands.estimate import estimate_command_parser, estimate_command, gather_data

parser = estimate_command_parser()
args = parser.parse_args(["bert-base-cased", "--dtypes", "float32"])
output = gather_data(args)
```


🤗 Hub is a first-class citizen

We've made the `huggingface_hub` library a first-class citizen of the framework! While this is mainly for the model estimation tool, it opens the door to further integrations should they be wanted.

`Accelerator` Enhancements:

- `gather_for_metrics` will now also de-dupe for non-tensor objects. See 1937
- `mixed_precision="bf16"` support on NPU devices. See 1949
- New `breakpoint` API to help when you need to break out of a loop based on a condition met on a single process. See 1940

Notebook Launcher Enhancements:

- The notebook launcher now supports launching across multiple nodes! See 1913
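
A minimal sketch, assuming the multi-node arguments (`num_nodes`, `node_rank`, `master_addr`) follow the linked PR; the exact names and per-node semantics may differ by version:

```python
from accelerate import notebook_launcher

def training_loop():
    # Real training code would go here.
    print("hello from one process")

# Run the same cell on every node, changing node_rank per machine (assumptions noted above).
notebook_launcher(
    training_loop,
    num_processes=8,         # processes to launch on this machine (assumption)
    num_nodes=2,
    node_rank=0,
    master_addr="10.0.0.1",  # placeholder address of the rank-0 node
)
```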

FSDP Enhancements:

- Activation checkpointing is now natively supported in the framework. See https://github.com/huggingface/accelerate/pull/1891
- `torch.compile` support was fixed. See 1919

DeepSpeed Enhancements:

- XPU/ccl support (1827)
- Easier gradient accumulation support: simply set `gradient_accumulation_steps` to `"auto"` in your DeepSpeed config, and Accelerate will use the value passed to the `Accelerator` instead (1901); see the config sketch after this list
- Support for custom schedulers and deepspeed optimizers (1909)
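
As a rough illustration of the `auto` value mentioned in the list above, here is a DeepSpeed config fragment written as a Python dict; the fields other than `gradient_accumulation_steps` are just common config keys, not specific to this release:

```python
# Fragment of a DeepSpeed config (e.g. the contents of ds_config.json):
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",  # filled in from the value passed to Accelerator
    "zero_optimization": {"stage": 2},
}
```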

What's Changed
* Update release instructions by sgugger in https://github.com/huggingface/accelerate/pull/1877
* fix detach_hook by SunMarc in https://github.com/huggingface/accelerate/pull/1880
* Enable power users to bypass device_map="auto" training block by muellerzr in https://github.com/huggingface/accelerate/pull/1881
* Introduce model memory estimator by muellerzr in https://github.com/huggingface/accelerate/pull/1876
* Update with new url for explore by muellerzr in https://github.com/huggingface/accelerate/pull/1884
* Enable a token to be used by muellerzr in https://github.com/huggingface/accelerate/pull/1886
* Add doc on model memory usage by muellerzr in https://github.com/huggingface/accelerate/pull/1887
* Add hub as core dep by muellerzr in https://github.com/huggingface/accelerate/pull/1885
* update import of deepspeed integration from transformers by pacman100 in https://github.com/huggingface/accelerate/pull/1894
* Final nits on model util by muellerzr in https://github.com/huggingface/accelerate/pull/1896
* Fix nb launcher test by muellerzr in https://github.com/huggingface/accelerate/pull/1899
* Add FSDP activation checkpointing feature by arde171 in https://github.com/huggingface/accelerate/pull/1891
* Solve at least one failing test by muellerzr in https://github.com/huggingface/accelerate/pull/1898
* Deepspeed integration for XPU/ccl by abhilash1910 in https://github.com/huggingface/accelerate/pull/1827
* Add PR template by muellerzr in https://github.com/huggingface/accelerate/pull/1906
* deepspeed grad_acc_steps fixes by pacman100 in https://github.com/huggingface/accelerate/pull/1901
* Skip pypi transformers until release by muellerzr in https://github.com/huggingface/accelerate/pull/1911
* Fix docker images by muellerzr in https://github.com/huggingface/accelerate/pull/1910
* Use hosted CI runners for building docker images by muellerzr in https://github.com/huggingface/accelerate/pull/1915
* fix: add debug argument to sagemaker configuration by maximegmd in https://github.com/huggingface/accelerate/pull/1904
* improve help info when run `accelerate config` on npu by statelesshz in https://github.com/huggingface/accelerate/pull/1895
* support logging with mlflow in case of mlflow-skinny installed by ghtaro in https://github.com/huggingface/accelerate/pull/1874
* More CI fun - run all test parts always by muellerzr in https://github.com/huggingface/accelerate/pull/1916
* Expose auto in dataclass by muellerzr in https://github.com/huggingface/accelerate/pull/1914
* Add support for deepspeed optimizer and custom scheduler by pacman100 in https://github.com/huggingface/accelerate/pull/1909
* reduce gradient first for XLA when unscaling the gradients in mixed precision training with AMP. by statelesshz in https://github.com/huggingface/accelerate/pull/1926
* Check for invalid keys by muellerzr in https://github.com/huggingface/accelerate/pull/1935
* clean num devices by SunMarc in https://github.com/huggingface/accelerate/pull/1936
* Bring back pypi to runners by muellerzr in https://github.com/huggingface/accelerate/pull/1939
* Support multi-node notebook launching by ggaaooppeenngg in https://github.com/huggingface/accelerate/pull/1913
* fix the fsdp docs by pacman100 in https://github.com/huggingface/accelerate/pull/1947
* Fix docs by ggaaooppeenngg in https://github.com/huggingface/accelerate/pull/1951
* Protect tensorflow dependency by SunMarc in https://github.com/huggingface/accelerate/pull/1959
* fix safetensor saving by SunMarc in https://github.com/huggingface/accelerate/pull/1954
* FIX: patch_environment restores pre-existing environment variables when finished by BenjaminBossan in https://github.com/huggingface/accelerate/pull/1960
* Better guards for slow imports by muellerzr in https://github.com/huggingface/accelerate/pull/1963
* [`Tests`] Finish all todos by younesbelkada in https://github.com/huggingface/accelerate/pull/1957
* Rm strtobool by muellerzr in https://github.com/huggingface/accelerate/pull/1964
* Implementing gather_for_metrics with dedup for non tensor objects by Lorenzobattistela in https://github.com/huggingface/accelerate/pull/1937
* add bf16 mixed precision support for NPU by statelesshz in https://github.com/huggingface/accelerate/pull/1949
* Introduce breakpoint API by muellerzr in https://github.com/huggingface/accelerate/pull/1940
* fix torch compile with FSDP by pacman100 in https://github.com/huggingface/accelerate/pull/1919
* Add `force_hooks` to `dispatch_model` by austinapatel in https://github.com/huggingface/accelerate/pull/1969
* update FSDP and DeepSpeed docs by pacman100 in https://github.com/huggingface/accelerate/pull/1973
* Flex fix patch for accelerate by abhilash1910 in https://github.com/huggingface/accelerate/pull/1972
* Remove checkpoints only on main process by Kepnu4 in https://github.com/huggingface/accelerate/pull/1974

New Contributors
* arde171 made their first contribution in https://github.com/huggingface/accelerate/pull/1891
* maximegmd made their first contribution in https://github.com/huggingface/accelerate/pull/1904
* ghtaro made their first contribution in https://github.com/huggingface/accelerate/pull/1874
* ggaaooppeenngg made their first contribution in https://github.com/huggingface/accelerate/pull/1913
* Lorenzobattistela made their first contribution in https://github.com/huggingface/accelerate/pull/1937
* austinapatel made their first contribution in https://github.com/huggingface/accelerate/pull/1969
* Kepnu4 made their first contribution in https://github.com/huggingface/accelerate/pull/1974

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.22.0...v0.23.0

0.22.0

Experimental distributed operations checking framework

A new framework has been introduced which can help catch `timeout` errors caused by distributed operations failing *before* they occur. As this adds a tiny bit of overhead, it is an opt-in scenario. Simply run your code with `ACCELERATE_DEBUG_MODE="1"` to enable this. Read more in the [docs](https://huggingface.co/docs/accelerate/main/en/usage_guides/debug), introduced via https://github.com/huggingface/accelerate/pull/1756
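
A minimal sketch of enabling it from inside a script (equivalent to exporting the variable before `accelerate launch`); setting it before the `Accelerator` is created is assumed to be sufficient:

```python
import os

# Opt in to the experimental distributed-operations checks; this is the same flag
# you would export in the shell, and it is opt-in because it adds a small overhead.
os.environ["ACCELERATE_DEBUG_MODE"] = "1"

from accelerate import Accelerator

accelerator = Accelerator()
```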

`Accelerator.load_state` can now load the most recent checkpoint automatically

If a `ProjectConfiguration` has been made, using `accelerator.load_state()` (without any arguments passed) can now automatically find and load the latest checkpoint used, introduced via https://github.com/huggingface/accelerate/pull/1741
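
A minimal sketch, assuming checkpoints were previously written with `accelerator.save_state()` under the given project directory (directory name arbitrary):

```python
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration

config = ProjectConfiguration(project_dir="my_run", automatic_checkpoint_naming=True)
accelerator = Accelerator(project_config=config)

# With no argument, the most recent checkpoint in the project directory is found and loaded.
accelerator.load_state()
```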

Multiple enhancements to gradient accumulation

In this release multiple new enhancements to distributed gradient accumulation have been added.

* `accelerator.accumulate()` now supports passing in multiple models (see the sketch after this list), introduced via https://github.com/huggingface/accelerate/pull/1708
* A util has been introduced to perform multiple forwards, then multiple backwards, and finally sync the gradients only on the last `.backward()` via https://github.com/huggingface/accelerate/pull/1726
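
A minimal sketch of the multi-model `accumulate()` usage (model shapes, loss, and step count are arbitrary):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)
model_a = torch.nn.Linear(8, 8)
model_b = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(list(model_a.parameters()) + list(model_b.parameters()), lr=0.1)
model_a, model_b, optimizer = accelerator.prepare(model_a, model_b, optimizer)

for step in range(8):
    x = torch.randn(4, 8, device=accelerator.device)
    # Both models are passed to accumulate() so their gradient sync is skipped
    # together on non-final accumulation steps.
    with accelerator.accumulate(model_a, model_b):
        loss = model_b(model_a(x)).mean()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```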

FSDP Changes

* FSDP support has been added for NPU and XPU devices via https://github.com/huggingface/accelerate/pull/1803 and https://github.com/huggingface/accelerate/pull/1806
* A new method for supporting RAM-efficient loading of models with FSDP has been added via https://github.com/huggingface/accelerate/pull/1777

DataLoader Changes

* Custom slice functions are now supported in the `DataLoaderDispatcher` added via https://github.com/huggingface/accelerate/pull/1846


What's New?
* fix failing test on 8GPU by statelesshz in https://github.com/huggingface/accelerate/pull/1724
* Better control over DDP's `no_sync` by NouamaneTazi in https://github.com/huggingface/accelerate/pull/1726
* Get rid of calling `get_scale()` by patching the step method of optimizer. by yuxinyuan in https://github.com/huggingface/accelerate/pull/1720
* fix the bug in npu by statelesshz in https://github.com/huggingface/accelerate/pull/1728
* Adding a shape check for `set_module_tensor_to_device`. by Narsil in https://github.com/huggingface/accelerate/pull/1731
* Fix errors when optimizer is not a Pytorch optimizer. by yuxinyuan in https://github.com/huggingface/accelerate/pull/1733
* Make balanced memory able to work with non contiguous GPUs ids by thomwolf in https://github.com/huggingface/accelerate/pull/1734
* Fixed typo in `__repr__` of AlignDevicesHook by KacperWyrwal in https://github.com/huggingface/accelerate/pull/1735
* Update docs by muellerzr in https://github.com/huggingface/accelerate/pull/1736
* Fixed the bug that split dict incorrectly by yuangpeng in https://github.com/huggingface/accelerate/pull/1742
* Let load_state automatically grab the latest save by muellerzr in https://github.com/huggingface/accelerate/pull/1741
* fix `KwargsHandler.to_kwargs` not working with `os.environ` initialization in `__post_init__` by CyCle1024 in https://github.com/huggingface/accelerate/pull/1738
* fix typo by cauyxy in https://github.com/huggingface/accelerate/pull/1747
* Check for misconfiguration of single node & single GPU by muellerzr in https://github.com/huggingface/accelerate/pull/1746
* Remove unused constant by muellerzr in https://github.com/huggingface/accelerate/pull/1749
* Rework new constant for operations by muellerzr in https://github.com/huggingface/accelerate/pull/1748
* Expose `autocast` kwargs and simplify `autocast` wrapper by muellerzr in https://github.com/huggingface/accelerate/pull/1740
* Fix FSDP related issues by pacman100 in https://github.com/huggingface/accelerate/pull/1745
* FSDP enhancements and fixes by pacman100 in https://github.com/huggingface/accelerate/pull/1753
* Fix check failure in `Accelerator.save_state` using multi-gpu by CyCle1024 in https://github.com/huggingface/accelerate/pull/1760
* Fix error when `max_memory` argument is in unexpected order by ranchlai in https://github.com/huggingface/accelerate/pull/1759
* Fix offload on disk when executing on CPU by sgugger in https://github.com/huggingface/accelerate/pull/1762
* Change `is_aim_available()` function to not match aim >= 4.0.0 by alberttorosyan in https://github.com/huggingface/accelerate/pull/1769
* Introduce an experimental distributed operations framework by muellerzr in https://github.com/huggingface/accelerate/pull/1756
* Support wrapping multiple models in Accelerator.accumulate() by yuxinyuan in https://github.com/huggingface/accelerate/pull/1708
* Contigous on gather by muellerzr in https://github.com/huggingface/accelerate/pull/1771
* [FSDP] Fix `load_fsdp_optimizer` by awgu in https://github.com/huggingface/accelerate/pull/1755
* simplify and correct the deepspeed example by pacman100 in https://github.com/huggingface/accelerate/pull/1775
* Set ipex default in state by muellerzr in https://github.com/huggingface/accelerate/pull/1776
* Fix import error when torch>=2.0.1 and `torch.distributed` is disabled by natsukium in https://github.com/huggingface/accelerate/pull/1800
* reserve 10% GPU in `get_balanced_memory` to avoid OOM by ranchlai in https://github.com/huggingface/accelerate/pull/1798
* add support of float memory size in `convert_file_size_to_int` by ranchlai in https://github.com/huggingface/accelerate/pull/1799
* Allow users to resume from previous wandb runs with `allow_val_change` by SumanthRH in https://github.com/huggingface/accelerate/pull/1796
* Add FSDP for XPU by abhilash1910 in https://github.com/huggingface/accelerate/pull/1803
* Add FSDP for NPU by statelesshz in https://github.com/huggingface/accelerate/pull/1806
* Fix pytest import by muellerzr in https://github.com/huggingface/accelerate/pull/1808
* More specific logging in `gather_for_metrics` by dleve123 in https://github.com/huggingface/accelerate/pull/1784
* Detect device map auto and raise a helpful error when trying to not use model parallelism by muellerzr in https://github.com/huggingface/accelerate/pull/1810
* Typo fix by muellerzr in https://github.com/huggingface/accelerate/pull/1812
* Expand device-map warning by muellerzr in https://github.com/huggingface/accelerate/pull/1819
* Update bibtex to reflect team growth by muellerzr in https://github.com/huggingface/accelerate/pull/1820
* Improve docs on grad accumulation by vwxyzjn in https://github.com/huggingface/accelerate/pull/1817
* add warning when using to and cuda by SunMarc in https://github.com/huggingface/accelerate/pull/1790
* Fix bnb import by muellerzr in https://github.com/huggingface/accelerate/pull/1813
* Update docs and docstrings to match `load_and_quantize_model` arg by JonathanRayner in https://github.com/huggingface/accelerate/pull/1822
* Expose a bit of args/docstring fixup by muellerzr in https://github.com/huggingface/accelerate/pull/1824
* Better test by muellerzr in https://github.com/huggingface/accelerate/pull/1825
* Minor idiomatic change for fp8 check. by float-trip in https://github.com/huggingface/accelerate/pull/1829
* Use device as context manager for `init_on_device` by shingjan in https://github.com/huggingface/accelerate/pull/1826
* Ipex bug fix for device properties in modelling by abhilash1910 in https://github.com/huggingface/accelerate/pull/1834
* FIX: Bug with `unwrap_model` and `keep_fp32_wrapper=False` by BenjaminBossan in https://github.com/huggingface/accelerate/pull/1838
* Fix `verify_device_map` by Rexhaif in https://github.com/huggingface/accelerate/pull/1842
* Change CUDA check by muellerzr in https://github.com/huggingface/accelerate/pull/1833
* Fix the noneffective parameter: `gpu_ids` (Rel. Issue 1848) by devymex in https://github.com/huggingface/accelerate/pull/1850
* support for ram efficient loading of model with FSDP by pacman100 in https://github.com/huggingface/accelerate/pull/1777
* Loading logic safetensors by SunMarc in https://github.com/huggingface/accelerate/pull/1853
* fix dispatch for quantized model by SunMarc in https://github.com/huggingface/accelerate/pull/1855
* Update `fsdp_with_peak_mem_tracking`.py by pacman100 in https://github.com/huggingface/accelerate/pull/1856
* Add env variable for `init_on_device` by shingjan in https://github.com/huggingface/accelerate/pull/1852
* remove casting to FP32 when saving state dict by pacman100 in https://github.com/huggingface/accelerate/pull/1868
* support custom slice function in `DataLoaderDispatcher` by thevasudevgupta in https://github.com/huggingface/accelerate/pull/1846
* Include a note to the forums in the bug report by muellerzr in https://github.com/huggingface/accelerate/pull/1871

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* yuxinyuan
    * Support wrapping multiple models in `Accelerator.accumulate()` (1708)
    * Fix errors when optimizer is not a Pytorch optimizer. (1733)
    * Get rid of calling get_scale() by patching the step method of optimizer. (1720)
* NouamaneTazi
    * Better control over DDP's `no_sync` (1726)
* abhilash1910
    * Add FSDP for XPU (1803)
    * Ipex bug fix for device properties in modelling (1834)
* statelesshz
    * Add FSDP for NPU (1806)
    * fix failing test on 8GPU (1724)
    * fix the bug in npu (1728)
* thevasudevgupta
    * support custom slice function in `DataLoaderDispatcher` (1846)

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.21.0...v0.22.0

