Accelerate


0.22.0

Experimental distributed operations checking framework

A new framework has been introduced that can catch distributed operations that are about to fail *before* they trigger a `timeout` error. As this adds a tiny bit of overhead, it is opt-in: simply run your code with `ACCELERATE_DEBUG_MODE="1"` to enable it. Read more in the [docs](https://huggingface.co/docs/accelerate/main/en/usage_guides/debug), introduced via https://github.com/huggingface/accelerate/pull/1756
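
A minimal sketch of opting in; the environment variable can equally be set on the command line (`ACCELERATE_DEBUG_MODE="1" accelerate launch train.py`, where `train.py` is a hypothetical script name):

```python
import os

# Opt in to the checking framework before creating the Accelerator.
os.environ["ACCELERATE_DEBUG_MODE"] = "1"

from accelerate import Accelerator

accelerator = Accelerator()
# With debug mode on, collective operations such as accelerator.gather()
# verify tensor shapes across processes first and raise a descriptive error
# early, rather than hanging until a timeout.
```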

`Accelerator.load_state` can now load the most recent checkpoint automatically

If a `ProjectConfiguration` has been set up, calling `accelerator.load_state()` (with no arguments) can now automatically find and load the latest checkpoint saved, introduced via https://github.com/huggingface/accelerate/pull/1741
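
A minimal sketch, assuming earlier checkpoints were created with `accelerator.save_state()` under the same project directory:

```python
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration

project_config = ProjectConfiguration(
    project_dir="my_project", automatic_checkpoint_naming=True
)
accelerator = Accelerator(project_config=project_config)

# During training, accelerator.save_state() writes numbered checkpoints
# under my_project/checkpoints/.

# With no path passed, the most recent checkpoint is found and loaded.
accelerator.load_state()
```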

Multiple enhancements to gradient accumulation

This release adds multiple enhancements to distributed gradient accumulation.

* `accelerator.accumulate()` now supports passing in multiple models, introduced via https://github.com/huggingface/accelerate/pull/1708
* A util has been introduced to perform multiple forwards, then multiple backwards, and finally sync the gradients only on the last `.backward()`, via https://github.com/huggingface/accelerate/pull/1726 (see the sketch after this list)
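
A minimal sketch of accumulating over two toy models at once; the models, optimizer, and data are stand-ins:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator(gradient_accumulation_steps=4)
model1 = torch.nn.Linear(8, 8)
model2 = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(
    list(model1.parameters()) + list(model2.parameters()), lr=1e-3
)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 8), batch_size=8)
model1, model2, optimizer, dataloader = accelerator.prepare(
    model1, model2, optimizer, dataloader
)

for batch in dataloader:
    # accumulate() now accepts multiple models; gradient sync happens only
    # on the step that completes an accumulation window.
    with accelerator.accumulate(model1, model2):
        loss = model2(model1(batch)).mean()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
```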

FSDP Changes

* FSDP support has been added for NPU and XPU devices via https://github.com/huggingface/accelerate/pull/1803 and https://github.com/huggingface/accelerate/pull/1806
* A new method for supporting RAM-efficient loading of models with FSDP has been added via https://github.com/huggingface/accelerate/pull/1777

DataLoader Changes

* Custom slice functions are now supported in the `DataLoaderDispatcher` added via https://github.com/huggingface/accelerate/pull/1846


What's New?
* fix failing test on 8GPU by statelesshz in https://github.com/huggingface/accelerate/pull/1724
* Better control over DDP's `no_sync` by NouamaneTazi in https://github.com/huggingface/accelerate/pull/1726
* Get rid of calling `get_scale()` by patching the step method of optimizer. by yuxinyuan in https://github.com/huggingface/accelerate/pull/1720
* fix the bug in npu by statelesshz in https://github.com/huggingface/accelerate/pull/1728
* Adding a shape check for `set_module_tensor_to_device`. by Narsil in https://github.com/huggingface/accelerate/pull/1731
* Fix errors when optimizer is not a Pytorch optimizer. by yuxinyuan in https://github.com/huggingface/accelerate/pull/1733
* Make balanced memory able to work with non contiguous GPUs ids by thomwolf in https://github.com/huggingface/accelerate/pull/1734
* Fixed typo in `__repr__` of AlignDevicesHook by KacperWyrwal in https://github.com/huggingface/accelerate/pull/1735
* Update docs by muellerzr in https://github.com/huggingface/accelerate/pull/1736
* Fixed the bug that split dict incorrectly by yuangpeng in https://github.com/huggingface/accelerate/pull/1742
* Let load_state automatically grab the latest save by muellerzr in https://github.com/huggingface/accelerate/pull/1741
* fix `KwargsHandler.to_kwargs` not working with `os.environ` initialization in `__post_init__` by CyCle1024 in https://github.com/huggingface/accelerate/pull/1738
* fix typo by cauyxy in https://github.com/huggingface/accelerate/pull/1747
* Check for misconfiguration of single node & single GPU by muellerzr in https://github.com/huggingface/accelerate/pull/1746
* Remove unused constant by muellerzr in https://github.com/huggingface/accelerate/pull/1749
* Rework new constant for operations by muellerzr in https://github.com/huggingface/accelerate/pull/1748
* Expose `autocast` kwargs and simplify `autocast` wrapper by muellerzr in https://github.com/huggingface/accelerate/pull/1740
* Fix FSDP related issues by pacman100 in https://github.com/huggingface/accelerate/pull/1745
* FSDP enhancements and fixes by pacman100 in https://github.com/huggingface/accelerate/pull/1753
* Fix check failure in `Accelerator.save_state` using multi-gpu by CyCle1024 in https://github.com/huggingface/accelerate/pull/1760
* Fix error when `max_memory` argument is in unexpected order by ranchlai in https://github.com/huggingface/accelerate/pull/1759
* Fix offload on disk when executing on CPU by sgugger in https://github.com/huggingface/accelerate/pull/1762
* Change `is_aim_available()` function to not match aim >= 4.0.0 by alberttorosyan in https://github.com/huggingface/accelerate/pull/1769
* Introduce an experimental distributed operations framework by muellerzr in https://github.com/huggingface/accelerate/pull/1756
* Support wrapping multiple models in Accelerator.accumulate() by yuxinyuan in https://github.com/huggingface/accelerate/pull/1708
* Contigous on gather by muellerzr in https://github.com/huggingface/accelerate/pull/1771
* [FSDP] Fix `load_fsdp_optimizer` by awgu in https://github.com/huggingface/accelerate/pull/1755
* simplify and correct the deepspeed example by pacman100 in https://github.com/huggingface/accelerate/pull/1775
* Set ipex default in state by muellerzr in https://github.com/huggingface/accelerate/pull/1776
* Fix import error when torch>=2.0.1 and `torch.distributed` is disabled by natsukium in https://github.com/huggingface/accelerate/pull/1800
* reserve 10% GPU in `get_balanced_memory` to avoid OOM by ranchlai in https://github.com/huggingface/accelerate/pull/1798
* add support of float memory size in `convert_file_size_to_int` by ranchlai in https://github.com/huggingface/accelerate/pull/1799
* Allow users to resume from previous wandb runs with `allow_val_change` by SumanthRH in https://github.com/huggingface/accelerate/pull/1796
* Add FSDP for XPU by abhilash1910 in https://github.com/huggingface/accelerate/pull/1803
* Add FSDP for NPU by statelesshz in https://github.com/huggingface/accelerate/pull/1806
* Fix pytest import by muellerzr in https://github.com/huggingface/accelerate/pull/1808
* More specific logging in `gather_for_metrics` by dleve123 in https://github.com/huggingface/accelerate/pull/1784
* Detect device map auto and raise a helpful error when trying to not use model parallelism by muellerzr in https://github.com/huggingface/accelerate/pull/1810
* Typo fix by muellerzr in https://github.com/huggingface/accelerate/pull/1812
* Expand device-map warning by muellerzr in https://github.com/huggingface/accelerate/pull/1819
* Update bibtex to reflect team growth by muellerzr in https://github.com/huggingface/accelerate/pull/1820
* Improve docs on grad accumulation by vwxyzjn in https://github.com/huggingface/accelerate/pull/1817
* add warning when using to and cuda by SunMarc in https://github.com/huggingface/accelerate/pull/1790
* Fix bnb import by muellerzr in https://github.com/huggingface/accelerate/pull/1813
* Update docs and docstrings to match `load_and_quantize_model` arg by JonathanRayner in https://github.com/huggingface/accelerate/pull/1822
* Expose a bit of args/docstring fixup by muellerzr in https://github.com/huggingface/accelerate/pull/1824
* Better test by muellerzr in https://github.com/huggingface/accelerate/pull/1825
* Minor idiomatic change for fp8 check. by float-trip in https://github.com/huggingface/accelerate/pull/1829
* Use device as context manager for `init_on_device` by shingjan in https://github.com/huggingface/accelerate/pull/1826
* Ipex bug fix for device properties in modelling by abhilash1910 in https://github.com/huggingface/accelerate/pull/1834
* FIX: Bug with `unwrap_model` and `keep_fp32_wrapper=False` by BenjaminBossan in https://github.com/huggingface/accelerate/pull/1838
* Fix `verify_device_map` by Rexhaif in https://github.com/huggingface/accelerate/pull/1842
* Change CUDA check by muellerzr in https://github.com/huggingface/accelerate/pull/1833
* Fix the noneffective parameter: `gpu_ids` (Rel. Issue 1848) by devymex in https://github.com/huggingface/accelerate/pull/1850
* support for ram efficient loading of model with FSDP by pacman100 in https://github.com/huggingface/accelerate/pull/1777
* Loading logic safetensors by SunMarc in https://github.com/huggingface/accelerate/pull/1853
* fix dispatch for quantized model by SunMarc in https://github.com/huggingface/accelerate/pull/1855
* Update `fsdp_with_peak_mem_tracking`.py by pacman100 in https://github.com/huggingface/accelerate/pull/1856
* Add env variable for `init_on_device` by shingjan in https://github.com/huggingface/accelerate/pull/1852
* remove casting to FP32 when saving state dict by pacman100 in https://github.com/huggingface/accelerate/pull/1868
* support custom slice function in `DataLoaderDispatcher` by thevasudevgupta in https://github.com/huggingface/accelerate/pull/1846
* Include a note to the forums in the bug report by muellerzr in https://github.com/huggingface/accelerate/pull/1871

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* yuxinyuan
    * Support wrapping multiple models in `Accelerator.accumulate()` (1708)
    * Fix errors when optimizer is not a Pytorch optimizer. (1733)
    * Get rid of calling get_scale() by patching the step method of optimizer. (1720)
* NouamaneTazi
    * Better control over DDP's `no_sync` (1726)
* abhilash1910
    * Add FSDP for XPU (1803)
    * Ipex bug fix for device properties in modelling (1834)
* statelesshz
    * Add FSDP for NPU (1806)
    * fix failing test on 8GPU (1724)
    * fix the bug in npu (1728)
* thevasudevgupta
    * support custom slice function in `DataLoaderDispatcher` (1846)

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.21.0...v0.22.0

0.21.0

Model quantization with bitsandbytes

You can now quantize any model (not just Transformers models) using Accelerate. This is most useful for models with a lot of linear layers. See the [documentation](https://huggingface.co/docs/accelerate/usage_guides/quantization) for more information, and a short sketch below.

* Bnb quantization by SunMarc in 1626
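
A minimal sketch, assuming bitsandbytes is installed; `weights.pt` is a hypothetical path to saved weights for this toy, linear-heavy model:

```python
import torch
from accelerate import init_empty_weights
from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model

# Build the model skeleton without allocating real weights.
with init_empty_weights():
    empty_model = torch.nn.Sequential(
        torch.nn.Linear(1024, 1024),
        torch.nn.ReLU(),
        torch.nn.Linear(1024, 1024),
    )

bnb_config = BnbQuantizationConfig(load_in_8bit=True)
quantized_model = load_and_quantize_model(
    empty_model,
    bnb_quantization_config=bnb_config,
    weights_location="weights.pt",  # hypothetical saved weights
    device_map="auto",
)
```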

Support for Ascend NPUs

Accelerate now supports Ascend NPUs.

* Add Ascend NPU accelerator support by statelesshz in 1676
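
On a machine with Ascend NPUs and `torch_npu` installed, no extra configuration should be needed; a minimal sketch:

```python
from accelerate import Accelerator

accelerator = Accelerator()
# On an Ascend machine the NPU is picked up automatically,
# e.g. device(type="npu", index=0).
print(accelerator.device)
```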

What's new?

Accelerate now requires Python 3.8+ and PyTorch 1.10+:

* 🚨🚨🚨 Spring cleaning: Python 3.8 🚨🚨🚨 by muellerzr in 1661
* 🚨🚨🚨 Spring cleaning: PyTorch 1.10 🚨🚨🚨 by muellerzr in 1662


* [doc build] Use secrets by mishig25 in 1551
* Update launch.mdx by LiamSwayne in 1553
* Avoid double wrapping of all accelerate.prepare objects by muellerzr in 1555
* Update README.md by LiamSwayne in 1556
* Fix load_state_dict when there is one device and disk by sgugger in 1557
* Fix tests not being ran on multi-GPU nightly by muellerzr in 1558
* fix the typo when setting the "_accelerator_prepared" attribute by Yura52 in 1560
* [`core`] Fix possibility to pass `NoneType` objects in `prepare` by younesbelkada in 1561
* Reset dataloader end_of_datalaoder at each iter by sgugger in 1562
* Update big_modeling.mdx by LiamSwayne in 1564
* [`bnb`] Fix failing int8 tests by younesbelkada in 1567
* Update gradient sync docs to reflect importance of `optimizer.step()` by dleve123 in 1565
* Update mixed precision integrations in README by sgugger in 1569
* Raise error instead of warn by muellerzr in 1568
* Introduce listify, fix tensorboard silently failing by muellerzr in 1570
* Check for bak and expand docs on directory structure by muellerzr in 1571
* Perminant solution by muellerzr in 1577
* fix the bug in xpu by mingxiaoh in 1508
* Make sure that we only set is_accelerator_prepared on items accelerate actually prepares by muellerzr in 1578
* Expand `prepare()` doc by muellerzr in 1580
* Get Torch version using importlib instead of pkg_resources by catwell in 1585
* improve oob performance when use mpirun to start DDP finetune without `accelerate launch` by sywangyi in 1575
* Update training_tpu.mdx by LiamSwayne in 1582
* Return false if CUDA available by muellerzr in 1581
* fix logger level by caopulan in 1579
* Fix test by muellerzr in 1586
* Update checkpoint.mdx by LiamSwayne in 1587
* FSDP updates by pacman100 in 1576
* Update modeling.py by ain-soph in 1595
* Integration tests by muellerzr in 1593
* Add triggers for CI workflow by muellerzr in 1597
* Remove asking xpu plugin for non xpu devices by abhilash1910 in 1594
* Remove GPU safetensors env variable by sgugger in 1603
* reset end_of_dataloader for dataloader_dispatcher by megavaz in 1609
* fix for arc gpus by abhilash1910 in 1615
* Ignore low_zero option when only device is available by sgugger in 1617
* Fix failing multinode tests by muellerzr in 1616
* Doc to md by sgugger in 1618
* Fix tb issue by muellerzr in 1623
* Fix workflow by muellerzr in 1625
* Fix transformers sync bug with accumulate by muellerzr in 1624
* fixes offload dtype by SunMarc in 1631
* fix: Megatron is not installed. please build it from source. by yuanwu2017 in 1636
* deepspeed z2/z1 state_dict bloating fix by pacman100 in 1638
* Swap disable rich by muellerzr in 1640
* fix autocasting bug by pacman100 in 1637
* fix modeling low zero by abhilash1910 in 1634
* Add skorch to runners by muellerzr in 1646
* add save model by SunMarc in 1641
* Change dispatch_model when we have only one device by SunMarc in 1648
* Doc save model by SunMarc in 1650
* Fix device_map by SunMarc in 1651
* Check for port usage before launch by muellerzr in 1656
* [`BigModeling`] Add missing check for quantized models by younesbelkada in 1652
* Bump integration by muellerzr in 1658
* TIL by muellerzr in 1657
* docker cpu py version by muellerzr in 1659
* [`BigModeling`] Final fix for dispatch int8 and fp4 models by younesbelkada in 1660
* remove safetensor dep on shard_checkpoint by SunMarc in 1664
* change the import place to avoid import error by pacman100 in 1653
* Update broken Runhouse link in examples/README.md by dongreenberg in 1668
* Bnb quantization by SunMarc in 1626
* replace save funct in doc by SunMarc in 1672
* Doc big model inference by SunMarc in 1670
* Add docs for saving Transformers models by deppen8 in 1671
* fix bnb tests by SunMarc in 1679
* Fix workflow CI by muellerzr in 1690
* remove duplicate class by SunMarc in 1691
* update readme in examples by statelesshz in 1678
* Fix nightly tests by muellerzr in 1696
* Fixup docs by muellerzr in 1697
* Improve quality errors by muellerzr in 1698
* Move mixed precision wrapping ahead of DDP/FSDP wrapping by ChenWu98 in 1682
* Add offload for 8-bit model by SunMarc in 1699
* Deepcopy on Accelerator to return self by muellerzr in 1694
* Update tracking.md by stevhliu in 1702
* Skip tests when bnb isn't available by muellerzr in 1706
* Fix launcher validation by abhilash1910 in 1705
* Fixes for issue 1683: failed to run accelerate config in colab by Erickrus in 1692
* Fix the bug where DataLoaderDispatcher gets stuck in an infinite wait when the dataset is an IterDataPipe during multi-process training. by yuxinyuan in 1709
* add multi_gpu decorator by SunMarc in 1712
* Modify loading checkpoint behavior by SunMarc in 1715
* fix version by SunMarc in 1701
* Keep old behavior by muellerzr in 1716
* Optimize `get_scale` to reduce async calls by muellerzr in 1718
* Remove duplicate code by muellerzr in 1717
* New tactic by muellerzr in 1719
* add Comfy-UI by pacman100 in 1723
* add compatibility with peft by SunMarc in 1725

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* LiamSwayne
    * Update launch.mdx (1553)
    * Update README.md (1556)
    * Update big_modeling.mdx (1564)
    * Update training_tpu.mdx (1582)
    * Update checkpoint.mdx (1587)
* mingxiaoh
    * fix the bug in xpu (1508)
* statelesshz
    * update readme in examples (1678)
    * Add Ascend NPU accelerator support (1676)
* ChenWu98
    * Move mixed precision wrapping ahead of DDP/FSDP wrapping (1682)

0.20.3

- Reset dataloader end_of_datalaoder at each iter in 1562 by sgugger

0.20.2

- fix the typo when setting the "_accelerator_prepared" attribute in 1560 by Yura52
- [`core`] Fix possibility to pass `NoneType` objects in `prepare` in 1561 by younesbelkada

0.20.1

- Avoid double wrapping of all accelerate.prepare objects by muellerzr in 1555
- Fix load_state_dict when there is one device and disk by sgugger in 1557

0.20.0

Big model inference

Support has been added to run `device_map="auto"` on the MPS device. Big model inference also works with models loaded in 4-bit in Transformers. A short sketch follows the PR links below.

* Add mps support to big inference modeling by SunMarc in 1545
* Adds fp4 support for model dispatching by younesbelkada in 1505
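
A minimal sketch with a toy model standing in for a real checkpoint; on Apple Silicon the inferred map can now place modules on `mps` (elsewhere it falls back to GPU/CPU):

```python
import torch
from accelerate import dispatch_model, infer_auto_device_map

model = torch.nn.Sequential(
    torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 32)
)
# Compute a device map from the available devices, then place the modules.
device_map = infer_auto_device_map(model)
model = dispatch_model(model, device_map=device_map)
```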

4-bit QLoRA Support

* 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by TimDettmers in 1458
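
A minimal sketch of loading a 4-bit base model, assuming the transformers and bitsandbytes libraries; QLoRA then trains LoRA adapters (e.g. via peft) on top of the frozen base, and `facebook/opt-350m` stands in for any checkpoint:

```python
from transformers import AutoModelForCausalLM

# Weights are quantized to 4-bit at load time and dispatched automatically.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", load_in_4bit=True, device_map="auto"
)
```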

Distributed Inference Utilities

This version introduces a new `Accelerator.split_between_processes` utility to help with performing distributed inference with non-tensorized or non-dataloader workflows. Read more [here](https://huggingface.co/docs/accelerate/usage_guides/distributed_inference).
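
A minimal sketch: each process receives its own slice of the inputs, so per-process generation or scoring can proceed without a `DataLoader`:

```python
from accelerate import Accelerator

accelerator = Accelerator()
prompts = ["a cat", "a dog", "a bird", "a fish"]

# Each process gets (roughly) len(prompts) / num_processes items.
with accelerator.split_between_processes(prompts) as subset:
    print(f"process {accelerator.process_index}: {subset}")
```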

Introduce XPU support for Intel GPU

* Intel GPU support initialization by abhilash1910 in 1118

Add support for the new PyTorch XLA TPU runtime

* Accelerate now supports the latest TPU runtimes (1393, 1385)

A new optimizer method: `LocalSGD`

* This is a new wrapper around SGD which enables efficient multi-GPU training when no fast interconnect is available, by searchivarius in 1378 (see the sketch below)
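
A minimal sketch with toy data, following the documented usage: gradients are applied locally every step, and parameters are averaged across workers every `local_sgd_steps` steps:

```python
import torch
from accelerate import Accelerator
from accelerate.local_sgd import LocalSGD

accelerator = Accelerator()
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 8), batch_size=8)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

with LocalSGD(
    accelerator=accelerator, model=model, local_sgd_steps=8, enabled=True
) as local_sgd:
    for batch in dataloader:
        loss = model(batch).mean()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
        local_sgd.step()  # triggers cross-worker averaging every 8 steps
```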

Papers with 🤗 Accelerate

* We now have an entire section of the docs dedicated to official paper implementations and citations using the framework (1399); see it live [here](https://hf.co/docs/accelerate/usage_guides/training_zoo#in-science)

Breaking changes

`logging_dir` has been fully deprecated; please use `project_dir` or a `ProjectConfiguration` instead.
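
A minimal sketch of the replacement, assuming hypothetical directory names:

```python
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration

# Before (now removed): Accelerator(logging_dir="logs")
accelerator = Accelerator(
    project_config=ProjectConfiguration(
        project_dir="my_project", logging_dir="my_project/logs"
    )
)
```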

What's new?

* use existing mlflow experiment if exists by Rusteam in 1403
* changes required for DS integration by pacman100 in 1406
* fix deepspeed failing tests by pacman100 in 1411
* Make mlflow logging dir optional by mattplo-decath in 1413
* Fix bug on ipex for diffusers by abhilash1910 in 1426
* Improve Slack Updater by muellerzr in 1433
* Let quality yell at the user if it's a version difference by muellerzr in 1438
* Ensure that it gets installed by muellerzr in 1439
* [`core`] Introducing `CustomDtype` enum for custom dtypes by younesbelkada in 1434
* Fix XPU by muellerzr in 1440
* Make sure torch compiled model can also be unwrapped by patrickvonplaten in 1437
* fixed: ZeroDivisionError: division by zero by sreio in 1436
* fix potential OOM when resuming with multi-GPU training by exhyy in 1444
* Fixes in infer_auto_device_map by sgugger in 1441
* Raise error when logging improperly by muellerzr in 1446
* Fix ci by muellerzr in 1447
* Distributed prompting/inference utility by muellerzr in 1410
* Add to by muellerzr in 1448
* split_between_processes by stevhliu in 1449
* [docs] Replace `state.rank` -> `process_index` by pcuenca in 1450
* Auto multigpu logic by muellerzr in 1452
* Update with cli instructions by muellerzr in 1453
* Adds `in_order` argument that defaults to False, to log in order. by JulesGM in 1262
* fix error for CPU DDP using trainer api. by sywangyi in 1455
* Refactor and simplify xpu device in state by abhilash1910 in 1456
* Document how to use commands with python module instead of argparse by muellerzr in 1457
* 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by TimDettmers in 1458
* Fix skip first batch being perminant by muellerzr in 1466
* update conversion of layers to retain original data type. by avisinghal6 in 1467
* Check for xpu specifically by muellerzr in 1472
* update `register_empty_buffer` to match torch args by NouamaneTazi in 1465
* Update gradient accumulation docs, and remove redundant example by iantbutler01 in 1461
* Imrpove sagemaker by muellerzr in 1470
* Split tensors as part of `split_between_processes` by muellerzr in 1477
* Move to device by muellerzr in 1478
* Fix gradient state bugs in multiple dataloader by Ethan-yt in 1483
* Add rdzv-backend by muellerzr in 1490
* Only use IPEX if available by muellerzr in 1495
* Update README.md by lyhue1991 in 1493
* Let gather_for_metrics always run by muellerzr in 1496
* Use empty like when we only need to create buffers by thomasw21 in 1497
* Allow key skipping in big model inference by sgugger in 1491
* fix crash when ipex is installed and torch has no xpu by sywangyi in 1502
* [`bnb`] Add fp4 support for dispatch by younesbelkada in 1505
* Fix 4bit model on multiple devices by SunMarc in 1506
* adjust overriding of model's forward function by prathikr in 1492
* Add assertion when call prepare with deepspeed config. by tensimiku in 1468
* NVME path support for deepspeed by abhilash1910 in 1484
* should set correct dtype to ipex optimize and use amp logic in native… by sywangyi in 1511
* Swap env vars for XPU and IPEX + CLI by muellerzr in 1513
* Fix a bug when parameters tied belong to the same module by sgugger in 1514
* Fixup deepspeed/cli tests by muellerzr in 1526
* Refactor mp into its own wrapper by muellerzr in 1527
* Check tied parameters by SunMarc in 1529
* Raise ValueError on iterable dataset if we've hit the end and attempting to go beyond it by muellerzr in 1531
* Officially support naive PP for quantized models + PEFT by younesbelkada in 1523
* remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu by sywangyi in 1503
* Prevent using extra VRAM for static device_map by LSerranoPEReN in 1536
* Update deepspeed.mdx by LiamSwayne in 1541
* Update performance.mdx by LiamSwayne in 1543
* Update deferring_execution.mdx by LiamSwayne in 1544
* Apply deprecations by muellerzr in 1537
* Add mps support to big inference modeling by SunMarc in 1545
* [documentation] grammar fixes in gradient_synchronization.mdx by LiamSwayne in 1547
* Eval mode by muellerzr in 1540
* Update migration.mdx by LiamSwayne in 1549

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* will-cromar
    * Support TPU v4 with new PyTorch/XLA TPU runtime (1393)
    * Support TPU v2 and v3 on new PyTorch/XLA TPU runtime (1385)
* searchivarius
    * Adding support for local SGD. (1378)
* abhilash1910
    * Intel GPU support initialization (1118)
    * Fix bug on ipex for diffusers (1426)
    * Refactor and simplify xpu device in state (1456)
    * NVME path support for deepspeed (1484)
* sywangyi
    * fix error for CPU DDP using trainer api. (1455)
    * fix crash when ipex is installed and torch has no xpu (1502)
    * should set correct dtype to ipex optimize and use amp logic in native… (1511)
    * remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu (1503)
* Ethan-yt
    * Fix gradient state bugs in multiple dataloader (1483)
