Accelerate

Latest version: v1.5.2


0.14.0

Megatron LM integration

Accelerate now supports Megatron-LM for the three model classes (BERT, GPT-2 and T5). You can learn more in the [documentation](https://huggingface.co/docs/accelerate/usage_guides/megatron_lm).

* Megatron-LM integration by pacman100 in 667
* ensure megatron is 2.2.0+ by jeffra in 755
* updating docs to use fork of megatron-lm and minor example/docs fix by pacman100 in 766
* adding support to return logits and generate for Megatron-LM GPT models by pacman100 in 819

PyTorch 1.13 support

Fixes a bug that caused SIGKILL errors on Windows.

* Isolate distrib_run by muellerzr in 828

Kaggle support with the `notebook_launcher`

With Kaggle now offering instances with two T4 GPUs, Accelerate can leverage this to do multi-GPU training from a notebook.

* Work in kaggle! by muellerzr in 783

What's new?

* Add `non_blocking` kwarg to `send_to_device()` by NouamaneTazi in 607
* [ds launcher] un-hijack PYTHONPATH by stas00 in 741
* Fix num_processes is not defined by muellerzr in 746
* [Device map] nn.Parameter don't have children by patrickvonplaten in 747
* Use HTML relative paths for tiles by lewtun in 749
* Add gpu_ids to SageMakerConfig though it should never be set by muellerzr in 751
* Change num_cpu_threads_per_process default by muellerzr in 753
* Return unclipped gradient from grad_clip_norm_ by samuelstevens in 756
* refactor by pacman100 in 758
* update docs by pacman100 in 759
* Only wrap modules in DDP if they require grad by samuelstevens in 761
* Move io_same_device hook to before attach_align_device hook on cpu_offload and disk_offload. by piEsposito in 768
* Regression cli tests by muellerzr in 772
* Fix number of devices in get_balanced_memory by sgugger in 774
* Fix all github actions issues + depreciations by muellerzr in 773
* Fix flakey wandb test by muellerzr in 775
* Add defaults for launchers by muellerzr in 778
* Allow BatchSamplerShard to not even out batches by sgugger in 776
* Make rich toggleable and separate out a new environment utility file by muellerzr in 779
* Add same_network + docs by muellerzr in 780
* fix transformers tests by ArthurZucker in 777
* Add Dev Container configuration by Chris-hughes10 in 782
* separate dataloader generator from sampler generator by pacman100 in 789
* Consider top-level buffers when computing `infer_auto_device_map` by younesbelkada in 792
* Add `even_batches` keyword to Accelerator by Chris-hughes10 in 781
* Fix device_map="auto" on CPU-only envs by sgugger in 797
* Fix extraction of state dict in offload by sgugger in 795
* fix: add pdsh as default launcher by zanussbaum in 800
* Deal with optimizer.differentiable in PyTorch 1.13.0 by comaniac in 803
* Introduce a pod-config command by muellerzr in 802
* Refactor CLI to improve readability by muellerzr in 810
* adding support to pickle and unpickle `AcceleratedOptimizer` by pacman100 in 811
* add `recurse` argument in `remove_hook_from_module` by younesbelkada in 812
* Act on deprecations by muellerzr in 813
* Mlflow-tracker-v2 🔥 by nbroad1881 in 794
* Update CLI docs and use mps rather than mps_device by muellerzr in 814
* Rename pod-config to tpu-config + docs by muellerzr in 818
* Update docs by muellerzr in 823
* rename sklearn to proper dep by muellerzr in 825
* Rename by muellerzr in 824
* Update pr docs actions by mishig25 in 827

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* Chris-hughes10
    * Add Dev Container configuration (782)
    * Add `even_batches` keyword to Accelerator (781)

0.13.2

* [Device map] nn.Parameter don't have children by patrickvonplaten in 747

0.13.1

* Fix num_processes is not defined by muellerzr in 746

0.13.0

Better multinode support in the launcher

The `accelerate launch` command did not work well for distributed training across several machines. This is fixed in this version.

* Use torchrun for multinode by muellerzr in 631
* Fix multi-node issues from launch by muellerzr in 672
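As a hedged illustration (the IP address, port, process counts, and `train.py` are all placeholders, not from the release notes), a two-machine launch uses the same command on every machine with only the rank changed:

```shell
# On the main machine (rank 0): 16 processes total across 2 machines.
accelerate launch --multi_gpu \
  --num_machines 2 --machine_rank 0 \
  --main_process_ip 192.168.1.2 --main_process_port 29500 \
  --num_processes 16 \
  train.py

# On the second machine, only --machine_rank changes.
accelerate launch --multi_gpu \
  --num_machines 2 --machine_rank 1 \
  --main_process_ip 192.168.1.2 --main_process_port 29500 \
  --num_processes 16 \
  train.py
```

All machines must be able to reach the main process IP and port before training starts.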

Launch training on specific GPUs only

Instead of prefixing your launch command with `CUDA_VISIBLE_DEVICES=xxx` you can now specify the GPUs you want to use in your Accelerate config.

* Allow for GPU-ID specification on CLI by muellerzr in 732
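For instance (assuming a machine where you want only GPUs 0 and 2, and a hypothetical `train.py`), the selection can now be passed directly on the command line instead of through an environment variable:

```shell
# Restrict this launch to GPUs 0 and 2; no CUDA_VISIBLE_DEVICES prefix needed.
accelerate launch --gpu_ids "0,2" --num_processes 2 train.py
```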

Better tracebacks and rich support

The tracebacks are now cleaned up to avoid printing the same error several times, and rich is integrated as an optional dependency.

* Integrate Rich into Accelerate by muellerzr in 613
* Make rich an optional dep by muellerzr in 673

What's new?

* Fix typo in docs/index.mdx by mishig25 in 610
* Fix DeepSpeed CI by muellerzr in 612
* Added GANs example to examples by EyalMichaeli in 619
* Fix example by muellerzr in 620
* Update README.md by ezhang7423 in 622
* Fully remove `subprocess` from the multi-gpu launcher by muellerzr in 623
* M1 mps fixes by pacman100 in 625
* Fix multi-node issues and simplify param logic by muellerzr in 627
* update MPS support docs by pacman100 in 629
* minor tracker fixes for complete* examples by pacman100 in 630
* Put back in place the guard by muellerzr in 634
* make init_trackers to launch on main process by Gladiator07 in 642
* remove check for main process for trackers initialization by Gladiator07 in 643
* fix link by philschmid in 645
* Add static_graph arg to DistributedDataParallelKwargs. by rom1504 in 637
* Small nits to grad accum docs by muellerzr in 656
* Saving hyperparams in yaml file for Tensorboard for 521 by Shreyz-max in 657
* Use debug for loggers by muellerzr in 655
* Improve docstrings more by muellerzr in 666
* accelerate bibtex by pacman100 in 660
* Cache torch_tpu check by muellerzr in 670
* Manim animation of big model inference by muellerzr in 671
* Add aim tracker for accelerate by muellerzr in 649
* Specify local network on multinode by muellerzr in 674
* Test for min torch version + fix all issues by muellerzr in 638
* deepspeed enhancements and fixes by pacman100 in 676
* DeepSpeed launcher related changes by pacman100 in 626
* adding torchrun elastic params by pacman100 in 680
* :bug: fix by pacman100 in 683
* Fix skip in dispatch dataloaders by sgugger in 682
* Clean up DispatchDataloader a bit more by sgugger in 686
* rng state sync for FSDP by pacman100 in 688
* Fix DataLoader with samplers that are batch samplers by sgugger in 687
* fixing support for Apple Silicon GPU in `notebook_launcher` by pacman100 in 695
* fixing rng sync when using custom sampler and batch_sampler by pacman100 in 696
* Improve `init_empty_weights` to override tensor constructor by thomasw21 in 699
* override DeepSpeed `grad_acc_steps` from `accelerator` obj by pacman100 in 698
* [doc] Fix 404'd link in memory usage guides by tomaarsen in 702
* Add in report generation for test failures and make fail-fast false by muellerzr in 703
* Update runners with report structure, adjust env variable by muellerzr in 704
* docs: examples readability improvements by ryanrussell in 709
* docs: `utils` readability fixups by ryanrussell in 711
* refactor(test_tracking): `key_occurrence` readability fixup by ryanrussell in 710
* docs: `hooks` readability improvements by ryanrussell in 712
* sagemaker fixes and improvements by pacman100 in 708
* refactor(accelerate): readability improvements by ryanrussell in 713
* More docstring nits by muellerzr in 715
* Allow custom device placements for different objects by sgugger in 716
* Specify gradients in model preparation by muellerzr in 722
* Fix regression issue by muellerzr in 724
* Fix default for num processes by sgugger in 726
* Build and Release docker images on a release by muellerzr in 725
* Make running tests more efficient by muellerzr in 611
* Fix old naming by muellerzr in 727
* Fix issue with one-cycle logic by muellerzr in 728
* Remove auto-bug label in issue template by sgugger in 735
* Add a tutorial on proper benchmarking by muellerzr in 734
* Add an example zoo to the documentation by muellerzr in 737
* trlx by muellerzr in 738
* Fix memory leak by muellerzr in 739
* Include examples for CI by muellerzr in 740
* Auto grad accum example by muellerzr in 742

0.12.0

New documentation

The whole documentation has been revamped, just go look at it [here](https://huggingface.co/docs/accelerate)!

* Complete revamp of the docs by muellerzr in 495


New gather_for_metrics method

When doing distributed evaluation, the dataloader loops back to the beginning of the dataset so that the total number of samples is a round multiple of the number of processes. This makes the gathered predictions slightly longer than the dataset, which used to require manual truncation. This is now all done behind the scenes if you replace the `gather` you did in evaluation with `gather_for_metrics`.

* Reenable Gather for Metrics by muellerzr in 590
* Fix gather_for_metrics by muellerzr in 578
* Add a gather_for_metrics capability by muellerzr in 540
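The arithmetic behind this can be sketched in plain Python (a toy model of the behavior, not Accelerate's actual implementation): with `N` samples sharded over `P` processes, each process is padded up to `ceil(N / P)` samples, so a plain `gather` returns more predictions than the dataset holds, and `gather_for_metrics` truncates the overshoot away.

```python
import math

def gathered_length(dataset_len, num_processes):
    # Each process gets ceil(dataset_len / num_processes) samples because
    # the sampler loops back to the start of the dataset to even out shards.
    per_process = math.ceil(dataset_len / num_processes)
    return per_process * num_processes

def gather_for_metrics_length(dataset_len, num_processes):
    # gather_for_metrics drops the duplicated samples so metrics see
    # exactly dataset_len predictions.
    return min(gathered_length(dataset_len, num_processes), dataset_len)

# With 100 samples on 8 processes, a plain gather returns 104 predictions;
# gather_for_metrics truncates back to 100.
```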

Balanced device maps

When loading big models for inference, `device_map="auto"` used to fill the GPUs sequentially, making it hard to use a batch size > 1. It now balances the weights evenly on the GPUs so if you have more GPU space than the model size, you can do predictions with a bigger batch size!
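A toy sketch of the balancing idea (an illustration only, not Accelerate's actual `infer_auto_device_map` logic, which also accounts for module boundaries and available memory per device): spread consecutive layers so each GPU holds roughly an equal share of the weights rather than filling GPU 0 first.

```python
def balanced_device_map(layer_sizes, num_gpus):
    # Assign consecutive layers to GPUs so each device ends up with roughly
    # total_size / num_gpus parameters, instead of filling GPU 0 first.
    target = sum(layer_sizes) / num_gpus
    device_map, device, used = {}, 0, 0
    for i, size in enumerate(layer_sizes):
        if used >= target and device < num_gpus - 1:
            device, used = device + 1, 0
        device_map[f"layer.{i}"] = device
        used += size
    return device_map
```

Leaving headroom on every GPU is what makes batch sizes greater than 1 feasible during inference.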

M1 GPU support

Accelerate now supports M1 GPUs, to learn more about how to setup your environment, see the [documentation](https://huggingface.co/docs/accelerate/v0.12.0/en/usage_guides/mps#accelerated-pytorch-training-on-mac).

* M1 GPU `mps` device integration by pacman100 in 596

What's new?

* Small fixed for balanced device maps by sgugger in 583
* Add balanced option for auto device map creation by sgugger in 534
* fixing deepspeed slow tests issue by pacman100 in 604
* add more conditions on casting by younesbelkada in 606
* Remove redundant `.run` in `WandBTracker`. by zh-plus in 605
* Fix some typos + wordings by muellerzr in 603
* reorg of test scripts and minor changes to tests by pacman100 in 602
* Move warning by muellerzr in 598
* Shorthand way to grab a tracker by muellerzr in 594
* Pin deepspeed by muellerzr in 595
* Improve docstring by muellerzr in 591
* TESTS! by muellerzr in 589
* Fix DispatchDataloader by sgugger in 588
* Use main_process_first in the examples by muellerzr in 581
* Skip and raise NotImplementedError for gather_for_metrics for now by muellerzr in 580
* minor FSDP launcher fix by pacman100 in 579
* Refine test in set_module_tensor_to_device by sgugger in 577
* Fix `set_module_tensor_to_device` by sgugger in 576
* Add 8 bit support - chapter II by younesbelkada in 539
* Fix tests, add wandb to gitignore by muellerzr in 573
* Fix step by muellerzr in 572
* Speed up main CI by muellerzr in 571
* ccl version check and import different module according to version by sywangyi in 567
* set default num_cpu_threads_per_process to improve oob performance by sywangyi in 562
* Add a tqdm helper by muellerzr in 564
* Rename actions to be a bit more accurate by muellerzr in 568
* Fix clean by muellerzr in 569
* enhancements and fixes for FSDP and DeepSpeed by pacman100 in 532
* fix: saving model weights by csarron in 556
* add on_main_process decorators by ZhiyuanChen in 488
* Update imports.py by KimBioInfoStudio in 554
* unpin `datasets` by lhoestq in 563
* Create good defaults in `accelerate launch` by muellerzr in 553
* Fix a few minor issues with example code in docs by BenjaminBossan in 551
* deepspeed version `0.6.7` fix by pacman100 in 544
* Rename test extras to testing by muellerzr in 545
* Add production testing + fix failing CI by muellerzr in 547
* Add a gather_for_metrics capability by muellerzr in 540
* Allow for kwargs to be passed to trackers by muellerzr in 542
* Add support for downcasting bf16 on TPUs by muellerzr in 523
* Add more documentation for device maps computations by sgugger in 530
* Restyle prepare one by muellerzr in 531
* Pick a better default for offload_state_dict by sgugger in 529
* fix some parameter setting does not work for CPU DDP and bf16 fail in… by sywangyi in 527
* Fix accelerate tests command by sgugger in 528

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* sywangyi
    * ccl version check and import different module according to version (567)
    * set default num_cpu_threads_per_process to improve oob performance (562)
    * fix some parameter setting does not work for CPU DDP and bf16 fail in… (527)
* ZhiyuanChen
    * add on_main_process decorators (488)

0.11.0

Gradient Accumulation

Accelerate now handles gradient accumulation if you want: just pass along `gradient_accumulation_steps=xxx` when instantiating the `Accelerator` and put your training loop step under a `with accelerator.accumulate(model):` block. Accelerate will then handle the loss re-scaling and gradient accumulation for you (avoiding slowdowns in distributed training, since gradients only need to be synced when you want to step). More details in the [documentation](https://huggingface.co/docs/accelerate/gradient_accumulation#letting-accelerate-handle-gradient-accumulation).

* Add gradient accumulation doc by muellerzr in 511
* Make gradient accumulation work with dispatched dataloaders by muellerzr in 510
* Introduce automatic gradient accumulation wrapper + fix a few test issues by muellerzr in 484
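The scheduling `accumulate` takes care of can be sketched in plain Python (a toy model only; the real context manager also skips gradient synchronization on non-step batches and the loss is scaled by `1 / gradient_accumulation_steps`):

```python
def should_step(step, num_batches, accumulation_steps):
    # The optimizer steps (and gradients sync) only every
    # `accumulation_steps` batches, and on the final batch so no
    # leftover gradients are dropped at the end of the epoch.
    return (step + 1) % accumulation_steps == 0 or step == num_batches - 1

# Over 10 batches with accumulation_steps=4, the optimizer steps at
# batches 3, 7 and 9 (0-indexed).
steps = [i for i in range(10) if should_step(i, 10, 4)]
```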

Support for SageMaker Data parallelism

Accelerate now supports SageMaker's specific brand of data parallelism.

* SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging by pacman100 in 504
* SageMaker DP Support by pacman100 in 494

What's new?

* Fix accelerate tests command by sgugger in 528
* FSDP integration enhancements and fixes by pacman100 in 522
* Warn user if no trackers are installed by muellerzr in 524
* Fixup all example CI tests and properly fail by muellerzr in 517
* fixing deepspeed multi-node launcher by pacman100 in 514
* Add special Parameters modules support by younesbelkada in 519
* Don't unwrap in save_state() by cccntu in 489
* Fix a bug when reduce a tensor. by wwhio in 513
* Add benchmarks by sgugger in 506
* Fix DispatchDataLoader length when `split_batches=True` by sgugger in 509
* Fix scheduler in gradient accumulation example by muellerzr in 500
* update dataloader wrappers to have `total_batch_size` attribute by pacman100 in 493
* Introduce automatic gradient accumulation wrapper + fix a few test issues by muellerzr in 484
* add use_distributed property by ZhiyuanChen in 487
* fixing fsdp autowrap functionality by pacman100 in 475
* Use datasets 2.2.0 for now by muellerzr in 481
* Rm gradient accumulation on TPU by muellerzr in 479
* Revert "Pin datasets for now (477)" by muellerzr
* Pin datasets for now by muellerzr in 477
* Some typos and cosmetic fixes by douwekiela in 472
* Fix when TPU device check is ran by muellerzr in 469
* Refactor Utility Documentation by muellerzr in 467
* Add docbuilder to quality by muellerzr in 468
* Expose some is_*_available utils in docs by muellerzr in 466
* Cleanup CI Warnings by muellerzr in 465
* Link CI slow runners to the commit by muellerzr in 464
* Fix subtle bug in BF16 by muellerzr in 463
* Include bf16 support for TPUs and CPUs, and a better check for if a CUDA device supports BF16 by muellerzr in 462
* Handle bfloat16 weights in disk offload without adding memory overhead by noamwies in 460
* Handle bfloat16 weights in disk offload by sgugger in 460
* Raise a clear warning if a user tries to modify the AcceleratorState by muellerzr in 458
* Right step point by muellerzr in 459
* Better checks for if a TPU device exists by muellerzr in 456
* Offload and modules with unused submodules by sgugger in 442
