Accelerate

Latest version: v1.1.1

0.13.2

- [Device map] nn.Parameter don't have children in 747 by patrickvonplaten

0.13.1

- Fix num_processes is not defined 746 by muellerzr

0.13.0

Better multinode support in the launcher

The `accelerate launch` command did not work well for distributed training across several machines. This is fixed in this version.

* Use torchrun for multinode by muellerzr in 631
* Fix multi-node issues from launch by muellerzr in 672

Launch training on specific GPUs only

Instead of prefixing your launch command with `CUDA_VISIBLE_DEVICES=xxx` you can now specify the GPUs you want to use in your Accelerate config.

* Allow for GPU-ID specification on CLI by muellerzr in 732

Better tracebacks and rich support

Tracebacks are now cleaned up to avoid printing the same error several times, and rich is integrated as an optional dependency.

* Integrate Rich into Accelerate by muellerzr in 613
* Make rich an optional dep by muellerzr in 673

What's new?

* Fix typo in docs/index.mdx by mishig25 in 610
* Fix DeepSpeed CI by muellerzr in 612
* Added GANs example to examples by EyalMichaeli in 619
* Fix example by muellerzr in 620
* Update README.md by ezhang7423 in 622
* Fully remove `subprocess` from the multi-gpu launcher by muellerzr in 623
* M1 mps fixes by pacman100 in 625
* Fix multi-node issues and simplify param logic by muellerzr in 627
* update MPS support docs by pacman100 in 629
* minor tracker fixes for complete* examples by pacman100 in 630
* Put back in place the guard by muellerzr in 634
* make init_trackers to launch on main process by Gladiator07 in 642
* remove check for main process for trackers initialization by Gladiator07 in 643
* fix link by philschmid in 645
* Add static_graph arg to DistributedDataParallelKwargs. by rom1504 in 637
* Small nits to grad accum docs by muellerzr in 656
* Saving hyperparams in yaml file for Tensorboard for 521 by Shreyz-max in 657
* Use debug for loggers by muellerzr in 655
* Improve docstrings more by muellerzr in 666
* accelerate bibtex by pacman100 in 660
* Cache torch_tpu check by muellerzr in 670
* Manim animation of big model inference by muellerzr in 671
* Add aim tracker for accelerate by muellerzr in 649
* Specify local network on multinode by muellerzr in 674
* Test for min torch version + fix all issues by muellerzr in 638
* deepspeed enhancements and fixes by pacman100 in 676
* DeepSpeed launcher related changes by pacman100 in 626
* adding torchrun elastic params by pacman100 in 680
* :bug: fix by pacman100 in 683
* Fix skip in dispatch dataloaders by sgugger in 682
* Clean up DispatchDataloader a bit more by sgugger in 686
* rng state sync for FSDP by pacman100 in 688
* Fix DataLoader with samplers that are batch samplers by sgugger in 687
* fixing support for Apple Silicon GPU in `notebook_launcher` by pacman100 in 695
* fixing rng sync when using custom sampler and batch_sampler by pacman100 in 696
* Improve `init_empty_weights` to override tensor constructor by thomasw21 in 699
* override DeepSpeed `grad_acc_steps` from `accelerator` obj by pacman100 in 698
* [doc] Fix 404'd link in memory usage guides by tomaarsen in 702
* Add in report generation for test failures and make fail-fast false by muellerzr in 703
* Update runners with report structure, adjust env variable by muellerzr in 704
* docs: examples readability improvements by ryanrussell in 709
* docs: `utils` readability fixups by ryanrussell in 711
* refactor(test_tracking): `key_occurrence` readability fixup by ryanrussell in 710
* docs: `hooks` readability improvements by ryanrussell in 712
* sagemaker fixes and improvements by pacman100 in 708
* refactor(accelerate): readability improvements by ryanrussell in 713
* More docstring nits by muellerzr in 715
* Allow custom device placements for different objects by sgugger in 716
* Specify gradients in model preparation by muellerzr in 722
* Fix regression issue by muellerzr in 724
* Fix default for num processes by sgugger in 726
* Build and Release docker images on a release by muellerzr in 725
* Make running tests more efficient by muellerzr in 611
* Fix old naming by muellerzr in 727
* Fix issue with one-cycle logic by muellerzr in 728
* Remove auto-bug label in issue template by sgugger in 735
* Add a tutorial on proper benchmarking by muellerzr in 734
* Add an example zoo to the documentation by muellerzr in 737
* trlx by muellerzr in 738
* Fix memory leak by muellerzr in 739
* Include examples for CI by muellerzr in 740
* Auto grad accum example by muellerzr in 742

0.12.0

New documentation

The whole documentation has been revamped, just go look at it [here](https://huggingface.co/docs/accelerate)!

* Complete revamp of the docs by muellerzr in 495


New gather_for_metrics method

When doing distributed evaluation, the dataloader loops back to the beginning of the dataset so that the number of batches is a round multiple of the number of processes. As a result, the gathered predictions are slightly longer than the dataset, which used to require manual truncation. This is now done behind the scenes if you replace the `gather` you did in evaluation with `gather_for_metrics`.

* Reenable Gather for Metrics by muellerzr in 590
* Fix gather_for_metrics by muellerzr in 578
* Add a gather_for_metrics capability by muellerzr in 540
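
The padding and truncation described above can be illustrated without any distributed setup. The following is a minimal pure-Python sketch, not Accelerate's implementation: the function name `simulate_distributed_eval` and the numbers are made up for illustration, and sample indices stand in for predictions.

```python
import math


def simulate_distributed_eval(dataset_size, num_processes, batch_size):
    """Return (gathered, truncated) lists of sample indices for one eval pass.

    Hypothetical helper: it only mimics the bookkeeping, not real communication.
    """
    global_batch = num_processes * batch_size
    # Loop back to the start of the dataset so every global batch is full.
    padded_len = math.ceil(dataset_size / global_batch) * global_batch
    gathered = [i % dataset_size for i in range(padded_len)]
    # Drop the duplicated samples so metrics see each sample exactly once.
    truncated = gathered[:dataset_size]
    return gathered, truncated


gathered, truncated = simulate_distributed_eval(
    dataset_size=10, num_processes=4, batch_size=2
)
print(len(gathered), len(truncated))
```

A plain gather corresponds to `gathered` (longer than the dataset), while the behavior described for `gather_for_metrics` corresponds to `truncated`.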

Balanced device maps

When loading big models for inference, `device_map="auto"` used to fill the GPUs sequentially, making it hard to use a batch size > 1. It now balances the weights evenly on the GPUs so if you have more GPU space than the model size, you can do predictions with a bigger batch size!
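
To make the difference concrete, here is a small pure-Python sketch comparing a sequential fill with an even split. It is an illustration under simplifying assumptions (equal-sized layers, made-up sizes, capacity ignored in the balanced case), not the algorithm Accelerate uses; `sequential_map` and `balanced_map` are hypothetical names.

```python
def sequential_map(layer_sizes, gpu_capacity, num_gpus):
    """Fill GPU 0 until it is full, then GPU 1, and so on."""
    placement, device, used = [], 0, 0
    for size in layer_sizes:
        if used + size > gpu_capacity:
            device, used = device + 1, 0
        if device >= num_gpus:
            raise ValueError("model does not fit on the available GPUs")
        placement.append(device)
        used += size
    return placement


def balanced_map(layer_sizes, num_gpus):
    """Give each GPU roughly total / num_gpus worth of consecutive layers."""
    target = sum(layer_sizes) / num_gpus
    placement, device, used = [], 0, 0
    for size in layer_sizes:
        # Move on once this GPU has reached its share (never past the last GPU).
        if used > 0 and used + size > target and device < num_gpus - 1:
            device, used = device + 1, 0
        placement.append(device)
        used += size
    return placement


layers = [2] * 8  # eight layers of 2 GB each (made-up numbers)
print(sequential_map(layers, gpu_capacity=16, num_gpus=2))
print(balanced_map(layers, num_gpus=2))
```

In the sequential case the first GPU is completely full and has no room left for activations, which is why larger batch sizes were hard to use; in the balanced case each GPU keeps half of its memory free.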

M1 GPU support

Accelerate now supports M1 GPUs. To learn more about how to set up your environment, see the [documentation](https://huggingface.co/docs/accelerate/v0.12.0/en/usage_guides/mps#accelerated-pytorch-training-on-mac).

* M1 GPU `mps` device integration by pacman100 in 596

What's new?

* Small fixed for balanced device maps by sgugger in 583
* Add balanced option for auto device map creation by sgugger in 534
* fixing deepspeed slow tests issue by pacman100 in 604
* add more conditions on casting by younesbelkada in 606
* Remove redundant `.run` in `WandBTracker`. by zh-plus in 605
* Fix some typos + wordings by muellerzr in 603
* reorg of test scripts and minor changes to tests by pacman100 in 602
* Move warning by muellerzr in 598
* Shorthand way to grab a tracker by muellerzr in 594
* Pin deepspeed by muellerzr in 595
* Improve docstring by muellerzr in 591
* TESTS! by muellerzr in 589
* Fix DispatchDataloader by sgugger in 588
* Use main_process_first in the examples by muellerzr in 581
* Skip and raise NotImplementedError for gather_for_metrics for now by muellerzr in 580
* minor FSDP launcher fix by pacman100 in 579
* Refine test in set_module_tensor_to_device by sgugger in 577
* Fix `set_module_tensor_to_device` by sgugger in 576
* Add 8 bit support - chapter II by younesbelkada in 539
* Fix tests, add wandb to gitignore by muellerzr in 573
* Fix step by muellerzr in 572
* Speed up main CI by muellerzr in 571
* ccl version check and import different module according to version by sywangyi in 567
* set default num_cpu_threads_per_process to improve oob performance by sywangyi in 562
* Add a tqdm helper by muellerzr in 564
* Rename actions to be a bit more accurate by muellerzr in 568
* Fix clean by muellerzr in 569
* enhancements and fixes for FSDP and DeepSpeed by pacman100 in 532
* fix: saving model weights by csarron in 556
* add on_main_process decorators by ZhiyuanChen in 488
* Update imports.py by KimBioInfoStudio in 554
* unpin `datasets` by lhoestq in 563
* Create good defaults in `accelerate launch` by muellerzr in 553
* Fix a few minor issues with example code in docs by BenjaminBossan in 551
* deepspeed version `0.6.7` fix by pacman100 in 544
* Rename test extras to testing by muellerzr in 545
* Add production testing + fix failing CI by muellerzr in 547
* Add a gather_for_metrics capability by muellerzr in 540
* Allow for kwargs to be passed to trackers by muellerzr in 542
* Add support for downcasting bf16 on TPUs by muellerzr in 523
* Add more documentation for device maps computations by sgugger in 530
* Restyle prepare one by muellerzr in 531
* Pick a better default for offload_state_dict by sgugger in 529
* fix some parameter setting does not work for CPU DDP and bf16 fail in… by sywangyi in 527
* Fix accelerate tests command by sgugger in 528

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* sywangyi
  * ccl version check and import different module according to version (567)
  * set default num_cpu_threads_per_process to improve oob performance (562)
  * fix some parameter setting does not work for CPU DDP and bf16 fail in… (527)
* ZhiyuanChen
  * add on_main_process decorators (488)

0.11.0

Gradient Accumulation

Accelerate can now handle gradient accumulation for you: pass `gradient_accumulation_steps=xxx` when instantiating the `Accelerator` and put each step of your training loop under `with accelerator.accumulate(model):`. Accelerate will then handle the loss re-scaling and gradient accumulation (avoiding slowdowns in distributed training, since gradients only need to be synced when you actually step). More details in the [documentation](https://huggingface.co/docs/accelerate/gradient_accumulation#letting-accelerate-handle-gradient-accumulation).

* Add gradient accumulation doc by muellerzr in 511
* Make gradient accumulation work with dispatched dataloaders by muellerzr in 510
* Introduce automatic gradient accumulation wrapper + fix a few test issues by muellerzr in 484
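
The loss re-scaling mentioned above is what makes accumulated micro-batches equivalent to one large batch. The sketch below shows the arithmetic in plain Python for a one-parameter model; it does not use Accelerate, and the helper `grad`, the data, and the starting weight are made up for illustration.

```python
def grad(w, batch):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 0.5
accumulation_steps = 2

# Reference: gradient computed on the whole batch at once.
full_batch_grad = grad(w, data)

# Accumulation: each micro-batch contributes its gradient scaled by
# 1 / accumulation_steps, and the optimizer would step only after the last one.
micro_batches = [data[:2], data[2:]]
accumulated_grad = 0.0
for micro_batch in micro_batches:
    accumulated_grad += grad(w, micro_batch) / accumulation_steps

print(full_batch_grad, accumulated_grad)
```

Both values agree when the micro-batches have equal size, which is the case the scaling by `1 / accumulation_steps` is designed for.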

Support for SageMaker Data parallelism

Accelerate now supports SageMaker's specific brand of data parallelism.

* SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging by pacman100 in 504
* SageMaker DP Support by pacman100 in 494

What's new?

* Fix accelerate tests command by sgugger in 528
* FSDP integration enhancements and fixes by pacman100 in 522
* Warn user if no trackers are installed by muellerzr in 524
* Fixup all example CI tests and properly fail by muellerzr in 517
* fixing deepspeed multi-node launcher by pacman100 in 514
* Add special Parameters modules support by younesbelkada in 519
* Don't unwrap in save_state() by cccntu in 489
* Fix a bug when reduce a tensor. by wwhio in 513
* Add benchmarks by sgugger in 506
* Fix DispatchDataLoader length when `split_batches=True` by sgugger in 509
* Fix scheduler in gradient accumulation example by muellerzr in 500
* update dataloader wrappers to have `total_batch_size` attribute by pacman100 in 493
* Introduce automatic gradient accumulation wrapper + fix a few test issues by muellerzr in 484
* add use_distributed property by ZhiyuanChen in 487
* fixing fsdp autowrap functionality by pacman100 in 475
* Use datasets 2.2.0 for now by muellerzr in 481
* Rm gradient accumulation on TPU by muellerzr in 479
* Revert "Pin datasets for now by muellerzr in 477)"
* Pin datasets for now by muellerzr in 477
* Some typos and cosmetic fixes by douwekiela in 472
* Fix when TPU device check is ran by muellerzr in 469
* Refactor Utility Documentation by muellerzr in 467
* Add docbuilder to quality by muellerzr in 468
* Expose some is_*_available utils in docs by muellerzr in 466
* Cleanup CI Warnings by muellerzr in 465
* Link CI slow runners to the commit by muellerzr in 464
* Fix subtle bug in BF16 by muellerzr in 463
* Include bf16 support for TPUs and CPUs, and a better check for if a CUDA device supports BF16 by muellerzr in 462
* Handle bfloat16 weights in disk offload without adding memory overhead by noamwies in 460
* Handle bfloat16 weights in disk offload by sgugger in 460
* Raise a clear warning if a user tries to modify the AcceleratorState by muellerzr in 458
* Right step point by muellerzr in 459
* Better checks for if a TPU device exists by muellerzr in 456
* Offload and modules with unused submodules by sgugger in 442

0.10.0

This release adds two major new features: the DeepSpeed integration has been revamped to match the one in Transformers Trainer, with multiple new options unlocked, and the TPU integration has been sped up.

This version also officially stops supporting Python 3.6 and requires Python 3.7+.

DeepSpeed integration revamp

Users can now specify a DeepSpeed config file when they want to use DeepSpeed, which unlocks many new options. More details in the new [documentation](https://huggingface.co/docs/accelerate/deepspeed).

* Migrate HFDeepSpeedConfig from trfrs to accelerate by pacman100 in 432
* DeepSpeed Revamp by pacman100 in 405
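
As a rough idea of what such a config file can contain, the sketch below writes a small DeepSpeed JSON config from Python. The keys are standard DeepSpeed options, but the values and the file name `ds_config.json` are illustrative assumptions; how the file is referenced from your Accelerate setup is covered in the documentation linked above.

```python
import json
import os
import tempfile

# Illustrative values only; tune them for your own hardware and model.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 1,
    "gradient_clipping": 1.0,
    "zero_optimization": {"stage": 2},
    "fp16": {"enabled": True},
}

# Hypothetical location: a temporary directory keeps the example side-effect free.
config_path = os.path.join(tempfile.mkdtemp(), "ds_config.json")
with open(config_path, "w") as f:
    json.dump(ds_config, f, indent=2)

# Read the file back to confirm it is valid JSON.
with open(config_path) as f:
    loaded = json.load(f)
print(loaded["zero_optimization"]["stage"])
```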

TPU speedup

If you're using TPUs we have sped up the dataloaders and models quite a bit, on top of a few bug fixes.

* Revamp TPU internals to be more efficient + enable mixed precision types by muellerzr in 441

What's new?

* Fix docstring by muellerzr in 447
* Add psutil as dependency by sgugger in 445
* fix fsdp torch version dependency by pacman100 in 437
* Create Gradient Accumulation Example by muellerzr in 431
* init by muellerzr in 429
* Introduce `no_sync` context wrapper + clean up some more warnings for DDP by muellerzr in 428
* updating tests to resolve runner failures wrt deepspeed revamp by pacman100 in 427
* Fix secrets in Docker workflow by muellerzr in 426
* Introduce a Dependency Checker to trigger new Docker Builds on main by muellerzr in 424
* Enable slow tests nightly by muellerzr in 421
* Push out python 3.6 + fix all tests related to the upgrade by muellerzr in 420
* Speedup main CI by muellerzr in 419
* Switch to evaluate for metrics by sgugger in 417
* Create an issue template for Accelerate by muellerzr in 415
* Introduce post-merge runners by muellerzr in 416
* Fix debug_launcher issues by muellerzr in 413
* Use main egg by muellerzr in 414
* Introduce nightly runners by muellerzr in 410
* Update requirements to pin tensorboard and include psutil by muellerzr in 408
* Fix CUDA examples tests by muellerzr in 407
* Move datasets and transformers to under func by muellerzr in 411
* Fix CUDA Dockerfile by muellerzr in 409
* Hotfix all failing GPU tests by muellerzr in 401
* improve metrics logged in examples by pacman100 in 399
* Refactor offload_state_dict and fix in offload_weight by sgugger in 398
* Refactor version checking into a utility by muellerzr in 395
* Include fastai in frameworks by muellerzr in 396
* Add packaging to requirements by muellerzr in 394
* Better dispatch for submodules by sgugger in 392
* Build Docker Images nightly by muellerzr in 391
* Small bugfix for the stalebot workflow by muellerzr in 390
* Introduce stalebot by muellerzr in 387
* Create Dockerfiles for Accelerate by muellerzr in 377
* Mix precision -> Mixed precision by muellerzr in 388
* Fix OneCycle step length when in multiprocess by muellerzr in 385
