Core
* We've simplified the `tqdm` wrapper to make it fully passthrough. Instead of `tqdm(main_process_only, *args)`, call `tqdm(*args)` and pass `main_process_only` as a keyword argument.
* We've added support for advanced optimizer usage:
  * Schedule free optimizer introduced by [Meta](https://github.com/facebookresearch/schedule_free/tree/main) by muellerzr in https://github.com/huggingface/accelerate/pull/2631
  * LOMO optimizer introduced by [OpenLMLab](https://github.com/OpenLMLab/LOMO) by younesbelkada in https://github.com/huggingface/accelerate/pull/2695
* Enable BF16 autocast to everything during FP8 and enable FSDP by muellerzr in https://github.com/huggingface/accelerate/pull/2655
* Support dataloader send_to_device calls to use non-blocking by drhead in https://github.com/huggingface/accelerate/pull/2685
* Allow `gather_for_metrics` to be more flexible by SunMarc in https://github.com/huggingface/accelerate/pull/2710
* Add `cann` version info to the `accelerate env` command for NPU by statelesshz in https://github.com/huggingface/accelerate/pull/2689
* Add MLU rng state setter by ArthurinRUC in https://github.com/huggingface/accelerate/pull/2664
* Device-agnostic testing for hooks, utils, and big_modeling by statelesshz in https://github.com/huggingface/accelerate/pull/2602
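
To illustrate the passthrough `tqdm` change listed above, here is a minimal sketch of the pattern in plain Python. It is not Accelerate's implementation: `make_bar`, `wrapped_bar`, and the `local_process_index` parameter are hypothetical stand-ins, and the keyword name `main_process_only` is taken from the title of PR #2654 in the changelog.

```python
import warnings


def make_bar(iterable=None, *, desc="", disable=False):
    """Stand-in for a progress-bar constructor (hypothetical, not tqdm itself)."""
    return {"items": list(iterable or []), "desc": desc, "disable": disable}


def wrapped_bar(*args, main_process_only=True, local_process_index=0, **kwargs):
    """Forward everything to make_bar; only decide whether the bar is disabled."""
    # Legacy call style: a leading bool used to carry main_process_only.
    if args and isinstance(args[0], bool):
        warnings.warn("Pass main_process_only as a keyword argument.", FutureWarning)
        main_process_only, args = args[0], args[1:]
    # Respect an explicit disable=True from the caller.
    disable = kwargs.pop("disable", False)
    if main_process_only and not disable:
        # Assumption: rank 0 is the process that should display the bar.
        disable = local_process_index != 0
    return make_bar(*args, disable=disable, **kwargs)


# New style: positional arguments go straight through to the wrapped constructor.
bar = wrapped_bar(range(3), desc="train")
```

With Accelerate installed, the corresponding call would look like `tqdm(dataloader, main_process_only=True)` using the wrapper from `accelerate.utils`; treat the exact import path as an assumption to check against your installed version.
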
Documentation
* Through collaboration between fabianlim (lead contributor), stas00, pacman100, and muellerzr, we have a new concept guide for FSDP and DeepSpeed that explicitly details how the two interoperate and explains clearly how each of them works. This was a monumental effort by fabianlim to ensure that everything is as accurate as possible for users. I highly recommend visiting this new documentation, available [here](https://huggingface.co/docs/accelerate/concept_guides/fsdp_and_deepspeed).
* New distributed inference examples have been added thanks to SunMarc in https://github.com/huggingface/accelerate/pull/2672
* Fixed some docs for using internal trackers by brentyi in https://github.com/huggingface/accelerate/pull/2650
DeepSpeed
* Accelerate can now handle MoE models when using DeepSpeed, thanks to pacman100 in https://github.com/huggingface/accelerate/pull/2662
* Allow "auto" for gradient clipping in YAML by regisss in https://github.com/huggingface/accelerate/pull/2649
* Introduce a `deepspeed`-specific Docker image by muellerzr in https://github.com/huggingface/accelerate/pull/2707. To use it, pull the image with `docker pull huggingface/accelerate:cuda-deepspeed-nightly`
Megatron
* Megatron plugin can support NPU by zhangsheng377 in https://github.com/huggingface/accelerate/pull/2667
Big Modeling
* Add `strict` argument to `load_checkpoint_and_dispatch` by SunMarc in https://github.com/huggingface/accelerate/pull/2641
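
As a rough illustration of what strict key matching usually means when loading a checkpoint, the sketch below compares parameter names and fails only in strict mode. It is a toy model written for these notes: `check_checkpoint_keys` is a hypothetical helper and does not reproduce how `load_checkpoint_and_dispatch` works internally.

```python
def check_checkpoint_keys(model_keys, checkpoint_keys, strict=False):
    """Compare parameter names; raise in strict mode if they do not match exactly."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # expected by the model, absent from the checkpoint
    unexpected = sorted(checkpoint_keys - model_keys)   # present in the checkpoint, unknown to the model
    if strict and (missing or unexpected):
        raise KeyError(f"missing={missing}, unexpected={unexpected}")
    return missing, unexpected


# Non-strict mode only reports the mismatch.
missing, unexpected = check_checkpoint_keys(
    ["linear.weight", "linear.bias"],
    ["linear.weight", "head.weight"],
)

# Strict mode turns the same mismatch into an error.
try:
    check_checkpoint_keys(["linear.weight"], ["head.weight"], strict=True)
    strict_failed = False
except KeyError:
    strict_failed = True
```

In Accelerate itself the flag is passed as a keyword, for example `load_checkpoint_and_dispatch(model, checkpoint, strict=True)`.
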
Bug Fixes
* Fix up state with xla + performance regression by muellerzr in https://github.com/huggingface/accelerate/pull/2634
* Parenthesis on xpu_available by muellerzr in https://github.com/huggingface/accelerate/pull/2639
* Fix `is_train_batch_min` type in DeepSpeedPlugin by yhna940 in https://github.com/huggingface/accelerate/pull/2646
* Fix backend check by jiqing-feng in https://github.com/huggingface/accelerate/pull/2652
* Fix the rng states of sampler's generator to be synchronized for correct sharding of dataset across GPUs by pacman100 in https://github.com/huggingface/accelerate/pull/2694
* Block AMP for MPS device by SunMarc in https://github.com/huggingface/accelerate/pull/2699
* Fix an issue with multi-GPU training with bnb when the first GPU is not used by SunMarc in https://github.com/huggingface/accelerate/pull/2714
* Fixup `free_memory` to deal with garbage collection by muellerzr in https://github.com/huggingface/accelerate/pull/2716
* Fix sampler serialization failing by SunMarc in https://github.com/huggingface/accelerate/pull/2723
* Fix deepspeed offload device type in the arguments to be more accurate by yhna940 in https://github.com/huggingface/accelerate/pull/2717
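
The sampler RNG fix above matters because every process has to shuffle the dataset in the same order before taking its own slice. The following toy sketch shows the idea using only the standard library. The names `shard_indices` and `coverage` and the slicing scheme are assumptions made for illustration, not Accelerate's sampler code.

```python
import random


def shard_indices(num_items, rank, world_size, seed):
    """Shuffle all indices with `seed`, then keep every world_size-th one for this rank."""
    order = list(range(num_items))
    random.Random(seed).shuffle(order)
    return order[rank::world_size]


def coverage(shards):
    """Return the sorted indices seen across all ranks (duplicates kept)."""
    return sorted(i for shard in shards for i in shard)


# Same seed on every rank: the shards partition the dataset exactly.
synced = [shard_indices(8, rank, 2, seed=0) for rank in range(2)]

# Different seed per rank: each rank slices a different permutation,
# so some indices can be duplicated while others are skipped.
unsynced = [shard_indices(8, rank, 2, seed=rank) for rank in range(2)]

print(coverage(synced))
print(coverage(unsynced))
```

With synchronized seeds, the combined shards contain each index exactly once. With unsynchronized seeds that guarantee is lost, which is the kind of incorrect sharding the fix addresses.
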
Full Changelog
* Schedule free optimizer support by muellerzr in https://github.com/huggingface/accelerate/pull/2631
* Fix up state with xla + performance regression by muellerzr in https://github.com/huggingface/accelerate/pull/2634
* Parenthesis on xpu_available by muellerzr in https://github.com/huggingface/accelerate/pull/2639
* add third-party device prefix to `execution_device` by faaany in https://github.com/huggingface/accelerate/pull/2612
* add strict arg to load_checkpoint_and_dispatch by SunMarc in https://github.com/huggingface/accelerate/pull/2641
* device agnostic testing for hooks&utils&big_modeling by statelesshz in https://github.com/huggingface/accelerate/pull/2602
* Docs fix for using internal trackers by brentyi in https://github.com/huggingface/accelerate/pull/2650
* Allow "auto" for gradient clipping in YAML by regisss in https://github.com/huggingface/accelerate/pull/2649
* Fix `is_train_batch_min` type in DeepSpeedPlugin by yhna940 in https://github.com/huggingface/accelerate/pull/2646
* Don't use deprecated `Repository` anymore by Wauplin in https://github.com/huggingface/accelerate/pull/2658
* Fix test_from_pretrained_low_cpu_mem_usage_measured failure by yuanwu2017 in https://github.com/huggingface/accelerate/pull/2644
* Add MLU rng state setter by ArthurinRUC in https://github.com/huggingface/accelerate/pull/2664
* fix backend check by jiqing-feng in https://github.com/huggingface/accelerate/pull/2652
* Megatron plugin can support NPU by zhangsheng377 in https://github.com/huggingface/accelerate/pull/2667
* Revert "fix backend check" by muellerzr in https://github.com/huggingface/accelerate/pull/2669
* `tqdm`: `*args` should come ahead of `main_process_only` by rb-synth in https://github.com/huggingface/accelerate/pull/2654
* Handle MoE models with DeepSpeed by pacman100 in https://github.com/huggingface/accelerate/pull/2662
* Fix deepspeed moe test with version check by pacman100 in https://github.com/huggingface/accelerate/pull/2677
* Pin DS...again.. by muellerzr in https://github.com/huggingface/accelerate/pull/2679
* fix backend check by jiqing-feng in https://github.com/huggingface/accelerate/pull/2670
* Deprecate tqdm args + slight logic tweaks by muellerzr in https://github.com/huggingface/accelerate/pull/2673
* Enable BF16 autocast to everything during FP8 + some tweaks to enable FSDP by muellerzr in https://github.com/huggingface/accelerate/pull/2655
* Fix the rng states of sampler's generator to be synchronized for correct sharding of dataset across GPUs by pacman100 in https://github.com/huggingface/accelerate/pull/2694
* Simplify test logic by pacman100 in https://github.com/huggingface/accelerate/pull/2697
* Add source code for DataLoader Animation by muellerzr in https://github.com/huggingface/accelerate/pull/2696
* Block AMP for MPS device by SunMarc in https://github.com/huggingface/accelerate/pull/2699
* Do a pip freeze during workflows by muellerzr in https://github.com/huggingface/accelerate/pull/2704
* add cann version info to command accelerate env by statelesshz in https://github.com/huggingface/accelerate/pull/2689
* Add version checks for the import of DeepSpeed moe utils by pacman100 in https://github.com/huggingface/accelerate/pull/2705
* Change dataloader send_to_device calls to non-blocking by drhead in https://github.com/huggingface/accelerate/pull/2685
* add distributed examples by SunMarc in https://github.com/huggingface/accelerate/pull/2672
* Add diffusers to req by muellerzr in https://github.com/huggingface/accelerate/pull/2711
* fix bnb multi gpu training by SunMarc in https://github.com/huggingface/accelerate/pull/2714
* allow gather_for_metrics to be more flexible by SunMarc in https://github.com/huggingface/accelerate/pull/2710
* Add Upcasting for FSDP in Mixed Precision. Add Concept Guide for FSDP and DeepSpeed. by fabianlim in https://github.com/huggingface/accelerate/pull/2674
* Segment out a deepspeed docker image by muellerzr in https://github.com/huggingface/accelerate/pull/2707
* Fixup `free_memory` to deal with garbage collection by muellerzr in https://github.com/huggingface/accelerate/pull/2716
* fix sampler serialization by SunMarc in https://github.com/huggingface/accelerate/pull/2723
* Fix sampler failing test by SunMarc in https://github.com/huggingface/accelerate/pull/2728
* Docs: Fix build main documentation by SunMarc in https://github.com/huggingface/accelerate/pull/2729
* Fix Documentation in FSDP and DeepSpeed Concept Guide by fabianlim in https://github.com/huggingface/accelerate/pull/2725
* Fix deepspeed offload device type by yhna940 in https://github.com/huggingface/accelerate/pull/2717
* FEAT: Add LOMO optimizer by younesbelkada in https://github.com/huggingface/accelerate/pull/2695
* Fix tests on main by muellerzr in https://github.com/huggingface/accelerate/pull/2739
New Contributors
* brentyi made their first contribution in https://github.com/huggingface/accelerate/pull/2650
* regisss made their first contribution in https://github.com/huggingface/accelerate/pull/2649
* yhna940 made their first contribution in https://github.com/huggingface/accelerate/pull/2646
* Wauplin made their first contribution in https://github.com/huggingface/accelerate/pull/2658
* ArthurinRUC made their first contribution in https://github.com/huggingface/accelerate/pull/2664
* jiqing-feng made their first contribution in https://github.com/huggingface/accelerate/pull/2652
* zhangsheng377 made their first contribution in https://github.com/huggingface/accelerate/pull/2667
* rb-synth made their first contribution in https://github.com/huggingface/accelerate/pull/2654
* drhead made their first contribution in https://github.com/huggingface/accelerate/pull/2685
**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.29.3...v0.30.0