Big model inference
Support has been added to run `device_map="auto"` on the MPS device. Big model inference also works with models loaded in 4-bit in Transformers; a short example follows the PR list below.
* Add mps support to big inference modeling by SunMarc in 1545
* Adds fp4 support for model dispatching by younesbelkada in 1505
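A minimal sketch of the new behavior, assuming a Mac with Apple Silicon and a recent version of Transformers; the `gpt2` checkpoint is illustrative only:

```python
# With mps support in big model inference, device_map="auto" can now
# dispatch weights onto the Apple GPU (the "mps" device).
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2", device_map="auto")
# hf_device_map shows where each module was placed, e.g. {"": "mps"}
print(model.hf_device_map)
```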
4-bit QLoRA Support
* 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by TimDettmers in 1458
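A minimal sketch of the QLoRA recipe, assuming `transformers`, `peft`, and `bitsandbytes` are installed; the checkpoint and LoRA hyperparameters are illustrative only:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the frozen base model to 4-bit NF4 weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", quantization_config=bnb_config, device_map="auto"
)

# Train only the small LoRA adapters on top of the 4-bit base model
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))
model.print_trainable_parameters()  # only the adapters require gradients
```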
Distributed Inference Utilities
This version introduces a new `Accelerator.split_between_processes` utility to help with performing distributed inference on non-tensorized or non-dataloader workflows, as shown in the sketch below. Read more [here](https://huggingface.co/docs/accelerate/usage_guides/distributed_inference).
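A minimal sketch; the prompts are illustrative:

```python
from accelerate import Accelerator

accelerator = Accelerator()
prompts = ["a cat", "a dog", "a bird", "a fish"]

# Each process receives its own slice of the inputs; with two processes,
# process 0 gets the first two prompts and process 1 the last two.
with accelerator.split_between_processes(prompts) as subset:
    for prompt in subset:
        ...  # run generation on this process's share
```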
Introduce XPU support for Intel GPU
* Intel GPU support initialization by abhilash1910 in 1118
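A minimal sketch, assuming an Intel GPU and `intel_extension_for_pytorch` (IPEX) installed; IPEX registers the `torch.xpu` backend on import. Depending on your setup, the `ACCELERATE_USE_XPU` environment variable mentioned in the PR list below may also need to be set:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401  (registers torch.xpu)

from accelerate import Accelerator

if torch.xpu.is_available():
    accelerator = Accelerator()                 # picks up the XPU device
    model = accelerator.prepare(torch.nn.Linear(8, 2))
    print(accelerator.device)                   # e.g. xpu:0
```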
Add support for the new PyTorch XLA TPU runtime
* Accelerate now supports the latest TPU runtimes (1393, 1385)
A new optimizer method: `LocalSGD`
* This is a new wrapper around SGD that enables efficient multi-GPU training when no fast interconnect is available, by searchivarius in 1378; see the sketch below
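A minimal sketch of the wrapper; the model, optimizer, and step counts are illustrative. Parameters are synchronized across processes only every `local_sgd_steps` steps rather than on every step:

```python
import torch
from accelerate import Accelerator
from accelerate.local_sgd import LocalSGD

accelerator = Accelerator()
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
model, optimizer = accelerator.prepare(model, optimizer)

with LocalSGD(accelerator=accelerator, model=model, local_sgd_steps=8, enabled=True) as local_sgd:
    for _ in range(32):
        batch = torch.randn(4, 16)
        loss = model(batch).sum()
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
        local_sgd.step()  # averages model parameters across processes every 8 steps
```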
Papers with 🤗 Accelerate
* We now have an entire section of the docs dedicated to official paper implementations and citations using the framework (1399), see it live [here](https://hf.co/docs/accelerate/usage_guides/training_zoo#in-science)
Breaking changes
`logging_dir` has been fully deprecated; please use `project_dir` or a `ProjectConfiguration` instead, as shown below.
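A minimal sketch of the migration; the directory names are illustrative:

```python
from accelerate import Accelerator
from accelerate.utils import ProjectConfiguration

# Before (now removed):
# accelerator = Accelerator(logging_dir="logs")

# After: pass project_dir directly, or a full ProjectConfiguration
accelerator = Accelerator(
    project_config=ProjectConfiguration(project_dir=".", logging_dir="logs")
)
```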
What's new?
* use existing mlflow experiment if it exists by Rusteam in 1403
* changes required for DS integration by pacman100 in 1406
* fix deepspeed failing tests by pacman100 in 1411
* Make mlflow logging dir optional by mattplo-decath in 1413
* Fix bug on ipex for diffusers by abhilash1910 in 1426
* Improve Slack Updater by muellerzr in 1433
* Let quality yell at the user if it's a version difference by muellerzr in 1438
* Ensure that it gets installed by muellerzr in 1439
* [`core`] Introducing `CustomDtype` enum for custom dtypes by younesbelkada in 1434
* Fix XPU by muellerzr in 1440
* Make sure torch compiled model can also be unwrapped by patrickvonplaten in 1437
* fixed: ZeroDivisionError: division by zero by sreio in 1436
* fix potential OOM when resuming with multi-GPU training by exhyy in 1444
* Fixes in infer_auto_device_map by sgugger in 1441
* Raise error when logging improperly by muellerzr in 1446
* Fix ci by muellerzr in 1447
* Distributed prompting/inference utility by muellerzr in 1410
* Add to by muellerzr in 1448
* split_between_processes by stevhliu in 1449
* [docs] Replace `state.rank` -> `process_index` by pcuenca in 1450
* Auto multigpu logic by muellerzr in 1452
* Update with cli instructions by muellerzr in 1453
* Adds `in_order` argument that defaults to False, to log in order. by JulesGM in 1262
* fix error for CPU DDP using trainer api. by sywangyi in 1455
* Refactor and simplify xpu device in state by abhilash1910 in 1456
* Document how to use commands with python module instead of argparse by muellerzr in 1457
* 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by TimDettmers in 1458
* Fix skip first batch being permanent by muellerzr in 1466
* update conversion of layers to retain original data type. by avisinghal6 in 1467
* Check for xpu specifically by muellerzr in 1472
* update `register_empty_buffer` to match torch args by NouamaneTazi in 1465
* Update gradient accumulation docs, and remove redundant example by iantbutler01 in 1461
* Improve sagemaker by muellerzr in 1470
* Split tensors as part of `split_between_processes` by muellerzr in 1477
* Move to device by muellerzr in 1478
* Fix gradient state bugs in multiple dataloader by Ethan-yt in 1483
* Add rdzv-backend by muellerzr in 1490
* Only use IPEX if available by muellerzr in 1495
* Update README.md by lyhue1991 in 1493
* Let gather_for_metrics always run by muellerzr in 1496
* Use empty like when we only need to create buffers by thomasw21 in 1497
* Allow key skipping in big model inference by sgugger in 1491
* fix crash when ipex is installed and torch has no xpu by sywangyi in 1502
* [`bnb`] Add fp4 support for dispatch by younesbelkada in 1505
* Fix 4bit model on multiple devices by SunMarc in 1506
* adjust overriding of model's forward function by prathikr in 1492
* Add assertion when calling prepare with deepspeed config. by tensimiku in 1468
* NVME path support for deepspeed by abhilash1910 in 1484
* should set correct dtype to ipex optimize and use amp logic in native… by sywangyi in 1511
* Swap env vars for XPU and IPEX + CLI by muellerzr in 1513
* Fix a bug when parameters tied belong to the same module by sgugger in 1514
* Fixup deepspeed/cli tests by muellerzr in 1526
* Refactor mp into its own wrapper by muellerzr in 1527
* Check tied parameters by SunMarc in 1529
* Raise ValueError on iterable dataset if we've hit the end and attempting to go beyond it by muellerzr in 1531
* Officially support naive PP for quantized models + PEFT by younesbelkada in 1523
* remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu by sywangyi in 1503
* Prevent using extra VRAM for static device_map by LSerranoPEReN in 1536
* Update deepspeed.mdx by LiamSwayne in 1541
* Update performance.mdx by LiamSwayne in 1543
* Update deferring_execution.mdx by LiamSwayne in 1544
* Apply deprecations by muellerzr in 1537
* Add mps support to big inference modeling by SunMarc in 1545
* [documentation] grammar fixes in gradient_synchronization.mdx by LiamSwayne in 1547
* Eval mode by muellerzr in 1540
* Update migration.mdx by LiamSwayne in 1549
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* will-cromar
* Support TPU v4 with new PyTorch/XLA TPU runtime (1393)
* Support TPU v2 and v3 on new PyTorch/XLA TPU runtime (1385)
* searchivarius
* Adding support for local SGD. (1378)
* abhilash1910
* Intel GPU support initialization (1118)
* Fix bug on ipex for diffusers (1426)
* Refactor and simplify xpu device in state (1456)
* NVME path support for deepspeed (1484)
* sywangyi
* fix error for CPU DDP using trainer api. (1455)
* fix crash when ipex is installed and torch has no xpu (1502)
* should set correct dtype to ipex optimize and use amp logic in native… (1511)
* remove ipexplugin, let ACCELERATE_USE_IPEX/ACCELERATE_USE_XPU control the ipex and xpu (1503)
* Ethan-yt
* Fix gradient state bugs in multiple dataloader (1483)