Accelerate

Latest version: v1.1.1

1.1.0

Internals:
* Allow for a `data_seed` argument in https://github.com/huggingface/accelerate/pull/3150
* Trigger `weights_only=True` by default for all compatible objects when checkpointing and saving with `torch.save` in https://github.com/huggingface/accelerate/pull/3036
* Handle negative values for `dim` input in `pad_across_processes` in https://github.com/huggingface/accelerate/pull/3114
* Enable cpu bnb distributed lora finetune in https://github.com/huggingface/accelerate/pull/3159
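
The negative-`dim` handling listed above can be pictured with a small pure-Python sketch. It shows the usual normalization convention (the same one Python indexing uses) and is not the code from the pull request; `normalize_dim` is a hypothetical helper name.

```python
def normalize_dim(dim: int, ndim: int) -> int:
    """Map a possibly negative dimension index onto the range [0, ndim)."""
    if not -ndim <= dim < ndim:
        raise IndexError(f"dim {dim} is out of range for a tensor with {ndim} dimensions")
    return dim % ndim


# For a 3-dimensional tensor, -1 refers to the last dimension (index 2).
last_dim = normalize_dim(-1, 3)
first_dim = normalize_dim(0, 3)
```

Normalizing once at the top of a function lets the rest of the padding logic assume a non-negative index.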

DeepSpeed
* Support torch dynamo for deepspeed>=0.14.4 in https://github.com/huggingface/accelerate/pull/3069

Megatron
* update Megatron-LM plugin code to version 0.8.0 or higher in https://github.com/huggingface/accelerate/pull/3174

Big Model Inference
* New `has_offloaded_params` utility added in https://github.com/huggingface/accelerate/pull/3188

Examples
* Florence2 distributed inference example in https://github.com/huggingface/accelerate/pull/3123

Full Changelog
* Handle negative values for `dim` input in `pad_across_processes` by mariusarvinte in https://github.com/huggingface/accelerate/pull/3114
* Fixup DS issue with weakref by muellerzr in https://github.com/huggingface/accelerate/pull/3143
* Refactor scaler to util by muellerzr in https://github.com/huggingface/accelerate/pull/3142
* DS fix, continued by muellerzr in https://github.com/huggingface/accelerate/pull/3145
* Florence2 distributed inference example by hlky in https://github.com/huggingface/accelerate/pull/3123
* POC: Allow for a `data_seed` by muellerzr in https://github.com/huggingface/accelerate/pull/3150
* Adding multi gpu speech generation by dame-cell in https://github.com/huggingface/accelerate/pull/3149
* support torch dynamo for deepspeed>=0.14.4 by oraluben in https://github.com/huggingface/accelerate/pull/3069
* Fixup Zero3 + `save_model` by muellerzr in https://github.com/huggingface/accelerate/pull/3146
* Trigger `weights_only=True` by default for all compatible objects by muellerzr in https://github.com/huggingface/accelerate/pull/3036
* Remove broken dynamo test by oraluben in https://github.com/huggingface/accelerate/pull/3155
* fix version check bug in `get_xpu_available_memory` by faaany in https://github.com/huggingface/accelerate/pull/3165
* enable cpu bnb distributed lora finetune by jiqing-feng in https://github.com/huggingface/accelerate/pull/3159
* [Utils] `has_offloaded_params` by kylesayrs in https://github.com/huggingface/accelerate/pull/3188
* fix bnb by eljandoubi in https://github.com/huggingface/accelerate/pull/3186
* [docs] update neptune API by faaany in https://github.com/huggingface/accelerate/pull/3181
* docs: fix a wrong word in comment in src/accelerate/accelerate.py:1255 by Rebornix-zero in https://github.com/huggingface/accelerate/pull/3183
* [docs] use nn.module instead of tensor as model by faaany in https://github.com/huggingface/accelerate/pull/3157
* Fix typo by kylesayrs in https://github.com/huggingface/accelerate/pull/3191
* MLU devices : Checks if mlu is available via an cndev-based check which won't trigger the drivers and leave mlu by huismiling in https://github.com/huggingface/accelerate/pull/3187
* update Megatron-LM plugin code to version 0.8.0 or higher. by eljandoubi in https://github.com/huggingface/accelerate/pull/3174
* 🚨 🚨 🚨 Goodbye Python 3.8! 🚨 🚨 🚨 by muellerzr in https://github.com/huggingface/accelerate/pull/3194
* Update transformers.deepspeed references from transformers 4.46.0 release by loadams in https://github.com/huggingface/accelerate/pull/3196
* eliminate dead code by statelesshz in https://github.com/huggingface/accelerate/pull/3198
* take `torch.nn.Module` model into account when moving to device by faaany in https://github.com/huggingface/accelerate/pull/3167
* [docs] add xpu part and fix bug in `torchrun` by faaany in https://github.com/huggingface/accelerate/pull/3166
* Models With Tied Weights Need Re-Tieing After FSDP Param Init by fabianlim in https://github.com/huggingface/accelerate/pull/3154
* add the missing xpu for local sgd by faaany in https://github.com/huggingface/accelerate/pull/3163
* typo fix in big_modeling.py by a-r-r-o-w in https://github.com/huggingface/accelerate/pull/3207
* [Utils] `align_module_device` by kylesayrs in https://github.com/huggingface/accelerate/pull/3204

New Contributors
* mariusarvinte made their first contribution in https://github.com/huggingface/accelerate/pull/3114
* hlky made their first contribution in https://github.com/huggingface/accelerate/pull/3123
* dame-cell made their first contribution in https://github.com/huggingface/accelerate/pull/3149
* kylesayrs made their first contribution in https://github.com/huggingface/accelerate/pull/3188
* eljandoubi made their first contribution in https://github.com/huggingface/accelerate/pull/3186
* Rebornix-zero made their first contribution in https://github.com/huggingface/accelerate/pull/3183
* loadams made their first contribution in https://github.com/huggingface/accelerate/pull/3196

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v1.0.1...v1.1.0

1.0.1

Bugfixes

* Fixes an issue where the `auto` values were no longer being parsed when using [deepspeed](https://github.com/huggingface/accelerate/pull/3143)
* Fixes a broken test in the deepspeed tests related to the [auto values](https://github.com/huggingface/accelerate/pull/3145)

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v1.0.0...v1.0.1

1.0

With `accelerate` 1.0, we are officially stating that the core parts of the API are now "stable" and ready for whatever the future of distributed training and PyTorch brings. These release notes focus first on the major breaking changes, so you can get your code fixed, followed by what is new specifically between 0.34.0 and 1.0.

To read more, check out our official blog [here](https://huggingface.co/blog/accelerate-v1)

Migration assistance

* Passing in `dispatch_batches`, `split_batches`, `even_batches`, and `use_seedable_sampler` to the `Accelerator()` should now be handled by creating an `accelerate.utils.DataLoaderConfiguration()` and passing this to the `Accelerator()` instead (`Accelerator(dataloader_config=DataLoaderConfiguration(...))`)
* `Accelerator().use_fp16` and `AcceleratorState().use_fp16` have been removed; this should be replaced by checking `accelerator.mixed_precision == "fp16"`
* `Accelerator().autocast()` no longer accepts a `cache_enabled` argument. Instead, an `AutocastKwargs()` instance should be used which handles this flag (among others) passing it to the `Accelerator` (`Accelerator(kwargs_handlers=[AutocastKwargs(cache_enabled=True)])`)
* `accelerate.utils.is_tpu_available` should be replaced with `accelerate.utils.is_torch_xla_available`
* `accelerate.utils.modeling.shard_checkpoint` should be replaced with `split_torch_state_dict_into_shards` from the `huggingface_hub` library
* `accelerate.tqdm.tqdm()` no longer accepts `True`/`False` as the first argument, and instead, `main_process_only` should be passed in as a named argument
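
As a rough aid for the first migration item, the sketch below separates the four moved keyword arguments from everything else in an old `Accelerator(...)` call. The helper name `split_legacy_kwargs` and the sample arguments are hypothetical; only the four option names and the `dataloader_config=DataLoaderConfiguration(...)` pattern come from the notes above.

```python
# Keyword arguments that moved from `Accelerator()` to `DataLoaderConfiguration()`.
MOVED_KWARGS = ("dispatch_batches", "split_batches", "even_batches", "use_seedable_sampler")


def split_legacy_kwargs(legacy_kwargs):
    """Separate the moved dataloader options from the remaining Accelerator kwargs."""
    dataloader_kwargs = {k: v for k, v in legacy_kwargs.items() if k in MOVED_KWARGS}
    accelerator_kwargs = {k: v for k, v in legacy_kwargs.items() if k not in MOVED_KWARGS}
    return dataloader_kwargs, accelerator_kwargs


# Hypothetical arguments from a pre-1.0 script.
old_call = {"mixed_precision": "fp16", "split_batches": True, "even_batches": False}
dataloader_kwargs, accelerator_kwargs = split_legacy_kwargs(old_call)

# With accelerate installed, the two dicts would then be used as:
#   from accelerate import Accelerator
#   from accelerate.utils import DataLoaderConfiguration
#   accelerator = Accelerator(
#       dataloader_config=DataLoaderConfiguration(**dataloader_kwargs),
#       **accelerator_kwargs,
#   )
```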

Multiple Model DeepSpeed Support

After many requests, we finally have multiple model DeepSpeed support in Accelerate (though it is still quite early)! Read the full tutorial [here](https://huggingface.co/docs/accelerate/v1.0.0/en/usage_guides/deepspeed_multiple_model#using-multiple-models-with-deepspeed). In short:

When using multiple models, a DeepSpeed plugin should be created for each model (and, as a result, a separate config). A few examples are below:

Knowledge distillation

(Where we train only one model, with ZeRO-2, and another is used for inference, with ZeRO-3)

```python
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

zero2_plugin = DeepSpeedPlugin(hf_ds_config="zero2_config.json")
zero3_plugin = DeepSpeedPlugin(hf_ds_config="zero3_config.json")

deepspeed_plugins = {"student": zero2_plugin, "teacher": zero3_plugin}

accelerator = Accelerator(deepspeed_plugins=deepspeed_plugins)
```


To select which plugin is used at a given time (i.e., when calling `prepare`), call `accelerator.state.select_deepspeed_plugin("name")`. The first plugin is active by default:

```python
from transformers import AutoModel

accelerator.state.select_deepspeed_plugin("student")
student_model, optimizer, scheduler = ...
student_model, optimizer, scheduler, train_dataloader = accelerator.prepare(student_model, optimizer, scheduler, train_dataloader)

accelerator.state.select_deepspeed_plugin("teacher")  # This will automatically enable zero init
teacher_model = AutoModel.from_pretrained(...)
teacher_model = accelerator.prepare(teacher_model)
```


Multiple disjoint models

For disjoint models, separate accelerators should be used for each model, and their own `.backward()` should be called later:

```python
for batch in dl:
    # Each model has its own accelerator, optimizer, and scheduler
    outputs1 = first_model(**batch)
    first_accelerator.backward(outputs1.loss)
    first_optimizer.step()
    first_scheduler.step()
    first_optimizer.zero_grad()

    outputs2 = second_model(**batch)
    second_accelerator.backward(outputs2.loss)
    second_optimizer.step()
    second_scheduler.step()
    second_optimizer.zero_grad()
```


FP8

We've enabled MS-AMP support, excluding FSDP. At this time we are not going forward with implementing FSDP support with MS-AMP, due to design issues between the two libraries that prevent them from interoperating easily.

FSDP
* Fixed FSDP auto_wrap using characters instead of full str for layers
* Re-enable setting state dict type manually

Big Modeling
* Removed cpu restriction for bnb training

What's Changed
* Fix FSDP auto_wrap using characters instead of full str for layers by muellerzr in https://github.com/huggingface/accelerate/pull/3075
* Allow DataLoaderAdapter subclasses to be pickled by implementing `__reduce__` by byi8220 in https://github.com/huggingface/accelerate/pull/3074
* Fix three typos in src/accelerate/data_loader.py by xiabingquan in https://github.com/huggingface/accelerate/pull/3082
* Re-enable setting state dict type by muellerzr in https://github.com/huggingface/accelerate/pull/3084
* Support sequential cpu offloading with torchao quantized tensors by a-r-r-o-w in https://github.com/huggingface/accelerate/pull/3085
* fix bug in `_get_named_modules` by faaany in https://github.com/huggingface/accelerate/pull/3052
* use the correct available memory API for XPU by faaany in https://github.com/huggingface/accelerate/pull/3076
* fix `skip_keys` usage in forward hooks by 152334H in https://github.com/huggingface/accelerate/pull/3088
* Update README.md to include distributed image generation gist by sayakpaul in https://github.com/huggingface/accelerate/pull/3077
* MAINT: Upgrade ruff to v0.6.4 by BenjaminBossan in https://github.com/huggingface/accelerate/pull/3095
* Revert "Enable Unwrapping for Model State Dicts (FSDP)" by SunMarc in https://github.com/huggingface/accelerate/pull/3096
* MS-AMP support (w/o FSDP) by muellerzr in https://github.com/huggingface/accelerate/pull/3093
* [docs] DataLoaderConfiguration docstring by stevhliu in https://github.com/huggingface/accelerate/pull/3103
* MAINT: Permission for GH token in stale.yml by BenjaminBossan in https://github.com/huggingface/accelerate/pull/3102
* [docs] Doc sprint by stevhliu in https://github.com/huggingface/accelerate/pull/3099
* Update image ref for docs by muellerzr in https://github.com/huggingface/accelerate/pull/3105
* No more t5 by muellerzr in https://github.com/huggingface/accelerate/pull/3107
* [docs] More docstrings by stevhliu in https://github.com/huggingface/accelerate/pull/3108
* 🚨🚨🚨 The Great Deprecation 🚨🚨🚨 by muellerzr in https://github.com/huggingface/accelerate/pull/3098
* POC: multiple model/configuration DeepSpeed support by muellerzr in https://github.com/huggingface/accelerate/pull/3097
* Fixup test_sync w/ deprecated stuff by muellerzr in https://github.com/huggingface/accelerate/pull/3109
* Switch to XLA instead of TPU by SunMarc in https://github.com/huggingface/accelerate/pull/3118
* [tests] skip pippy tests for XPU by faaany in https://github.com/huggingface/accelerate/pull/3119
* Fixup multiple model DS tests by muellerzr in https://github.com/huggingface/accelerate/pull/3131
* remove cpu restriction for bnb training by jiqing-feng in https://github.com/huggingface/accelerate/pull/3062
* fix deprecated `torch.cuda.amp.GradScaler` FutureWarning for pytorch 2.4+ by Mon-ius in https://github.com/huggingface/accelerate/pull/3132
* 🐛 [HotFix] Handle Profiler Activities Based on PyTorch Version by yhna940 in https://github.com/huggingface/accelerate/pull/3136
* only move model to device when model is in cpu and target device is xpu by faaany in https://github.com/huggingface/accelerate/pull/3133
* fix tip brackets typo by davanstrien in https://github.com/huggingface/accelerate/pull/3129
* typo of "scalar" instead of "scaler" by tonyzhaozh in https://github.com/huggingface/accelerate/pull/3116
* MNT Permission for PRs for GH token in stale.yml by BenjaminBossan in https://github.com/huggingface/accelerate/pull/3112

New Contributors
* xiabingquan made their first contribution in https://github.com/huggingface/accelerate/pull/3082
* a-r-r-o-w made their first contribution in https://github.com/huggingface/accelerate/pull/3085
* 152334H made their first contribution in https://github.com/huggingface/accelerate/pull/3088
* sayakpaul made their first contribution in https://github.com/huggingface/accelerate/pull/3077
* Mon-ius made their first contribution in https://github.com/huggingface/accelerate/pull/3132
* davanstrien made their first contribution in https://github.com/huggingface/accelerate/pull/3129
* tonyzhaozh made their first contribution in https://github.com/huggingface/accelerate/pull/3116

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.34.2...v1.0.0

0.34.1

Bug fixes
* Fixes an issue where processed `DataLoaders` could no longer be pickled, in https://github.com/huggingface/accelerate/pull/3074 (thanks to byi8220)
* Fixes an issue when using FSDP where `default_transformers_cls_names_to_wrap` would separate `_no_split_modules` by characters instead of keeping it as a list of layer names, in https://github.com/huggingface/accelerate/pull/3075
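
The second fix concerns a common class of Python bug: iterating over a string yields single characters rather than whole names. The sketch below is a generic illustration of that pitfall and one way to guard against it; it is not the accelerate code, and `names_to_wrap` and `MyDecoderLayer` are hypothetical names.

```python
def names_to_wrap(no_split_modules):
    """Return layer class names as a list, whether given a list or a comma-separated string."""
    if isinstance(no_split_modules, str):
        # Iterating a str directly would yield single characters, so split it explicitly.
        return [name.strip() for name in no_split_modules.split(",") if name.strip()]
    return list(no_split_modules)


buggy = list("MyDecoderLayer")           # splits into characters: ['M', 'y', 'D', ...]
fixed = names_to_wrap("MyDecoderLayer")  # keeps the full layer name intact
```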

**Full Changelog**: https://github.com/huggingface/accelerate/compare/v0.34.0...v0.34.1

0.34.0

Dependency Changes
- **Updated Safetensors Requirement:** The library now requires `safetensors` version 0.4.3 or higher.
- **Added support for Numpy 2.0:** The library now fully supports `numpy` 2.0.0

Core

New Script Behavior Changes
- **Process Group Management:** PyTorch now requires users to destroy process groups after training. The `accelerate` library will handle this automatically with `accelerator.end_training()`, or you can do it manually using `PartialState().destroy_process_group()`.
- **MLU Device Support:** Added support for saving and loading RNG states on MLU devices by huismiling
- **NPU Support:** Corrected backend and distributed settings when using `transfer_to_npu`, ensuring better performance and compatibility.

DataLoader Enhancements
- **Stateful DataLoader:** We are excited to announce that early support has been added for the `StatefulDataLoader` from `torchdata`, allowing better handling of data loading states. Enable it by passing `use_stateful_dataloader=True` to the `DataLoaderConfiguration`. When calling `load_state()`, the `DataLoader` will then automatically resume from its last step, so there is no more need to iterate through batches that were already consumed.
- **Decoupled Data Loader Preparation:** The `prepare_data_loader()` function is now **independent** of the `Accelerator`, giving you more flexibility towards which API levels you would like to use.
- **XLA Compatibility:** Added support for skipping initial batches when using XLA.
- **Improved State Management:** Bug fixes and enhancements for saving/loading `DataLoader` states, ensuring smoother training sessions.
- **Epoch Setting:** Introduced the `set_epoch` function for `MpDeviceLoaderWrapper`.
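
To illustrate what "resuming from the last step" means, here is a toy loader in plain Python that saves and restores its position. It is a conceptual sketch only and does not reflect how `torchdata`'s `StatefulDataLoader` is implemented; `ToyStatefulLoader` is a hypothetical class.

```python
class ToyStatefulLoader:
    """Toy loader that remembers how many batches it has already yielded."""

    def __init__(self, batches):
        self.batches = list(batches)
        self.position = 0

    def __iter__(self):
        while self.position < len(self.batches):
            batch = self.batches[self.position]
            self.position += 1
            yield batch
        self.position = 0  # reset for the next epoch

    def state_dict(self):
        return {"position": self.position}

    def load_state_dict(self, state):
        self.position = state["position"]


loader = ToyStatefulLoader(["b0", "b1", "b2", "b3"])
iterator = iter(loader)
seen = [next(iterator), next(iterator)]  # consume two batches
checkpoint = loader.state_dict()

resumed = ToyStatefulLoader(["b0", "b1", "b2", "b3"])
resumed.load_state_dict(checkpoint)
remaining = list(resumed)  # continues at the third batch without replaying the first two
```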

FP8 Training Improvements
- **Enhanced FP8 Training:** Fully Sharded Data Parallelism (FSDP) and DeepSpeed support now work seamlessly with `TransformerEngine` FP8 training, including better defaults for the quantized FP8 weights.
- **Integration baseline**: We've added a new suite of examples and benchmarks to ensure that our `TransformerEngine` integration works exactly as intended. These scripts run one half using 🤗 Accelerate's integration and the other with raw `TransformerEngine`, providing users with a nice example of what we do under the hood with accelerate, and a good sanity check to make sure nothing breaks down over time. Find them [here](https://github.com/huggingface/accelerate/tree/main/benchmarks/fp8)
- **Import Fixes:** Resolved issues with the import checks for `TransformerEngine` that caused downstream problems.
- **FP8 Docker Images:** We've added new docker images for `TransformerEngine` and `accelerate` as well. Use `docker pull huggingface/accelerate:gpu-fp8-transformerengine` to quickly get an environment going.


`torchpippy` no more, long live `torch.distributed.pipelining`
- With the latest PyTorch release, `torchpippy` is now fully integrated into torch core, and as a result we are **exclusively supporting the PyTorch implementation from now on**
- There are breaking changes, including to the examples, that come from this shift. Namely:
  - Tracing of inputs is done with the shape *each GPU will see*, rather than the size of the total batch. So for 2 GPUs, one should pass in an input of `[1, n, n]` rather than `[2, n, n]` as before.
  - **We no longer support Encoder/Decoder models**. PyTorch tracing for `pipelining` no longer supports encoder/decoder models, so the `t5` example has been removed.
  - **Computer vision model support currently does not work**: There are some tracing issues with ResNets that we are actively looking into.
- **If any of these changes is too breaking, we recommend pinning your accelerate version**. If the missing encoder/decoder model support is **actively blocking your inference using pippy**, please open an issue and let us know. We can look into adding the old `torchpippy` support back if needed.
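
The change in tracing shape comes down to simple arithmetic on the batch dimension. The sketch below assumes the batch is split evenly across GPUs along the first dimension; `per_device_example_shape` is a hypothetical helper and the value of `n` is arbitrary.

```python
def per_device_example_shape(total_batch_shape, num_gpus):
    """Return the shape one GPU sees when the batch dimension is split evenly."""
    batch_size, *rest = total_batch_shape
    if batch_size % num_gpus != 0:
        raise ValueError("batch size must be divisible by the number of GPUs")
    return [batch_size // num_gpus, *rest]


n = 128  # arbitrary size for the remaining dimensions
shape_for_tracing = per_device_example_shape([2, n, n], num_gpus=2)
```

With a total batch of `[2, n, n]` on 2 GPUs, the example input used for tracing has shape `[1, n, n]`, matching the note above.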

Fully Sharded Data Parallelism (FSDP)
- **Environment Flexibility:** Environment variables are now **fully optional** for FSDP, simplifying configuration. You can now fully create a `FullyShardedDataParallelPlugin` yourself manually *with no need for environment patching*:
```python
from accelerate import FullyShardedDataParallelPlugin
fsdp_plugin = FullyShardedDataParallelPlugin(...)
```

- **FSDP RAM efficient loading:** Added a utility to enable RAM-efficient model loading (by setting the proper environment variable). **This is generally needed if you are not using `accelerate launch` and need to ensure the environment variables are set up properly for model loading**:
```python
from accelerate.utils import enable_fsdp_ram_efficient_loading, disable_fsdp_ram_efficient_loading
enable_fsdp_ram_efficient_loading()
```

- **Model State Dict Management:** Enhanced support for unwrapping model state dicts in FSDP, making it easier to manage distributed models.

New Examples
- **Configuration and Models:** Improved configuration handling and introduced a configuration zoo for easier experimentation. You can learn more [here](https://github.com/huggingface/accelerate/tree/main/examples/config_yaml_templates). This was largely inspired by the `axolotl` library, so very big kudos to their wonderful [work](https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples)
- **FSDP + SLURM Example:** Added a [minimal configuration example for running jobs with SLURM and using FSDP](https://github.com/huggingface/accelerate/blob/main/examples/slurm/submit_multinode_fsdp.sh)

Bug Fixes
* Fix bug of clip_grad_norm_ for xla fsdp by hanwen-sun in https://github.com/huggingface/accelerate/pull/2941
* Explicit check for `step` when loading the state by muellerzr in https://github.com/huggingface/accelerate/pull/2992
* Fix `find_tied_params` for models with shared layers by qubvel in https://github.com/huggingface/accelerate/pull/2986
* clear memory after offload by SunMarc in https://github.com/huggingface/accelerate/pull/2994
* fix default value for rank size in cpu threads_per_process assignment logic by rbrugaro in https://github.com/huggingface/accelerate/pull/3009
* Fix batch_sampler maybe None error by candlewill in https://github.com/huggingface/accelerate/pull/3025
* Do not import `transformer_engine` on import by oraluben in https://github.com/huggingface/accelerate/pull/3056
* Fix torchvision to be compatible with torch version in CI by SunMarc in https://github.com/huggingface/accelerate/pull/2982
* Fix gated test by muellerzr in https://github.com/huggingface/accelerate/pull/2993
* Fix typo on warning str: "on the meta device device" -> "on the meta device" by HeAndres in https://github.com/huggingface/accelerate/pull/2997
* Fix deepspeed tests by muellerzr in https://github.com/huggingface/accelerate/pull/3003
* Fix torch version check by muellerzr in https://github.com/huggingface/accelerate/pull/3024
* Fix fp8 benchmark on single GPU by muellerzr in https://github.com/huggingface/accelerate/pull/3032
* Fix typo in comment by zmoki688 in https://github.com/huggingface/accelerate/pull/3045
* Speed up tests by shaving off subprocess when not needed by muellerzr in https://github.com/huggingface/accelerate/pull/3042
* Remove `skip_first_batches` support for StatefulDataloader and fix all the tests by muellerzr in https://github.com/huggingface/accelerate/pull/3068

New Contributors
* byi8220 made their first contribution in https://github.com/huggingface/accelerate/pull/2957
* alex-jw-brooks made their first contribution in https://github.com/huggingface/accelerate/pull/2959
* XciD made their first contribution in https://github.com/huggingface/accelerate/pull/2981
* hanwen-sun made their first contribution in https://github.com/huggingface/accelerate/pull/2941
* HeAndres made their first contribution in https://github.com/huggingface/accelerate/pull/2997
* yitongh made their first contribution in https://github.com/huggingface/accelerate/pull/2966
* qubvel made their first contribution in https://github.com/huggingface/accelerate/pull/2986
* rbrugaro made their first contribution in https://github.com/huggingface/accelerate/pull/3009
* candlewill made their first contribution in https://github.com/huggingface/accelerate/pull/3025
* siddk made their first contribution in https://github.com/huggingface/accelerate/pull/3047
* oraluben made their first contribution in https://github.com/huggingface/accelerate/pull/3056
* tmm1 made their first contribution in https://github.com/huggingface/accelerate/pull/3055
* zmoki688 made their first contribution in https://github.com/huggingface/accelerate/pull/3045

Full Changelog:
* Require safetensors>=0.4.3 by byi8220 in https://github.com/huggingface/accelerate/pull/2957
* Fix torchvision to be compatible with torch version in CI by SunMarc in https://github.com/huggingface/accelerate/pull/2982
* Enable Unwrapping for Model State Dicts (FSDP) by alex-jw-brooks in https://github.com/huggingface/accelerate/pull/2959
* chore: Update runs-on configuration for CI workflows by XciD in https://github.com/huggingface/accelerate/pull/2981
* add MLU devices for rng state saving and loading. by huismiling in https://github.com/huggingface/accelerate/pull/2940
* remove .md to allow proper linking by nbroad1881 in https://github.com/huggingface/accelerate/pull/2977
* Fix bug of clip_grad_norm_ for xla fsdp by hanwen-sun in https://github.com/huggingface/accelerate/pull/2941
* Fix gated test by muellerzr in https://github.com/huggingface/accelerate/pull/2993
* Explicit check for `step` when loading the state by muellerzr in https://github.com/huggingface/accelerate/pull/2992
* Fix typo on warning str: "on the meta device device" -> "on the meta device" by HeAndres in https://github.com/huggingface/accelerate/pull/2997
* Support skip_first_batches for XLA by yitongh in https://github.com/huggingface/accelerate/pull/2966
* clear memory after offload by SunMarc in https://github.com/huggingface/accelerate/pull/2994
* Fix deepspeed tests by muellerzr in https://github.com/huggingface/accelerate/pull/3003
* Make env variables optional for FSDP by muellerzr in https://github.com/huggingface/accelerate/pull/2998
* Add small util to enable FSDP offloading quickly by muellerzr in https://github.com/huggingface/accelerate/pull/3006
* update version to 0.34.dev0 by SunMarc in https://github.com/huggingface/accelerate/pull/3007
* Fix `find_tied_params` for models with shared layers by qubvel in https://github.com/huggingface/accelerate/pull/2986
* Enable FSDP & Deepspeed + FP8 by muellerzr in https://github.com/huggingface/accelerate/pull/2983
* fix default value for rank size in cpu threads_per_process assignment logic by rbrugaro in https://github.com/huggingface/accelerate/pull/3009
* Wrong import check for TE by muellerzr in https://github.com/huggingface/accelerate/pull/3016
* destroy process group in `end_training` by SunMarc in https://github.com/huggingface/accelerate/pull/3012
* Tweak defaults for quantized-typed FP8 TE weights by muellerzr in https://github.com/huggingface/accelerate/pull/3018
* Set correct NPU backend and distributed_type when using transfer_to_npu by ArthurinRUC in https://github.com/huggingface/accelerate/pull/3021
* Fix torch version check by muellerzr in https://github.com/huggingface/accelerate/pull/3024
* Add end_training/destroy_pg to everything and unpin numpy by muellerzr in https://github.com/huggingface/accelerate/pull/3030
* Improve config handling and add a zoo by muellerzr in https://github.com/huggingface/accelerate/pull/3029
* Add early support for `torchdata.stateful_dataloader.StatefulDataLoader` within the `Accelerator` by byi8220 in https://github.com/huggingface/accelerate/pull/2895
* Fix fp8 benchmark on single GPU by muellerzr in https://github.com/huggingface/accelerate/pull/3032
* Fix batch_sampler maybe None error by candlewill in https://github.com/huggingface/accelerate/pull/3025
* Fixup dataloader state dict bugs + incorporate load/save_state API by muellerzr in https://github.com/huggingface/accelerate/pull/3034
* Decouple `prepare_data_loader()` from Accelerator by siddk in https://github.com/huggingface/accelerate/pull/3047
* Update CONTRIBUTING.md Setup Instructions by siddk in https://github.com/huggingface/accelerate/pull/3046
* Add a SLURM example with minimal config by muellerzr in https://github.com/huggingface/accelerate/pull/2950
* Add FP8 docker images by muellerzr in https://github.com/huggingface/accelerate/pull/3048
* Update torchpippy by muellerzr in https://github.com/huggingface/accelerate/pull/2938
* Do not import `transformer_engine` on import by oraluben in https://github.com/huggingface/accelerate/pull/3056
* use duck-typing to ensure underlying optimizer supports schedulefree hooks by tmm1 in https://github.com/huggingface/accelerate/pull/3055
* Fix typo in comment by zmoki688 in https://github.com/huggingface/accelerate/pull/3045
* add set_epoch for MpDeviceLoaderWrapper by hanwen-sun in https://github.com/huggingface/accelerate/pull/3053
* Speed up tests by shaving off subprocess when not needed by muellerzr in https://github.com/huggingface/accelerate/pull/3042
* Remove `skip_first_batches` support for StatefulDataloader and fix all the tests by muellerzr in https://github.com/huggingface/accelerate/pull/3068


Detailed Full Changelog:
* https://github.com/huggingface/accelerate/compare/v0.33.0...v0.34.0
