Bitsandbytes

Latest version: v0.44.1

0.43.3

Improvements:

- FSDP: Enable loading prequantized weights with bf16/fp16/fp32 quant_storage
- Background: This update, linked to [Transformers PR #32276](https://github.com/huggingface/transformers/pull/32276), allows loading prequantized weights with alternative storage formats. Metadata is tracked similarly to `Params4bit.__new__` post PR #970. It supports models exported with non-default `quant_storage`, such as [this NF4 model with BF16 storage](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-BNB-NF4-BF16).
- Special thanks to winglian and matthewdouglas for enabling FSDP+QLoRA finetuning of Llama 3.1 405B on a single 8xH100 or 8xA100 node with as little as 256GB system RAM.
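As a minimal sketch of what this enables (assuming the `transformers` integration and sufficient hardware), a prequantized checkpoint with non-default `quant_storage` can now be loaded directly; the quantization settings travel with the checkpoint's metadata:

```python
import torch
from transformers import AutoModelForCausalLM

# The prequantized NF4 checkpoint with BF16 quant_storage linked above.
model_id = "hugging-quants/Meta-Llama-3.1-405B-BNB-NF4-BF16"

# The 4-bit quantization config, including quant_storage, is read from the
# checkpoint metadata, so no explicit BitsAndBytesConfig is needed here.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # illustrative; an FSDP setup would shard instead
)
```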

0.43.2

This release is significant: the QLoRA bug fix has big implications for larger `seqlen` values and batch sizes.

For each additional sequence (i.e., a batch size increase of one) we expect memory savings of:
- 405B: 39GB for `seqlen=1024`, and 4888GB for `seqlen=128,000`
- 70B: 10.1GB for `seqlen=1024`, and 1258GB for `seqlen=128,000`

The savings arise because activations are unnecessary for frozen parameters, yet memory for them was still erroneously allocated due to the now-fixed bug.
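A quick back-of-envelope check of these figures (a sketch derived purely from the numbers above, not a measurement):

```python
# Activation memory scales linearly with sequence length, so the savings at
# seqlen=128,000 should be roughly 128_000 / 1024 = 125x the seqlen=1024 figure.
for model, gb_at_1024 in [("405B", 39.0), ("70B", 10.1)]:
    print(f"{model}: ~{gb_at_1024 * 128_000 / 1024:.0f} GB saved at seqlen=128,000")
# -> 405B: ~4875 GB, 70B: ~1262 GB, matching the ~4888/~1258 GB quoted above
```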

Improvements:

- docs: FSDP+QLoRA and CPU install guide (1211, 1227, thanks stevhliu)
- Add CUDA 12.5 and update 12.4 builds (1284)

Bug Fixes

- 4bit getstate and 8bit deepcopy (1230, 1231, thanks BenjaminBossan)
- missing optimizers in `str2optimizer32bit` (1222, thanks EtienneDosSantos)
- CUDA 12.5 build issue (1273, thanks HennerM)
- fix for min_8bit_size functionality in Optimizer base classes (1286, thanks Edenzzzz; see the optimizer sketch after this list)
- QLoRA mem bug (1270, thanks Ther-nullptr)
- tests for cpu only platforms (1259, thanks galqiwi)
- restoration of quant_storage for CPU offloading (1279)
- optim update error with non-contiguous grads/params (deepspeed) (1187)
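Several of these fixes concern the 8-bit optimizers. As a hedged illustration of where `min_8bit_size` fits in (a sketch assuming a CUDA GPU; the toy model and threshold are arbitrary):

```python
import torch
import bitsandbytes as bnb

# A toy model; any nn.Module works.
model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
).cuda()

# 8-bit Adam: parameter tensors smaller than min_8bit_size keep
# full-precision optimizer state (the functionality fixed in 1286).
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-4, min_8bit_size=4096)

loss = model(torch.randn(8, 4096, device="cuda")).sum()
loss.backward()
optimizer.step()
```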

0.43.1

Improvements:

- Improved the serialization format for 8-bit weights; this change is fully backwards compatible (a round-trip sketch follows this list). (1164, thanks to younesbelkada for the contributions and akx for the review).
- Added CUDA 12.4 support to the Linux x86-64 build workflow, expanding the library's compatibility with the latest CUDA versions. (1171, kudos to matthewdouglas for this addition).
- Docs enhancement: Improved the instructions for installing the library from source. (1149, special thanks to stevhliu for the enhancements).
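To make the serialization improvement concrete, a minimal round-trip sketch via the `transformers` integration (the model id is a small placeholder and a CUDA GPU is assumed):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-125m"  # placeholder; small enough to test quickly

# Load with 8-bit weights via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)

# Save and reload: the 8-bit weights round-trip through the improved format.
model.save_pretrained("opt-125m-8bit")
reloaded = AutoModelForCausalLM.from_pretrained("opt-125m-8bit", device_map="auto")
```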

Bug Fixes

- Fix 4bit quantization with blocksize = 4096, where an illegal memory access was encountered. (1160, thanks matthewdouglas for fixing and YLGH for reporting)

Internal Improvements:

- Tests: improve memory usage (1147, thanks matthewdouglas)
- Add CUDA 12.4 to docs/install helper (1136, thanks matthewdouglas)
- Minor type/doc fixes (1128, thanks akx)
- Reformat Python code with Ruff (1081, thanks akx)
- Rework of CUDA/native-library setup and diagnostics (1041, thanks akx)

0.43.0

Improvements and New Features:

- QLoRA + FSDP official support is now live! https://github.com/TimDettmers/bitsandbytes/pull/970 by warner-benjamin and team. With FSDP you can train very large models (70B scale) on multiple 24GB consumer-type GPUs; see https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html for more details and the sketch after this list.
- Introduced improvements to the CI process for faster, more efficient builds, specifically enabling more effective cross-compilation on Linux platforms. This was accomplished by deprecating Make, migrating to CMake, and implementing new corresponding workflows. Huge thanks go to wkpark, rickardp, matthewdouglas and younesbelkada; 1055, 1050, 1111.
- Windows is now officially supported in bitsandbytes when the library is installed from source. See https://huggingface.co/docs/bitsandbytes/main/en/index for more details.
- Updated installation instructions to provide more comprehensive guidance for users. This includes clearer explanations and additional tips for various setup scenarios, making the library more accessible to a broader audience (rickardp, 1047).
- Enhanced the library's compatibility and setup process, including fixes for CPU-only installations and improvements in CUDA setup error messaging. This effort aims to streamline the installation process and improve user experience across different platforms and setups (wkpark, akx, 1038, 996, 1012).
- Set up new documentation at https://huggingface.co/docs/bitsandbytes/main with extensive new sections and content to help users better understand and utilize the library. Especially notable are the new API docs (big thanks to stevhliu and mishig25 from HuggingFace; 1012). The API docs were further addressed in 1075.
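As a hedged sketch of the QLoRA half of the FSDP+QLoRA feature (the model id and LoRA hyperparameters are placeholders, and the FSDP wrapping itself, handled by `accelerate` or the training framework, is omitted):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; any causal LM works

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # uniform dtype helps FSDP shard flat
)

model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, torch_dtype=torch.bfloat16
)

# Attach trainable low-rank adapters; the 4-bit base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```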

Bug Fixes:

- Addressed a race condition in kEstimateQuantiles, enhancing the reliability of quantile estimation in concurrent environments (pnunna93, 1061).
- Fixed various minor issues, including typos in code comments and documentation, to improve code clarity and prevent potential confusion (Brian Vaughan, 1063).

Backwards Compatibility

- After upgrading from `v0.42` to `v0.43`, when using 4bit quantization, models may generate slightly different outputs (approximately up to the 2nd decimal place) due to a fix in the code. For anyone interested in the details, [see this comment](https://github.com/TimDettmers/bitsandbytes/discussions/1094#discussioncomment-8984069).
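For anyone verifying a model across the upgrade, a tolerance-based comparison is the appropriate check; a sketch with placeholder values standing in for logits captured under each version:

```python
import torch

# Placeholder tensors standing in for same-prompt logits under v0.42 and v0.43.
logits_old = torch.tensor([0.1234, -2.5678])
logits_new = torch.tensor([0.1251, -2.5640])

# Differences up to roughly the 2nd decimal place are expected after the fix.
assert torch.allclose(logits_old, logits_new, atol=1e-2)
```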

Internal and Build System Enhancements:

- Implemented several enhancements to the internal and build systems, including adjustments to the CI workflows, portability improvements, and build artifact management. These changes contribute to a more robust and flexible development process, ensuring the library's ongoing quality and maintainability (rickardp, akx, wkpark, matthewdouglas; 949, 1053, 1045, 1037).

Contributors:

This release is made possible thanks to the many active contributors that submitted PRs and many others who contributed to discussions, reviews, and testing. Your efforts greatly enhance the library's quality and user experience. It's truly inspiring to work with such a dedicated and competent group of volunteers and professionals!

We give a special thanks to TimDettmers for managing to find a little bit of time for valuable consultations on critical topics, despite preparing for and touring the states applying for professor positions. We wish him the utmost success!

We also extend our gratitude to the broader community for your continued support, feedback, and engagement, which play a crucial role in driving the library's development forward.

0.42.0

Features:

- 4-bit serialization now supported. This enables 4-bit load/store (see the sketch after this list). Thank you poedator 753
- The bitsandbytes library now has a version attribute: `bitsandbytes.__version__`. Thank you rasbt 710
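A hedged sketch exercising both features, the version attribute and 4-bit serialization (a CUDA GPU is assumed; the layer size is arbitrary):

```python
import torch
import bitsandbytes as bnb

print(bnb.__version__)  # version attribute introduced in this release

# Linear4bit quantizes its weights when moved to the GPU.
layer = bnb.nn.Linear4bit(128, 128, quant_type="nf4").cuda()

# 4-bit serialization: the state dict now round-trips through torch.save.
torch.save(layer.state_dict(), "linear4bit.pt")
state = torch.load("linear4bit.pt")
print(sorted(state.keys()))  # packed 4-bit weight plus quant-state entries
```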

Bug fixes:

- Fixed bugs in dynamic exponent data type creation. Thank you RossM, KohakuBlueleaf, ArrowM 659 227 262 152
- Fixed an issue where 4-bit serialization would fail for layers without double quantization 868. Thank you, poedator
- Fixed an issue where calling .to() or .cuda() on a 4-bit layer twice would result in an error 867. Thank you, jph00
- Fixed a bug where a missing access permission in a path searched for CUDA would lead to an error osma 677
- Fixed a bug where the GOOGLE_VM_CONFIG_LOCK_FILE variable could cause errors in Colab environments akrentsel xaptronic 715 883 622
- Fixed a bug where kgetColRowStats (LLM.int8()) would fail for certain dimensions LucQueen 905
- Fixed a bug where the adjusted regular Embedding layer was not available via bnb.nn.Embedding neel04 563
- Added the missing scipy requirement dulalbert 525

0.41.3

Bug fixes:

- Fixed an issue where 4-bit serialization would fail for layers without double quantization 868. Thank you, poedator
- Fixed an issue where calling .to() or .cuda() on a 4-bit layer twice would result in an error 867. Thank you, jph00
