Torchao

0.2.0

What's Changed

Highlights

Custom CPU/CUDA extensions to ship CPU/CUDA binaries

PyTorch core recently shipped a new custom op registration mechanism, [torch.library](https://pytorch.org/docs/stable/library.html). The benefit is that custom ops compose with as many PyTorch subsystems as possible; most notably, they do NOT graph break with `torch.compile()`.

We've added documentation on how to register your own custom ops at https://github.com/pytorch/ao/tree/main/torchao/csrc, and if you learn better by example you can follow https://github.com/pytorch/ao/pull/135 to add your own custom ops to `torchao`.
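
For a flavor of the flow, here is a minimal Python-only sketch of `torch.library` registration (the `mylib` namespace and `scale_add` op are hypothetical; torchao's real ops are written in C++/CUDA and registered under the `torchao` namespace):

```python
import torch

# Hypothetical namespace and op, purely for illustration.
lib = torch.library.Library("mylib", "DEF")
lib.define("scale_add(Tensor x, float alpha) -> Tensor")

# Concrete CPU implementation.
def scale_add_cpu(x, alpha):
    return x * alpha + 1.0

lib.impl("scale_add", scale_add_cpu, "CPU")

# An abstract (meta) implementation tells the compiler the output
# shape/dtype, which is what lets the op trace through torch.compile
# without graph breaks.
@torch.library.impl_abstract("mylib::scale_add")
def scale_add_abstract(x, alpha):
    return torch.empty_like(x)

@torch.compile(fullgraph=True)  # fullgraph=True asserts no graph breaks
def f(x):
    return torch.ops.mylib.scale_add(x, 2.0)

print(f(torch.ones(4)))
```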

Most notably, these instructions were leveraged by gau-nernst to integrate new custom ops for `fp6` support: https://github.com/pytorch/ao/pull/223

One key benefit of integrating your kernels directly into `torchao` is that, thanks to our `manylinux` GPU support, we can ensure the CPU/CUDA kernels you add work on as many devices and CUDA versions as possible: https://github.com/pytorch/ao/pull/176

A lot of prototype and community contributions

jeromeku was our community champion, merging support for:
1. [GaLore](https://arxiv.org/abs/2403.03507), our first pretraining kernel, which allows you to finetune llama 7b on a single 4090 card with up to 70% speedups relative to eager PyTorch
2. [DoRA](https://arxiv.org/abs/2402.09353), which has been shown to yield better fine-tuning accuracy than QLoRA. This is an area where the community can help us benchmark more thoroughly: https://github.com/pytorch/ao/tree/main/torchao/prototype/dora
3. Fused int4/fp16 quantized matmul, which is particularly useful for compute-bound workloads, showing 4x speedups over tinygemm for larger batch sizes such as 512: https://github.com/pytorch/ao/tree/main/torchao/prototype/hqq

gau-nernst merged [fp6](https://arxiv.org/abs/2401.14112) support, showing up to 8x speedups over an fp16 baseline for small-batch-size inference: https://github.com/pytorch/ao/pull/223


NF4 support for upcoming [FSDP2](https://github.com/pytorch/pytorch/issues/114299)

weifengpy merged support for composing FSDP2 with NF4, which makes it easy to implement algorithms like QLoRA + FSDP without writing any CUDA or C++ code. This work (https://github.com/pytorch/ao/pull/150) also provides a blueprint for how to compose smaller dtypes with FSDP, most notably by implementing `torch.chunk()`. We hope the broader community uses this work to experiment more heavily at the intersection of distributed and quantization research, and that it inspires many more studies such as the ones done by Answer.ai: https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html
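
As a small illustration of the mechanism, the sketch below quantizes a weight to NF4 and chunks it the way FSDP2 does when sharding a parameter across ranks (the import path and default block sizes are assumptions based on this release; treat it as illustrative, not canonical):

```python
import torch
from torchao.dtypes.nf4tensor import to_nf4  # path assumed for this release

# Quantize a bf16 weight to NF4 (block sizes here are torchao's defaults).
weight = torch.randn(512, 512, dtype=torch.bfloat16)
nf4_weight = to_nf4(weight, block_size=64, scaler_block_size=256)

# FSDP2 shards a parameter by chunking it across ranks. Because NF4Tensor
# implements torch.chunk() (https://github.com/pytorch/ao/pull/150), the
# quantized weight can be sharded directly, without dequantizing first.
shards = torch.chunk(nf4_weight, 2)
```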

BC breaking

Deprecations

New Features
* Match autoquant API with torch.compile (https://github.com/pytorch/ao/pull/109, https://github.com/pytorch/ao/pull/162, https://github.com/pytorch/ao/pull/175); see the usage sketch after this list
* [Prototype] 8da4w QAT (https://github.com/pytorch/ao/pull/138, https://github.com/pytorch/ao/pull/199, https://github.com/pytorch/ao/pull/198, https://github.com/pytorch/ao/pull/211, https://github.com/pytorch/ao/pull/154, https://github.com/pytorch/ao/pull/157, https://github.com/pytorch/ao/pull/229)
* [Prototype] GaLore (https://github.com/pytorch/ao/pull/95)
* [Prototype] DoRA (https://github.com/pytorch/ao/pull/216)
* [Prototype] HQQ (https://github.com/pytorch/ao/pull/153, https://github.com/pytorch/ao/pull/185)
* [Prototype] 2:4 sparse + int8 sparse subclass (https://github.com/pytorch/ao/pull/36)
* [Prototype] Unified quantization primitives (https://github.com/pytorch/ao/pull/159, https://github.com/pytorch/ao/pull/201, https://github.com/pytorch/ao/pull/193, https://github.com/pytorch/ao/pull/220, https://github.com/pytorch/ao/pull/227, https://github.com/pytorch/ao/pull/173, https://github.com/pytorch/ao/pull/210)
* [Prototype] Pruning primitives (https://github.com/pytorch/ao/pull/148, https://github.com/pytorch/ao/pull/194)
* [Prototype] AffineQuantizedTensor subclass (https://github.com/pytorch/ao/pull/214, https://github.com/pytorch/ao/pull/230, https://github.com/pytorch/ao/pull/243, https://github.com/pytorch/ao/pull/247, https://github.com/pytorch/ao/pull/251)
* [Prototype] Add `Int4WeightOnlyQuantizer` (https://github.com/pytorch/ao/pull/119)
* Custom CUDA extensions (https://github.com/pytorch/ao/pull/135, https://github.com/pytorch/ao/pull/186, https://github.com/pytorch/ao/pull/232)
* [Prototype] Add FP6 Linear (https://github.com/pytorch/ao/pull/223)
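
As referenced above, here is a usage sketch for the autoquant API (the toy model and shapes are placeholders; this assumes a CUDA device and the top-level `torchao.autoquant` entry point):

```python
import torch
import torchao

# Toy model; any nn.Module containing Linear layers works.
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).cuda().to(torch.bfloat16)

# One-line wrapping that mirrors torch.compile's API; autoquant chooses a
# quantization strategy per Linear layer based on observed shapes.
model = torchao.autoquant(torch.compile(model, mode="max-autotune"))

# The first call triggers benchmarking/quantization for the seen shapes.
x = torch.randn(16, 64, device="cuda", dtype=torch.bfloat16)
model(x)
```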


Improvements
* [FSDP2](https://github.com/pytorch/pytorch/issues/114299) support for NF4Tensor (https://github.com/pytorch/ao/pull/118, https://github.com/pytorch/ao/pull/150, https://github.com/pytorch/ao/pull/207)
* Add save/load of int8 weight only quantized model (https://github.com/pytorch/ao/pull/122)
* Add int_scaled_mm on CPU (https://github.com/pytorch/ao/pull/121)
* Add CPU and GPU support in the int4wo and int4wo-gptq quantizers (https://github.com/pytorch/ao/pull/131)
* Add torch.export support to int8_dq, int8_wo, int4_wo subclasses (https://github.com/pytorch/ao/pull/146, https://github.com/pytorch/ao/pull/226, https://github.com/pytorch/ao/pull/213)
* Remove `is_gpt_fast` specialization from GPTQ (https://github.com/pytorch/ao/pull/172)
* Common benchmark and profile utils (https://github.com/pytorch/ao/pull/238)

Bug fixes
* Fix padding in GPTQ (https://github.com/pytorch/ao/pull/119, https://github.com/pytorch/ao/pull/120)
* Fix `Int8DynActInt4WeightLinear` module swap (https://github.com/pytorch/ao/pull/151)
* Fix `NF4Tensor.to` to use device kwarg (https://github.com/pytorch/ao/pull/158)
* Fix `quantize_activation_per_token_absmax` perf regression (https://github.com/pytorch/ao/pull/253)


Performance
* Chunk NF4Tensor construction to reduce memory spike (https://github.com/pytorch/ao/pull/196)
* Fix intmm benchmark script (https://github.com/pytorch/ao/pull/141)

Docs
* Update READMEs (https://github.com/pytorch/ao/pull/140, https://github.com/pytorch/ao/pull/142, https://github.com/pytorch/ao/pull/169, https://github.com/pytorch/ao/pull/155, https://github.com/pytorch/ao/pull/179, https://github.com/pytorch/ao/pull/187, https://github.com/pytorch/ao/pull/188, https://github.com/pytorch/ao/pull/200, https://github.com/pytorch/ao/pull/217, https://github.com/pytorch/ao/pull/245)
* Add https://pytorch.org/ao (https://github.com/pytorch/ao/pull/136, https://github.com/pytorch/ao/pull/145, https://github.com/pytorch/ao/pull/163, https://github.com/pytorch/ao/pull/164, https://github.com/pytorch/ao/pull/165, https://github.com/pytorch/ao/pull/168, https://github.com/pytorch/ao/pull/177, https://github.com/pytorch/ao/pull/195, https://github.com/pytorch/ao/pull/224)

CI
* Add A10G support in CI (https://github.com/pytorch/ao/pull/176)
* General CI improvements (https://github.com/pytorch/ao/pull/161, https://github.com/pytorch/ao/pull/171, https://github.com/pytorch/ao/pull/178, https://github.com/pytorch/ao/pull/180, https://github.com/pytorch/ao/pull/183, https://github.com/pytorch/ao/pull/107, https://github.com/pytorch/ao/pull/215, https://github.com/pytorch/ao/pull/244, https://github.com/pytorch/ao/pull/257, https://github.com/pytorch/ao/pull/235, https://github.com/pytorch/ao/pull/242)
* Add expecttest to requirements.txt (https://github.com/pytorch/ao/pull/225)
* Push button binary support (https://github.com/pytorch/ao/pull/241, https://github.com/pytorch/ao/pull/240, https://github.com/pytorch/ao/pull/250)

Not user facing

Security

Uncategorized
* Version bumps (https://github.com/pytorch/ao/pull/125, https://github.com/pytorch/ao/pull/234)
* Don't import _C in fbcode (https://github.com/pytorch/ao/pull/218)

New Contributors
* Xia-Weiwen made their first contribution in https://github.com/pytorch/ao/pull/121
* jeromeku made their first contribution in https://github.com/pytorch/ao/pull/95
* weifengpy made their first contribution in https://github.com/pytorch/ao/pull/118
* aakashapoorv made their first contribution in https://github.com/pytorch/ao/pull/179
* UsingtcNower made their first contribution in https://github.com/pytorch/ao/pull/194
* Jokeren made their first contribution in https://github.com/pytorch/ao/pull/217
* gau-nernst made their first contribution in https://github.com/pytorch/ao/pull/223
* janeyx99 made their first contribution in https://github.com/pytorch/ao/pull/245
* huydhn made their first contribution in https://github.com/pytorch/ao/pull/250
* lancerts made their first contribution in https://github.com/pytorch/ao/pull/238

**Full Changelog**: https://github.com/pytorch/ao/compare/v0.2.0...v0.2.1

We were able to close about [half of the tasks for 0.2.0](https://github.com/pytorch/ao/issues/132); the rest will spill over into upcoming releases. We will post a list for 0.3.0 next, which we aim to release at the end of May 2024. We plan to follow a monthly release cadence until further notice.

0.1

Highlights
We’re excited to announce the release of TorchAO v0.1.0! TorchAO is a repository for architecture optimization techniques, such as quantization and sparsity, and for performance kernels on different backends such as CUDA and CPU. In this release we added support for quantization techniques like int4 weight-only GPTQ quantization, `nf4` dtype support for QLoRA, and sparsity features like `WandaSparsifier`; we also added an autotuner that can tune Triton integer matrix multiplication kernels on CUDA.

Note: TorchAO is currently in a pre-release state and under extensive development, so the public APIs should not be considered stable. But we welcome you to try out our APIs and offerings and provide feedback on your experience.

torchao 0.1.0 is compatible with PyTorch 2.2.2 and 2.3.0, ExecuTorch 0.2.0, and TorchTune 0.1.0.

New Features
Quantization
* Added tensor subclass based quantization APIs: `change_linear_weights_to_int8_dqtensors`, `change_linear_weights_to_int8_woqtensors`, and `change_linear_weights_to_int4_woqtensors` (1); see the usage sketch after this list
* Added module based quantization APIs for int8 dynamic and weight-only quantization: `apply_weight_only_int8_quant` and `apply_dynamic_quant` (1)
* Added module swap versions of int4 weight-only quantization, `Int4WeightOnlyQuantizer` and `Int4WeightOnlyGPTQQuantizer`, used in TorchTune (119, 116)
* Added int8 dynamic activation and int4 weight quantization, `Int8DynActInt4WeightQuantizer` and `Int8DynActInt4WeightGPTQQuantizer`, used in ExecuTorch (74) (available with torch 2.3.0 and later)
Sparsity
* Added `WandaSparsifier` that prunes both weights and activations (22)
Kernels
* Added `autotuner` for int mm Triton kernels (41)
dtypes
* Added `nf4` tensor subclass and `nf4` linear (37, 40, 62)
* Added `uint4` dtype tensor subclass (13)
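
As referenced above, here is a sketch of the tensor subclass based quantization API (the import path from `torchao.quantization.quant_api` and the toy model are assumptions for illustration; a CUDA device is assumed for meaningful speedups):

```python
import torch
from torchao.quantization.quant_api import change_linear_weights_to_int8_woqtensors

# Toy model; the API mutates it in place.
model = torch.nn.Sequential(torch.nn.Linear(128, 128)).cuda().eval()

# Swap every nn.Linear weight for an int8 weight-only quantized tensor
# subclass; the module structure is unchanged, so torch.compile still applies.
change_linear_weights_to_int8_woqtensors(model)

out = model(torch.randn(4, 128, device="cuda"))
```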

Improvements
* Set up GitHub workflow for regression testing (50)
* Set up GitHub workflow for `torchao-nightly` release (54)

Documentation
* Added a tutorial for quantizing a vision transformer model (60)
* Added a tutorial on how to add an op for the `nf4` tensor (54)

Notes
* We are still debugging the accuracy problem with `Int8DynActInt4WeightGPTQQuantizer`
* Save and load does not yet work well for tensor subclass based APIs
* We will consolidate the tensor subclass and module swap based quantization APIs later
* The `uint4` tensor subclass will be merged into PyTorch core in the future
* Quantization ops in `quant_primitives.py` will later be deduplicated with similar quantize/dequantize ops in PyTorch
