Neural Compressor

Latest version: v3.1.1

3.1

- Highlights
- Features
- Improvements
- Validated Hardware
- Validated Configurations

**Highlights**
- Aligned with the Habana 1.18 release, bringing improvements to FP8 and INT4 quantization for the [Intel® Gaudi® AI accelerator](https://habana.ai/products/gaudi2/)
- Provided a Transformers-like quantization API for weight-only quantization of LLMs, offering Transformers users a one-stop experience for quantization and inference with IPEX on Intel GPUs and CPUs

**Features**
- Add a Transformers-like quantization API for weight-only quantization of LLMs (see the sketch after this list)
- Support fast quantization with a lightweight recipe and a layer-wise approach on Intel AI PCs
- Support INT4 quantization of vision-language models (VLMs) such as LLaVA, Phi-3-vision, and Qwen-VL with the AutoRound algorithm
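
The new Transformers-like API is the headline of this release. Below is a minimal sketch of the intended flow, assuming a `neural_compressor.transformers` entry point and an `RtnConfig` weight-only config that mirror the Hugging Face `from_pretrained` convention; treat all names and defaults as assumptions rather than the confirmed API.

```python
# Hedged sketch of the Transformers-like weight-only quantization flow.
# Import path, config class, and parameters are assumptions based on
# these release notes; consult the project docs for the exact API.
from neural_compressor.transformers import AutoModelForCausalLM, RtnConfig

# Hypothetical 4-bit RTN weight-only configuration.
woq_config = RtnConfig(bits=4, group_size=128)

# Quantize while loading, Hugging Face style; the resulting model is
# intended for IPEX inference on Intel CPUs and GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=woq_config,
)
model.save_pretrained("opt-125m-int4")
```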

**Improvements**
- Support loading and converting AWQ-format INT4 models for IPEX inference in the Transformers-like API
- Enable AutoRound-format export for INT4 models
- Support per-channel INT8 post-training quantization in the PT2E path (see the sketch after this list)
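
For the PT2E item above, here is a rough sketch of INT8 post-training quantization through the PT2E path, assuming the `export`/`prepare`/`convert` helpers and `StaticQuantConfig` of the framework extension API; the names are drawn from these notes and should be verified against the documentation.

```python
# Hedged sketch of INT8 post-training quantization in the PT2E path.
import torch
from neural_compressor.torch.export import export
from neural_compressor.torch.quantization import StaticQuantConfig, prepare, convert

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(8, 64),)

exported = export(model, example_inputs=example_inputs)  # torch.export capture
prepared = prepare(exported, StaticQuantConfig())        # insert observers
prepared(*example_inputs)                                # one calibration pass
q_model = convert(prepared)                              # lower to INT8 ops
```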

**Validated Hardware** 
- Intel Gaudi AI Accelerators (Gaudi 2 and 3)
- Intel Xeon Scalable Processors (4th, 5th, and 6th Gen)
- Intel Core Ultra Processors (Series 1 and 2)
- Intel Data Center GPU Max Series (1100)

**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04 & Win 11
- Python 3.9, 3.10, 3.11, 3.12
- PyTorch/IPEX 2.2, 2.3, 2.4

3.0

- Highlights
- Features
- Improvements
- Examples
- Bug Fixes
- Documentations
- Validated Configurations

**Highlights**
- FP8 quantization and INT4 model loading support on the [Intel® Gaudi® AI accelerator](https://habana.ai/products/gaudi2/)
- Framework extension API for quantization, mixed precision, and benchmarking
- Accuracy-aware FP16 mixed-precision support on [Intel® Xeon® 6 Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html) (an accuracy-aware tuning sketch follows this list)
- Performance optimizations and usability improvements for client-side quantization
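
Several of these highlights converge on the framework extension `autotune` API. A hedged sketch of the accuracy-aware flow follows, with signatures assumed from these notes; the constant-returning `eval_fn` stands in for a real accuracy measurement.

```python
# Hedged sketch of accuracy-aware tuning with the framework extension API.
import torch
from neural_compressor.torch.quantization import RTNConfig, TuningConfig, autotune

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

def eval_fn(q_model) -> float:
    # Stand-in for a real accuracy measurement on a held-out set.
    return 1.0

tune_config = TuningConfig(
    config_set=[RTNConfig(bits=[4, 8])],  # candidate configs, tried in order
    tolerable_loss=0.01,                  # accept the first config within 1% of baseline
)
best_model = autotune(model, tune_config=tune_config, eval_fn=eval_fn)
```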

**Features**
- [Quantization] Support FP8 quantization on Gaudi (see the sketch after this list) ([95197d](https://github.com/intel/neural-compressor/commit/95197d1697e19323b124c2a32bdef7425d4d1c3e))
- [Quantization] Support INC and Hugging Face model loading on framework extension API for PyTorch ([0eced1](https://github.com/intel/neural-compressor/commit/0eced1478c6796a5e2dcb254a65bbc96af4d1b8b), [bacc16](https://github.com/intel/neural-compressor/commit/bacc164df2c2080cb6b1a6250f745824bbca5a7b))
- [Quantization] Support Weight-only Quantization on framework extension API for PyTorch ([34f0a9](https://github.com/intel/neural-compressor/commit/34f0a9f450b385aa3227f7f34e8d0f16460080a9), [de43d8](https://github.com/intel/neural-compressor/commit/de43d851a24a5f4290fe148f7d3607cad6d8433f), [4a4509](https://github.com/intel/neural-compressor/commit/4a45093c1418f34da2660a54052a2ff5c2b4edff), [1a4509](https://github.com/intel/neural-compressor/commit/1a4509060714559bdbc60524012997900c464d02), [a3a065](https://github.com/intel/neural-compressor/commit/a3a06508fa951f9b9dcd3786214f546c796c32e7), [1386ac](https://github.com/intel/neural-compressor/commit/1386ac5ec7be40608dfac082d2275307b8e4d14e), [a0dee9](https://github.com/intel/neural-compressor/commit/a0dee94dab0920ba30de049e871b19a72ddb8996), [503d9e](https://github.com/intel/neural-compressor/commit/503d9ef4136023f1952e397a2ab0f7f476040901), [84d705](https://github.com/intel/neural-compressor/commit/84d7055b3998724aecd7ca7e43ea653d0d0f4612), [099b7a](https://github.com/intel/neural-compressor/commit/099b7a4446d9c21af2066518ccc87ecaa717e08e), [e3c736](https://github.com/intel/neural-compressor/commit/e3c736fd910690faf08bf4609cc3b65529d79252), [e87c95](https://github.com/intel/neural-compressor/commit/e87c95f25d3fe0e286e832857974ce36d43b2f96), [2694bb](https://github.com/intel/neural-compressor/commit/2694bbf81622a936f5ef3c271901dea097af2474), [ec49a2](https://github.com/intel/neural-compressor/commit/ec49a29cafa92593d82635562ec200741fd4083c), [e7b4b6](https://github.com/intel/neural-compressor/commit/e7b4b648665df4d016d170cdf2f3f69e6f9c185f), [a9bf79](https://github.com/intel/neural-compressor/commit/a9bf79c63fbcd970cccc00d1db85e424fe286b27), [ac717b](https://github.com/intel/neural-compressor/commit/ac717bc4b6a1a1e82db218d7648121f157814fad), [915018](https://github.com/intel/neural-compressor/commit/9150181bb2ab71201fbdb052fbcaa2aba18a090a), [8447d7](https://github.com/intel/neural-compressor/commit/8447d7097fa33231b8a6e4a9e26e526d191787de), [dc9328](https://github.com/intel/neural-compressor/commit/dc9328c09b243d7df3bccc0a35a8a12feaabb40a))
- [Quantization] Support static and dynamic quantization in PT2E path ([7a4715](https://github.com/intel/neural-compressor/commit/7a4715c1d488441e383b7c999fd1b574a3f6ceda), [43c358](https://github.com/intel/neural-compressor/commit/43c3580bdb1c6765bb4902fe721da629518acc74), [30b36b](https://github.com/intel/neural-compressor/commit/30b36b83a195c6ea350692c7ac0bfec1b52ee419), [1f58f0](https://github.com/intel/neural-compressor/commit/1f58f024d812b6c1f7f3430b62e61051599cd1b2), [02958d](https://github.com/intel/neural-compressor/commit/02958dd4a81251be26980a712cbb258d55edba67))
- [Quantization] Support SmoothQuant and static quantization in IPEX path with framework extension API ([53e6ee](https://github.com/intel/neural-compressor/commit/53e6ee6b75d476bae0382c7d6fb9aa1348c2ab5e), [72fbce](https://github.com/intel/neural-compressor/commit/72fbce4b34f29c2b6fe0d41a76c4d65edb08719a), [eaa3a5](https://github.com/intel/neural-compressor/commit/eaa3a580c8a9f27268d3c27e551054dd5053f01c), [95e67e](https://github.com/intel/neural-compressor/commit/95e67eac624285d304487b654330d660b169cfb1), [855c10](https://github.com/intel/neural-compressor/commit/855c10ca37d01bd371a4b9dcd953ce735f9bdea6), [9c6102](https://github.com/intel/neural-compressor/commit/9c6102b351c45394357e0163470e0e997cb99d0e), [5dafe5](https://github.com/intel/neural-compressor/commit/5dafe5fd6584ca695f05b61c3dd84c2923c83cbd), [a5e5f5](https://github.com/intel/neural-compressor/commit/a5e5f5f64855b85e2a374c8b808b317448318113), [191383](https://github.com/intel/neural-compressor/commit/191383ebd95c1fbb77e626887ca6d808a454543c), [776645](https://github.com/intel/neural-compressor/commit/7766454d9a984257016ddad5d3a61de648f0bd35))
- [Quantization] Support Layer-wise Quantization for RTN/GPTQ on framework extension API for PyTorch ([649e6b](https://github.com/intel/neural-compressor/commit/649e6b148755bda737009bc323b735b92231c579))
- [Quantization] Support Post Training Quantization on framework extension API for Tensorflow ([6c27c1](https://github.com/intel/neural-compressor/commit/6c27c19c3ec7a318455bd12d6e66ad9bb757ab93), [e22c61](https://github.com/intel/neural-compressor/commit/e22c61ede2942f7f1ba1cf9e480491371184bb32), [f21afb](https://github.com/intel/neural-compressor/commit/f21afbbdd18cd61627fc02e5b22ca242402bcfbf), [3882e9](https://github.com/intel/neural-compressor/commit/3882e9cc4b356a081843455f3244d7f0e013f888), [2627d3](https://github.com/intel/neural-compressor/commit/2627d33b9ff900697184972575969ecc55da8923))
- [Quantization] Support Post Training Quantization on Keras3 ([f67e86](https://github.com/intel/neural-compressor/commit/f67e8613c409563f016c77e05a1acb969790cfc6), [047560](https://github.com/intel/neural-compressor/commit/047560fcf6a2e5812d33e579e047a3c8767e4a9a))
- [Quantization] Support Weight-only Quantization on Gaudi2 ([4b9b44](https://github.com/intel/neural-compressor/commit/4b9b447aa0872a8edc26fd59a349c195cf208a97), [14868c](https://github.com/intel/neural-compressor/commit/14868c0900a1f91fe39f138c67156ad66c16b20f), [0a3d4b](https://github.com/intel/neural-compressor/commit/0a3d4bd43f69c29e2f8a3b07ac13036e41c6579c))
- [Quantization] Improve the performance and usability of the quantization procedure on the client side ([16a7b1](https://github.com/intel/neural-compressor/commit/16a7b11508c008d4d4180a0fe0e31c75b8e5d662))
- [Quantization] Support auto-device detection on framework extension API for PyTorch ([368ba5](https://github.com/intel/neural-compressor/commit/368ba5293ab2936c685d67db6f8423a27a62f7e1), [4b9b44](https://github.com/intel/neural-compressor/commit/4b9b447aa0872a8edc26fd59a349c195cf208a97), [e81a2d](https://github.com/intel/neural-compressor/commit/e81a2dd901dd1b93291555722c6d96901940be06), [0a3d4b](https://github.com/intel/neural-compressor/commit/0a3d4bd43f69c29e2f8a3b07ac13036e41c6579c), [534300](https://github.com/intel/neural-compressor/commit/53430092e7f9b46ed78afccb4b9610c9032bf57f), [2a86ae](https://github.com/intel/neural-compressor/commit/2a86aeafc754ca3b7495138381efb8faa9397fdf))
- [Quantization] Support Microscaling (MX) Quant for PyTorch ([4a24a6](https://github.com/intel/neural-compressor/commit/4a24a6a39218a3d186900a72a7e2e96ad539f4f4), [455f1e](https://github.com/intel/neural-compressor/commit/455f1e1f0f0284e87b46d257b6d126ca76fe1748))
- [Quantization] Enable cross-device Half-Quadratic Quantization (HQQ) support for LLMs ([db6164](https://github.com/intel/neural-compressor/commit/db6164a25da5bf8ef8a7ba082a25d7bb4565b656), [07f940](https://github.com/intel/neural-compressor/commit/07f940c7f00ab0a5f6b3d7d9cb6b934e69e44a98))
- [Quantization] Support FP8 cast Weight-only Quantization ([57ed61](https://github.com/intel/neural-compressor/commit/57ed6138453246141a2128b600588df0b4d5d440))
- [Mixed-Precision] Support FP16 mixed-precision on framework extension autotune API for PyTorch ([2e1cdc](https://github.com/intel/neural-compressor/commit/2e1cdc5be61458be186d0e6f2035b4287b223cf3))
- [Mixed-Precision] Support mixed `INT8` with `FP16` in PT2E path ([fa961e](https://github.com/intel/neural-compressor/commit/fa961e1d0bbe371182d6da6d210d0b6a7693cce2))
- [AutoTune] Support accuracy-aware tuning on framework extension API ([e97659](https://github.com/intel/neural-compressor/commit/e9765955f991e1270e3b65635285f6b6cb8fc38c), [7b8aec](https://github.com/intel/neural-compressor/commit/7b8aec00d0c09bd499076457b68903229e09b803), [5a0374](https://github.com/intel/neural-compressor/commit/5a0374e7db23cac209af78f1ace9b38d23bebbb0), [a4675c](https://github.com/intel/neural-compressor/commit/a4675c7490f66ab2c75912dd69f1d79368f69858), [3a254e](https://github.com/intel/neural-compressor/commit/3a254e99c0a361c0179b4176a256c69e46681352), [ac47d9](https://github.com/intel/neural-compressor/commit/ac47d9b97b597f809ab56f9f6cb1a86951e2e334), [b8d98e](https://github.com/intel/neural-compressor/commit/b8d98ebaddcf1c7ece1def04ba4d55b7e92593ee), [fb6142](https://github.com/intel/neural-compressor/commit/fb61428228bcdf9a18b02e5963c4df7a60c9a54b), [fa8e66](https://github.com/intel/neural-compressor/commit/fa8e66a1d95b52c8ebdea21f2dc60db0fdfedd6a), [d22df5](https://github.com/intel/neural-compressor/commit/d22df5364ba6d7c98fea8545a9a9e49e2ce5ebb0), [09eb5d](https://github.com/intel/neural-compressor/commit/09eb5ddd3c0eb2dae198837cbae76ca5bb4e90c8), [c6a8fa](https://github.com/intel/neural-compressor/commit/c6a8fa1606a4aea34e62af0f106ab05cdccacab6))
- [Benchmarking] Implement the `incbench` command for ease-of-use benchmarking ([2fc725](https://github.com/intel/neural-compressor/commit/2fc72555c987dc7bce8476b389720e1a29159a43))
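
For the FP8-on-Gaudi feature flagged above, here is a minimal sketch of the prepare/convert flow, assuming an `FP8Config` with an E4M3 format knob as these notes suggest; it requires a Gaudi device with the Habana software stack, and the field names are assumptions.

```python
# Hedged sketch of FP8 quantization on Gaudi.
import torch
from neural_compressor.torch.quantization import FP8Config, prepare, convert

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

config = FP8Config(fp8_config="E4M3")  # assumed FP8 format selector
model = prepare(model, config)         # insert measurement hooks
model(torch.randn(8, 64))              # calibration batches, run on the Gaudi device
model = convert(model)                 # swap modules for FP8 kernels
```

The `incbench` command introduced alongside these features is a CLI wrapper for multi-instance benchmarking (invoked roughly as `incbench your_script.py`; see the linked commit for the exact flags).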

**Improvements**
- [Quantization] Integrate AutoRound v0.3 ([bfa27e](https://github.com/intel/neural-compressor/commit/bfa27e422dc4760f6a9b1783eee7dae10fe5324f), [fd9685](https://github.com/intel/neural-compressor/commit/fd96851f7f8339ec8bfabd602cf494ac6c31d17b))
- [Quantization] Support `auto_host2device` on RTN and GPTQ ([f75ff4](https://github.com/intel/neural-compressor/commit/f75ff4082bc7a22d9367d3e91a3ea2c7aaec2bd2))
- [Quantization] Support `transformers.Conv1D` WOQ quantization ([b6237c](https://github.com/intel/neural-compressor/commit/b6237cf4d4c8e86fe373cf48ffe5a6588ef537ca))
- [Quantization] Support the `quant_lm_head` argument in all WOQ configs (see the sketch after this list) ([4ae2e8](https://github.com/intel/neural-compressor/commit/4ae2e87d2f98eb34c2e523a76ffa6ff77bf767e1))
- [Quantization] Update the `fp4_e2m1` mapping list to fit `neural_speed` and qbits inference ([5fde50](https://github.com/intel/neural-compressor/commit/5fde50f2c0476dbc08d59481b742515f5a210de1))
- [Quantization] Enhance the `load_empty_model` import ([29471d](https://github.com/intel/neural-compressor/commit/29471df05a9e2c36c4ad8083c0b0b285011748d8))
- [Common] Add common logger to the quantization process ([1cb844](https://github.com/intel/neural-compressor/commit/1cb844b3c0b581f670fef16aa87fef2a85e6122b), [482f87](https://github.com/intel/neural-compressor/commit/482f87c6161581f9f8ff09804b6c430553cf59a9), [83bc77](https://github.com/intel/neural-compressor/commit/83bc779a4e97d8886383025d324d8379f70cc8b7), [f50baf](https://github.com/intel/neural-compressor/commit/f50baf2e9107e29d96e267fe115dc488f96db6f0))
- [Common] Enhance the `set_local` for operator type ([a58638](https://github.com/intel/neural-compressor/commit/a58638c1298fdff808742d1625196153d24f5c9c))
- [Common] Port more helper classes from 2.x ([3b150d](https://github.com/intel/neural-compressor/commit/3b150d61313ca6ca19bc38ec9f608900b8355519))
- [Common] Refine base config for 3.x API ([be42d0](https://github.com/intel/neural-compressor/commit/be42d033b25c6dd3bcac0ead964699f25f939014), [efea08](https://github.com/intel/neural-compressor/commit/efea089e27613690c32d6f1745731a28ca90bf65))
- [Export] Migrate the export feature from the deprecated code base to the 2.x and 3.x APIs ([794b27](https://github.com/intel/neural-compressor/commit/794b2762c0bb2f076973e1fca5fdecd23efec774))
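
As called out in the `quant_lm_head` item above, every weight-only config now exposes that flag. A short sketch with `RTNConfig` follows; the flag itself comes from these notes, while the remaining parameter names are assumptions.

```python
# Hedged sketch: RTN weight-only quantization that also quantizes the lm-head.
import torch
from neural_compressor.torch.quantization import RTNConfig, prepare, convert

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

config = RTNConfig(bits=4, group_size=32, quant_lm_head=True)
model = prepare(model, config)  # RTN is data-free: no calibration loop needed
q_model = convert(model)
```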

**Examples**
- Add save/load for PT2E example ([0e724a](https://github.com/intel/neural-compressor/commit/0e724a4d96ca0d6a170281688ca644b37fa340e0))
- Add IPEX XPU example for framework extension API ([6e1b1d](https://github.com/intel/neural-compressor/commit/6e1b1da712d20d9291e5932974bc3167b00dd214))
- Enable TensorFlow yolov5 example for framework extension API ([19024b](https://github.com/intel/neural-compressor/commit/19024b351372ca76934db33b0d230552c13bff39))
- Update example for framework extension IPEX SmoothQuant ([b35ff8](https://github.com/intel/neural-compressor/commit/b35ff8f0044bdf12da87647d0404b62ae5ff7d3d))
- Add SDXL model example for framework extension API ([000946](https://github.com/intel/neural-compressor/commit/000946fce147a02ad6662538e337570c0a56329d))
- Add PyTorch mixed precision example ([e106de](https://github.com/intel/neural-compressor/commit/e106dea73471ddecdb1cfc702e90fcb1a5d41452), [9077b3](https://github.com/intel/neural-compressor/commit/9077b382259e2e56ff5796084a1f4275e4387537))
- Add CV and LLM examples for PT2E quantization path ([b401b0](https://github.com/intel/neural-compressor/commit/b401b02db2cc7d7f4f8412a815fa435e66e330a0))
- Add Recommendation System examples for IPEX path ([e470f6](https://github.com/intel/neural-compressor/commit/e470f6cdfbbad32fcf17be56903e649a05059780))
- Add TensorFlow examples for framework extension API ([fb8577](https://github.com/intel/neural-compressor/commit/fb8577931c11c3bdc55868e01576b73372d9912b), [922b24](https://github.com/intel/neural-compressor/commit/922b2471e617cc4c56376866e991302d0beb0640))
- Add PyTorch Microscaling (MX) Quant examples ([6733da](https://github.com/intel/neural-compressor/commit/6733dabc4d48a6625e184e4a29a754949f415097))
- Add PyTorch SmoothQuant LLM examples for new framework extension API ([137fa3](https://github.com/intel/neural-compressor/commit/137fa3add2d8a0688dd0e76bd15e347b588d56a8))
- Add PyTorch GPTQ/RTN example for framework extension API ([813d93](https://github.com/intel/neural-compressor/commit/813d93051ab16b6bbac11bdf5986929330876e30))
- Add double quant example ([ccd0c9](https://github.com/intel/neural-compressor/commit/ccd0c9e6c112d84979504177b9390270b3d71b69))

**Bug Fixes**
- Fix ITREX qbits NF4/INT8 training core-dump issue ([190e6b](https://github.com/intel/neural-compressor/commit/190e6b2be6b31158a1101729bcf621bc93e85531))
- Fix unused pkgs import ([437c8e](https://github.com/intel/neural-compressor/commit/437c8e75706cff1767dcde115e428654766b3f18))
- Remove GeLU fusion for the TensorFlow new API ([5592ac](https://github.com/intel/neural-compressor/commit/5592acc60562b7fccb308af0eaaba9cad53004a5))
- Fix GPTQ layer match issue ([90fb43](https://github.com/intel/neural-compressor/commit/90fb43135397a035968b5334eba21931c18a83c0))
- Fix static quant regression issue on IPEX path ([70a1d5](https://github.com/intel/neural-compressor/commit/70a1d501fdfee16a10e34385bca9f15eba4366b4))
- Fix config expansion with empty options ([6b2738](https://github.com/intel/neural-compressor/commit/6b2738390dfdab543de1ccd9242fe541c78b6a2e))
- Fix act_observer for IPEX SmoothQuant and static quantization ([263450](https://github.com/intel/neural-compressor/commit/2634501690f2396865011c2f79c0b8adba36cb07))
- Automatically set `return_dict=False` for GraphTrace ([53e7df](https://github.com/intel/neural-compressor/commit/53e7dfe57ef4ad1754f37343b3ad3850b64ae4f4))
- Fix WOQ Linear pack slow issue ([da1ada](https://github.com/intel/neural-compressor/commit/da1ada236eb867b69c663c58904e0a21ad9bcb88), [daa143](https://github.com/intel/neural-compressor/commit/daa1431b200f92ab9684a2c78e15602cb23d7c07))
- Fix dtype of unpacked tensor ([29fdec](https://github.com/intel/neural-compressor/commit/29fdecbbb44ceb8d19c12809af90dc23063becfc))
- Fix WeightOnlyLinear bits type when dtype="intx" ([19ff13](https://github.com/intel/neural-compressor/commit/19ff13e8a8963744349e46013ef522fcb3e8c3d8))
- Fix several issues for SmoothQuant and static quantization ([7120dd](https://github.com/intel/neural-compressor/commit/7120dd4909599b228692415732688b3d5e77206d))
- Fix IPEX examples failed with evaluate ([e82674](https://github.com/intel/neural-compressor/commit/e82674a75de564a632cea639db25fbe41fec100a))
- Fix HQQ issue for group size of -1 ([8dac9f](https://github.com/intel/neural-compressor/commit/8dac9f2c3d3f8411f27a2e327f3dbbc7c8de0829))
- Fix bug in GPTQ g_idx ([4f893c](https://github.com/intel/neural-compressor/commit/4f893ca9e4c44d12ea028e00a4881b5154ee54a8))
- Fix tune_cfg issue for static quant ([ba1650](https://github.com/intel/neural-compressor/commit/ba165047dbcf4671cf20e9c1d031577dade94348))
- Add non-str `op_name` match workaround for IPEX ([911ccd](https://github.com/intel/neural-compressor/commit/911ccd3a94b124e2287780a2ca219eaa01dc21d9))
- Fix opt GPTQ double quant example config ([62aa85](https://github.com/intel/neural-compressor/commit/62aa85df23ce3f5db353ce9a4bfb8cd88395c376))
- Fix GPTQ accuracy issue in framework extension API example ([c701ea](https://github.com/intel/neural-compressor/commit/c701eaff7d69c46a57172b0547bfe2fc05164a0c))
- Fix bf16 symbolic_trace bug ([3fe2fd](https://github.com/intel/neural-compressor/commit/3fe2fd9aadda4991552d65fef09a75ba5127b5db))
- Fix `opt_125m_woq_gptq_int4_dq_ggml` issue ([b99aba](https://github.com/intel/neural-compressor/commit/b99abae5d937380cf9df80c9050fce18bddfb72d))

**Documentations**
- Update new API installation document ([50eb6f](https://github.com/intel/neural-compressor/commit/50eb6fb6f5924054b38d8ed99e78e0ebdab51f50), [ff3740](https://github.com/intel/neural-compressor/commit/ff3740146a829e845d79266acf233b202843d3fd))
- Add new architecture diagram ([d56075](https://github.com/intel/neural-compressor/commit/d56075c7e9f6e3e85385abbff9f1b0d07d157a04), [2c3556](https://github.com/intel/neural-compressor/commit/2c3556d441de2f0963167db71ecdee7353bd76bb))
- Add new workflow diagram ([96538c](https://github.com/intel/neural-compressor/commit/96538c56fea8a42c3e487b4682c346e4832e3e97))
- Update documentation for framework extension API ([0a5423](https://github.com/intel/neural-compressor/commit/0a542397ac1ea8d6fe2edf04565d3cb673001b2c))
- Add documents for framework extension API for PyTorch ([ecffc2](https://github.com/intel/neural-compressor/commit/ecffc2eb29ada100d2b60574258d8a1b6548e449))
- Add documents for framework extension API for TensorFlow ([4dbf71](https://github.com/intel/neural-compressor/commit/4dbf71e412a370f09809db89db27a0b7c7b56d14))
- Add documents for autotune API ([853dc7](https://github.com/intel/neural-compressor/commit/853dc71eee292e93e38f91683ec8229eb14c25da), [de3e94](https://github.com/intel/neural-compressor/commit/de3e94f6d15f74bb3081366dd1c045d006adfa00))
- Update for API 3.0 online doc ([81a076](https://github.com/intel/neural-compressor/commit/81a076d7c59609be666ddddf64a574cacf1a5c36), [87f02c](https://github.com/intel/neural-compressor/commit/87f02c15a2f1047a8b4bcb5b7f443a4cecb4dfc7), [efcb29](https://github.com/intel/neural-compressor/commit/efcb2930be6b9d575b1fb8a6e86afdd6a09b5857))
- Add docstring for API modules ([aa42e5](https://github.com/intel/neural-compressor/commit/aa42e5edcd0b5196a21ee7bb68a7965125601fea), [5767ae](https://github.com/intel/neural-compressor/commit/5767aed4dbc9a400f65f74bdc9c09209f0a4c145), [296c5d](https://github.com/intel/neural-compressor/commit/296c5d4f1138e5bf33584fb75cea0f6ca5080122), [1ebf69](https://github.com/intel/neural-compressor/commit/1ebf6987bd054b926d3cdd5630ae058c8d3a66c2), [0c52e1](https://github.com/intel/neural-compressor/commit/0c52e1243b78734e95fc348834303bc3c3cfe369), [b78794](https://github.com/intel/neural-compressor/commit/b787940ea2868e1fc8a56a81b94d62d4ea3d8454), [6b3020](https://github.com/intel/neural-compressor/commit/6b30207d0a3b6d6d497ecf8f6bb5891765d798ba), [28578b](https://github.com/intel/neural-compressor/commit/28578b96bf6217fa2b79699838e5a4af30843de4))
- Add doc for client usage ([d254d5](https://github.com/intel/neural-compressor/commit/d254d508be9c6b14c474fd643ad448a4e261ca72), [305838](https://github.com/intel/neural-compressor/commit/30583882df76838ea3e4a719e25ddca7bb449b9b))
- Remove 1.x API documents ([705672](https://github.com/intel/neural-compressor/commit/7056720df96f17c706522bc6b0530df534d22ee7), [d32046](https://github.com/intel/neural-compressor/commit/d3204604aad007f3db67c46dcb0575aa8f5cd584))
- Add readme for framework extension API examples ([385da7](https://github.com/intel/neural-compressor/commit/385da7c7ed018a66fcba6e28658d1a5eea2e52e4))
- Add version mapping between INC and Gaudi SW Stack ([acd8f4](https://github.com/intel/neural-compressor/commit/acd8f4f182eaccf03b221f765ec0ddb451be3415))

**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04 & Win 11 & macOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- PyTorch/IPEX 2.1, 2.2, 2.3
- TensorFlow 2.14, 2.15, 2.16
- ONNX Runtime 1.16, 1.17, 1.18

2.6

- Highlights
- Features
- Improvements
- Examples
- Bug Fixes
- External Contributions
- Validated Configurations

**Highlights**
- Integrated the recent [AutoRound](https://github.com/intel/auto-round/releases/tag/v0.2) release with lm-head quantization support and calibration process optimizations
- Migrated the ONNX model quantization capability into the ONNX project [Neural Compressor](https://github.com/onnx/neural-compressor)

**Features**
- [Quantization] Integrate recent [AutoRound](https://github.com/intel/auto-round/releases/tag/v0.2) with lm-head quantization support and calibration process optimizations ([4728fd](https://github.com/intel/neural-compressor/commit/4728fdccbbc3d9d8a213a1234aed7921596ddd51))
- [Quantization] Support the true-sequential option in GPTQ (see the sketch after this list) ([92c942](https://github.com/intel/neural-compressor/commit/92c9423ccc09e4ea4a26cbf925b9202d888c6564))
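
A hedged sketch of the true-sequential option named above, shown through the PyTorch `GPTQConfig`; the flag placement and calibration style are assumptions based on these notes.

```python
# Hedged sketch: GPTQ with true_sequential enabled, so each block is
# re-quantized against the outputs of already-quantized predecessors.
import torch
from neural_compressor.torch.quantization import GPTQConfig, prepare, convert

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())

config = GPTQConfig(bits=4, true_sequential=True)
model = prepare(model, config)
model(torch.randn(8, 64))  # calibration pass with representative data
q_model = convert(model)
```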

**Improvements**
- [Quantization] Improve WOQ Linear pack/unpack speed with numpy implementation ([daa143](https://github.com/intel/neural-compressor/commit/daa1431b200f92ab9684a2c78e15602cb23d7c07))
- [Quantization] Auto-detect the available device when exporting ([7be355](https://github.com/intel/neural-compressor/commit/7be355dee66ca5bd2355711d1c4ff799b23c891c))
- [Quantization] Refine AutoRound export to support Intel GPU ([409231](https://github.com/intel/neural-compressor/commit/40923112e72c564a6065b87afc84a9a64671aa41))
- [Benchmarking] Detect the number of sockets when needed ([e54b93](https://github.com/intel/neural-compressor/commit/e54b937e93f386e525a6e6e84d65b39c3022925c))

**Examples**
- Upgrade lm_eval to 0.4.2 in the PT and ORT LLM examples ([fdb509](https://github.com/intel/neural-compressor/commit/fdb5097907acae0e60834423c6c00323e73b962e)) ([54f039](https://github.com/intel/neural-compressor/commit/54f039d0424cf8a724218158c60d12241f5be24a))
- Add diffusers/dreambooth example with IPEX ([ba4798](https://github.com/intel/neural-compressor/commit/ba479850d3365fa8fae145b1376f104211af8b66))

**Bug Fixes**
- Fix incorrect dtype of unpacked tensor issue in PT ([29fdec](https://github.com/intel/neural-compressor/commit/29fdecbbb44ceb8d19c12809af90dc23063becfc))
- Fix TF LLM SQ legacy Keras environment variable issue ([276449](https://github.com/intel/neural-compressor/commit/27644940e7468f8c46a10c5313aa1729176a11a3))
- Fix TF estimator issue by adding version check on TF2.16 ([855b98](https://github.com/intel/neural-compressor/commit/855b9881f07ae0227c9a3773a69b9bd2ec7e4602))
- Fix missing tokenizer issue in run_clm_no_trainer.py after using lm-eval 0.4.2 ([d64029](https://github.com/intel/neural-compressor/commit/d640297e2ca495fba5c6cb540965c1ffeed7c94a))
- Fix AWQ padding issue in ORT ([903da4](https://github.com/intel/neural-compressor/commit/903da49d5f5bbd95edb6d268c71b34f26133b622))
- Fix recover function issue in ORT ([ee24db](https://github.com/intel/neural-compressor/commit/ee24dba141ef8c2ac14a3f0cce84c88952048cea))
- Update the model checkpoint download URL in `prepare_model.py` ([0ba573](https://github.com/intel/neural-compressor/commit/0ba57320728b4b7df5d1fe39e83ee9ea0a7cdaa9))
- Fix the case where `pad_max_length` is set to None ([960bd2](https://github.com/intel/neural-compressor/commit/960bd2b91cbd55870385e918f98d740b5044abbb))
- Fix a failure for GPU backend ([71a9f3](https://github.com/intel/neural-compressor/commit/71a9f3940aa07d2985d8a2ee9e1f914d0576f8ac))
- Fix numpy versions for rnnt and 3d-unet examples ([12b8f4](https://github.com/intel/neural-compressor/commit/12b8f41d985d7ac56e1547dde0afc6e3393f8569))
- Fix CVEs ([5b5579](https://github.com/intel/neural-compressor/commit/5b5579bf953cb24607dc18b3a01ffe1071c3b604)) ([25c71a](https://github.com/intel/neural-compressor/commit/25c71aad5a55210d87d371257344f21762e3bb0e)) ([47d73b](https://github.com/intel/neural-compressor/commit/47d73b34f80a29fd16cf17ba71758b7228cc6f34)) ([41da74](https://github.com/intel/neural-compressor/commit/41da740e517cd266176059b52fe482d5fb863b80))


**External Contributions**
- Update the model checkpoint download URL in `prepare_model.py` ([0ba573](https://github.com/intel/neural-compressor/commit/0ba57320728b4b7df5d1fe39e83ee9ea0a7cdaa9))
- Fix the case where `pad_max_length` is set to None ([960bd2](https://github.com/intel/neural-compressor/commit/960bd2b91cbd55870385e918f98d740b5044abbb))
- Add diffusers/dreambooth example with IPEX ([ba4798](https://github.com/intel/neural-compressor/commit/ba479850d3365fa8fae145b1376f104211af8b66))

**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04 & Win 11 & macOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- PyTorch/IPEX 2.1, 2.2, 2.3
- TensorFlow 2.14, 2.15, 2.16
- ITEX 2.13.0, 2.14.0, 2.15.0
- ONNX Runtime 1.16, 1.17, 1.18

2.5.1

- Improvement
- Bug Fixes
- Validated Configurations

**Improvement**
- Improve WOQ AutoRound export ([409231](https://github.com/intel/neural-compressor/commit/40923112e72c564a6065b87afc84a9a64671aa41), [7ee721](https://github.com/intel/neural-compressor/commit/7ee72159e2d1a7de2e2d36813ea98fa5a9e909f6))
- Adapt the ITREX v1.4 release for example evaluation ([9d7a05](https://github.com/intel/neural-compressor/commit/9d7a0520870fe0301ab26b794504a26f9fffbb06))
- Update more supported LLM recipes ([ce9b16](https://github.com/intel/neural-compressor/commit/ce9b16405d4ada62a7dfcb285eb223dd6ba7d774))

**Bug Fixes**
- Fix the WOQ RTN supported-layer checking condition ([079177](https://github.com/intel/neural-compressor/commit/07917762819df46340665b5c289cd78518cb5e4e))
- Fix an in-place processing error in the `quant_weight` function ([92533a](https://github.com/intel/neural-compressor/commit/92533a69543d0a20b3bed4adb7d6787cdcd70f41))

**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04
- Python 3.10
- TensorFlow 2.15
- ITEX 2.14.0
- PyTorch/IPEX 2.2
- ONNX Runtime 1.17

2.5

- Highlights
- Features
- Improvement
- Productivity
- Bug Fixes
- External Contributions
- Validated Configurations

**Highlights**
- Integrated the Weight-Only Quantization algorithm [AutoRound](https://github.com/intel/auto-round) and verified it on Gaudi2, Intel CPUs, and NVIDIA GPUs
- Applied the SmoothQuant and Weight-Only Quantization algorithms to 15+ popular LLMs for INT8 and INT4 quantization and published the [recipes](https://github.com/intel/neural-compressor/blob/master/docs/source/llm_recipes.md)


**Features**
- [Quantization] Integrate the Weight-Only Quantization algorithm [AutoRound](https://github.com/intel/auto-round) (see the sketch after this list) ([5c7f33](https://github.com/intel/neural-compressor/commit/5c7f336037e602321efec71aa516bc4fade082c8), [dfd083](https://github.com/intel/neural-compressor/commit/dfd083df3234aa27548391c381d3ac9dc8139676), [9a7ddd](https://github.com/intel/neural-compressor/commit/9a7ddda6f852792663e9cf3f076b88caa32e83a8), [cf1de7](https://github.com/intel/neural-compressor/commit/cf1de74608c4dc16bb18abee8f9680985af5ad6e))
- [Quantization] Quantize weights in-place in Weight-Only Quantization ([deb1ed](https://github.com/intel/neural-compressor/commit/deb1ed51cd2ad9768318ae49763898d3bb7af663))
- [Pruning] Enable SNIP on multiple cards using DeepSpeed ZeRO-3 ([49ab28](https://github.com/intel/neural-compressor/commit/49ab28d362b03338194160e5ef67a3b8c7967a86))
- [Pruning] Support new pruning approach Wanda and DSNOT for PyTorch LLM ([7a3671](https://github.com/intel/neural-compressor/commit/7a367179804565777241f73a87f903c82b6723e0))
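
For the AutoRound integration above, here is a hedged sketch using the 2.x `PostTrainingQuantConfig`/`fit` flow; the `op_type_dict` keys follow the 2.x weight-only convention, but both they and the dataloader format should be treated as assumptions.

```python
# Hedged sketch of 2.x weight-only quantization with the AutoRound algorithm.
import torch
from torch.utils.data import DataLoader, TensorDataset
from neural_compressor import PostTrainingQuantConfig
from neural_compressor.quantization import fit

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU())
calib_data = TensorDataset(torch.randn(32, 64), torch.zeros(32))  # toy calibration set
calib_dataloader = DataLoader(calib_data, batch_size=8)

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={".*": {"weight": {"bits": 4, "group_size": 128,
                                    "algorithm": "AUTOROUND"}}},
)
q_model = fit(model, conf, calib_dataloader=calib_dataloader)
```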


**Improvement**
- [Quantization] SmoothQuant code structure refactor ([a8d81c](https://github.com/intel/neural-compressor/commit/a8d81caacd6aeac640d8a1b38af2e10225ee97c0))
- [Quantization] Optimize the workflow of parsing Keras model ([b816d7](https://github.com/intel/neural-compressor/commit/b816d7769992fefc711fcbff21d0dbea8a36230e))
- [Quantization] Support static_groups options in GPTQ API ([1c426a](https://github.com/intel/neural-compressor/commit/1c426a0738c68403a2ef3aaef9b19ecff8f2721a))
- [Quantization] Update TEQ train dataloader ([d1e994](https://github.com/intel/neural-compressor/commit/d1e994becee163da816d027bf39389581751920b))
- [Quantization] WeightOnlyLinear keeps self.weight after recover ([2835bd](https://github.com/intel/neural-compressor/commit/2835bdbd3b5a2797ea1323818b2d154154f93331))
- [Quantization] Add version condition for IPEX prepare init ([d96e14](https://github.com/intel/neural-compressor/commit/d96e14aff080813ee55badc84e4efb59dddc73d7))
- [Quantization] Enhance the ORT node name checking ([f1597a](https://github.com/intel/neural-compressor/commit/f1597aae743f19745822f9e57e2634cbb1a08098))
- [Pruning] Stop the tuning process early when enabling smooth quant ([844a03](https://github.com/intel/neural-compressor/commit/844a032766ff823280d4bbfd350f65d3f6c284db))


**Productivity**
- ORT LLM examples support the latest Optimum version ([26b260](https://github.com/intel/neural-compressor/commit/26b260e174cac13b023a11caab372b2dcdc593e0))
- Add coding style docs and recommended VS Code setting ([c1f23c](https://github.com/intel/neural-compressor/commit/c1f23ce5a54caf907951d5daf7c14e80259ad25f))
- Adapt transformers 4.37 loading ([6133f4](https://github.com/intel/neural-compressor/commit/6133f4e158648d242f3f78f125b0c59c8a214cb7))
- Upgrade pre-commit checker for black/blacken-docs/ruff ([7763ed](https://github.com/intel/neural-compressor/commit/7763ed08c07d9e048467429b1b727e254db26665))
- Support CI summary in PR comments ([d4bcdd](https://github.com/intel/neural-compressor/commit/d4bcdd459cb8aab7268dc96528720c36d81f1ee3))
- Update the notebook example to install the latest INC & TF and add a metric in `fit` ([4239d3](https://github.com/intel/neural-compressor/commit/4239d3675fe833aa4d145ffb49dd938bdc4c8726))


**Bug Fixes**
- Fix QA IPEX example fp32 input issue ([c4de19](https://github.com/intel/neural-compressor/commit/c4de1982961e604e698729fb153cd330d4139777))
- Update the conditions for getting min-max values during TF MatMul requantization ([d07175](https://github.com/intel/neural-compressor/commit/d07175c39cd796c17582e986268a3a7179683763))
- Fix TF saved_model issues ([d8e60b](https://github.com/intel/neural-compressor/commit/d8e60b8eda59098bd29d6e314ed3383300c0f642))
- Fix comparison of module_type and MulLinear ([ba3aba](https://github.com/intel/neural-compressor/commit/ba3abac86b92ddac192d8920df28979e5cdeb5c5))
- Fix ORT calibration issue ([cd6d24](https://github.com/intel/neural-compressor/commit/cd6d244de2687de9e43a12b1f4032d9ff2281874))
- Fix ORT example bart export failure ([b0dc0d](https://github.com/intel/neural-compressor/commit/b0dc0de8325ee9686900d904c62bb5f36626f3ed))
- Fix TF example accuracy diff during benchmark and quantization ([5943ea](https://github.com/intel/neural-compressor/commit/5943eaefd11a19099c78090e590d4fb783c5e1ad))
- Fix bugs for GPTQ exporting with static_groups ([b4e37b](https://github.com/intel/neural-compressor/commit/b4e37b74ff077acf85246527a7ddae7a2e3f08d1))
- Fix ORT quant issue caused by tensors having same name ([0a20f3](https://github.com/intel/neural-compressor/commit/0a20f30a6e4734b3fb84a27266c386c32bd1d4a8))
- Fix Neural Solution SQL/CMD injection ([14b7b0](https://github.com/intel/neural-compressor/commit/14b7b0ab8ccb2c04f6595aeccb37f689e090c806))
- Fix the best qmodel recovery issue ([f2d9b7](https://github.com/intel/neural-compressor/commit/f2d9b78b1cd5908a9613deb1d9e12f7f31a0e308))
- Fix logger issue ([83bc77](https://github.com/intel/neural-compressor/commit/83bc779a4e97d8886383025d324d8379f70cc8b7))
- Store the token in a protected file ([c6f9cc](https://github.com/intel/neural-compressor/commit/c6f9ccaa25d73461cb502e43b1b0dd2b9e98909f))
- Define the default SSL context ([b08725](https://github.com/intel/neural-compressor/commit/b08725a5afe5ffc97807d1d743da74b4f0432fee))
- Fix IPEX stats bug ([5af383](https://github.com/intel/neural-compressor/commit/5af3834651900a8a3bc80a2b43d40fc153fbcf85))
- Fix ORT calibration for Dml EP ([c58aea](https://github.com/intel/neural-compressor/commit/c58aeaa7b88a14d05e94500f2b644caed189e80c))
- Fix wrong socket number retrieval on non-English systems ([5b2a88](https://github.com/intel/neural-compressor/commit/5b2a88708eb5a63b760a5a109064e740b3d5297a))
- Fix trust-remote-code handling for LLM examples ([2f2c9a](https://github.com/intel/neural-compressor/commit/2f2c9a246176a641d4695e92db3b2d0514f21673))


**External Contributions**
- Intel Mac support ([21cfeb](https://github.com/intel/neural-compressor/commit/21cfeb83cbcd065a5d096c3a84da3deab256ea07))
- Add PTQ example for PyTorch CV Segment Anything Model ([bd5e69](https://github.com/intel/neural-compressor/commit/bd5e698a11c3b144788f9b431f65a8ab06374c0b))


**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04 & Win 11 & macOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- TensorFlow 2.13, 2.14, 2.15
- ITEX 2.13.0, 2.14.0
- PyTorch/IPEX 2.0, 2.1, 2.2
- ONNX Runtime 1.15, 1.16, 1.17
