- Highlights
- Features
- Improvements
- Examples
- Bug Fixes
- Documentations
- Validated Configurations
**Highlights**
- FP8 quantization and INT4 model loading support on [Intel® Gaudi® AI accelerator](https://habana.ai/products/gaudi2/)
- Framework extension API for quantization, mixed-precision and benchmarking
- Accuracy-aware FP16 mixed precision support on [Intel® Xeon® 6 Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html)
- Performance optimizations and usability improvements on client-side quantization
**Features**
- [Quantization] Support FP8 quantization on Gaudi ([95197d](https://github.com/intel/neural-compressor/commit/95197d1697e19323b124c2a32bdef7425d4d1c3e))
- [Quantization] Support INC and Hugging Face model loading on framework extension API for PyTorch ([0eced1](https://github.com/intel/neural-compressor/commit/0eced1478c6796a5e2dcb254a65bbc96af4d1b8b), [bacc16](https://github.com/intel/neural-compressor/commit/bacc164df2c2080cb6b1a6250f745824bbca5a7b))
- [Quantization] Support Weight-only Quantization on framework extension API for PyTorch ([34f0a9](https://github.com/intel/neural-compressor/commit/34f0a9f450b385aa3227f7f34e8d0f16460080a9), [de43d8](https://github.com/intel/neural-compressor/commit/de43d851a24a5f4290fe148f7d3607cad6d8433f), [4a4509](https://github.com/intel/neural-compressor/commit/4a45093c1418f34da2660a54052a2ff5c2b4edff), [1a4509](https://github.com/intel/neural-compressor/commit/1a4509060714559bdbc60524012997900c464d02), [a3a065](https://github.com/intel/neural-compressor/commit/a3a06508fa951f9b9dcd3786214f546c796c32e7), [1386ac](https://github.com/intel/neural-compressor/commit/1386ac5ec7be40608dfac082d2275307b8e4d14e), [a0dee9](https://github.com/intel/neural-compressor/commit/a0dee94dab0920ba30de049e871b19a72ddb8996), [503d9e](https://github.com/intel/neural-compressor/commit/503d9ef4136023f1952e397a2ab0f7f476040901), [84d705](https://github.com/intel/neural-compressor/commit/84d7055b3998724aecd7ca7e43ea653d0d0f4612), [099b7a](https://github.com/intel/neural-compressor/commit/099b7a4446d9c21af2066518ccc87ecaa717e08e), [e3c736](https://github.com/intel/neural-compressor/commit/e3c736fd910690faf08bf4609cc3b65529d79252), [e87c95](https://github.com/intel/neural-compressor/commit/e87c95f25d3fe0e286e832857974ce36d43b2f96), [2694bb](https://github.com/intel/neural-compressor/commit/2694bbf81622a936f5ef3c271901dea097af2474), [ec49a2](https://github.com/intel/neural-compressor/commit/ec49a29cafa92593d82635562ec200741fd4083c), [e7b4b6](https://github.com/intel/neural-compressor/commit/e7b4b648665df4d016d170cdf2f3f69e6f9c185f), [a9bf79](https://github.com/intel/neural-compressor/commit/a9bf79c63fbcd970cccc00d1db85e424fe286b27), [ac717b](https://github.com/intel/neural-compressor/commit/ac717bc4b6a1a1e82db218d7648121f157814fad), [915018](https://github.com/intel/neural-compressor/commit/9150181bb2ab71201fbdb052fbcaa2aba18a090a), [8447d7](https://github.com/intel/neural-compressor/commit/8447d7097fa33231b8a6e4a9e26e526d191787de), [dc9328](https://github.com/intel/neural-compressor/commit/dc9328c09b243d7df3bccc0a35a8a12feaabb40a))
- [Quantization] Support static and dynamic quantization in PT2E path ([7a4715](https://github.com/intel/neural-compressor/commit/7a4715c1d488441e383b7c999fd1b574a3f6ceda), [43c358](https://github.com/intel/neural-compressor/commit/43c3580bdb1c6765bb4902fe721da629518acc74), [30b36b](https://github.com/intel/neural-compressor/commit/30b36b83a195c6ea350692c7ac0bfec1b52ee419), [1f58f0](https://github.com/intel/neural-compressor/commit/1f58f024d812b6c1f7f3430b62e61051599cd1b2), [02958d](https://github.com/intel/neural-compressor/commit/02958dd4a81251be26980a712cbb258d55edba67))
- [Quantization] Support SmoothQuant and static quantization in IPEX path with framework extension API ([53e6ee](https://github.com/intel/neural-compressor/commit/53e6ee6b75d476bae0382c7d6fb9aa1348c2ab5e), [72fbce](https://github.com/intel/neural-compressor/commit/72fbce4b34f29c2b6fe0d41a76c4d65edb08719a), [eaa3a5](https://github.com/intel/neural-compressor/commit/eaa3a580c8a9f27268d3c27e551054dd5053f01c), [95e67e](https://github.com/intel/neural-compressor/commit/95e67eac624285d304487b654330d660b169cfb1), [855c10](https://github.com/intel/neural-compressor/commit/855c10ca37d01bd371a4b9dcd953ce735f9bdea6), [9c6102](https://github.com/intel/neural-compressor/commit/9c6102b351c45394357e0163470e0e997cb99d0e), [5dafe5](https://github.com/intel/neural-compressor/commit/5dafe5fd6584ca695f05b61c3dd84c2923c83cbd), [a5e5f5](https://github.com/intel/neural-compressor/commit/a5e5f5f64855b85e2a374c8b808b317448318113), [191383](https://github.com/intel/neural-compressor/commit/191383ebd95c1fbb77e626887ca6d808a454543c), [776645](https://github.com/intel/neural-compressor/commit/7766454d9a984257016ddad5d3a61de648f0bd35))
- [Quantization] Support Layer-wise Quantization for RTN/GPTQ on framework extension API for PyTorch ([649e6b](https://github.com/intel/neural-compressor/commit/649e6b148755bda737009bc323b735b92231c579))
- [Quantization] Support Post Training Quantization on framework extension API for Tensorflow ([6c27c1](https://github.com/intel/neural-compressor/commit/6c27c19c3ec7a318455bd12d6e66ad9bb757ab93), [e22c61](https://github.com/intel/neural-compressor/commit/e22c61ede2942f7f1ba1cf9e480491371184bb32), [f21afb](https://github.com/intel/neural-compressor/commit/f21afbbdd18cd61627fc02e5b22ca242402bcfbf), [3882e9](https://github.com/intel/neural-compressor/commit/3882e9cc4b356a081843455f3244d7f0e013f888), [2627d3](https://github.com/intel/neural-compressor/commit/2627d33b9ff900697184972575969ecc55da8923))
- [Quantization] Support Post Training Quantization on Keras3 ([f67e86](https://github.com/intel/neural-compressor/commit/f67e8613c409563f016c77e05a1acb969790cfc6), [047560](https://github.com/intel/neural-compressor/commit/047560fcf6a2e5812d33e579e047a3c8767e4a9a))
- [Quantization] Support Weight-only Quantization on Gaudi2 ([4b9b44](https://github.com/intel/neural-compressor/commit/4b9b447aa0872a8edc26fd59a349c195cf208a97), [14868c](https://github.com/intel/neural-compressor/commit/14868c0900a1f91fe39f138c67156ad66c16b20f), [0a3d4b](https://github.com/intel/neural-compressor/commit/0a3d4bd43f69c29e2f8a3b07ac13036e41c6579c))
- [Quantization] Improve performance and usability of quantization procedure on client side ([16a7b1](https://github.com/intel/neural-compressor/commit/16a7b11508c008d4d4180a0fe0e31c75b8e5d662))
- [Quantization] Support auto-device detection on framework extension API for PyTorch ([368ba5](https://github.com/intel/neural-compressor/commit/368ba5293ab2936c685d67db6f8423a27a62f7e1), [4b9b44](https://github.com/intel/neural-compressor/commit/4b9b447aa0872a8edc26fd59a349c195cf208a97), [e81a2d](https://github.com/intel/neural-compressor/commit/e81a2dd901dd1b93291555722c6d96901940be06), [0a3d4b](https://github.com/intel/neural-compressor/commit/0a3d4bd43f69c29e2f8a3b07ac13036e41c6579c), [534300](https://github.com/intel/neural-compressor/commit/53430092e7f9b46ed78afccb4b9610c9032bf57f), [2a86ae](https://github.com/intel/neural-compressor/commit/2a86aeafc754ca3b7495138381efb8faa9397fdf))
- [Quantization] Support Microscaling(MX) Quant for PyTorch ([4a24a6](https://github.com/intel/neural-compressor/commit/4a24a6a39218a3d186900a72a7e2e96ad539f4f4), [455f1e](https://github.com/intel/neural-compressor/commit/455f1e1f0f0284e87b46d257b6d126ca76fe1748))
- [Quantization] Enable cross-devices Half-Quadratic Quantization(HQQ) for LLMs support ([db6164](https://github.com/intel/neural-compressor/commit/db6164a25da5bf8ef8a7ba082a25d7bb4565b656), [07f940](https://github.com/intel/neural-compressor/commit/07f940c7f00ab0a5f6b3d7d9cb6b934e69e44a98))
- [Quantization] Support FP8 cast Weight-only Quantization ([57ed61](https://github.com/intel/neural-compressor/commit/57ed6138453246141a2128b600588df0b4d5d440))
- [Mixed-Precision] Support FP16 mixed-precision on framework extension autotune API for PyTorch ([2e1cdc](https://github.com/intel/neural-compressor/commit/2e1cdc5be61458be186d0e6f2035b4287b223cf3))
- [Mixed-Precision] Support mixed `INT8` with `FP16` in PT2E path ([fa961e](https://github.com/intel/neural-compressor/commit/fa961e1d0bbe371182d6da6d210d0b6a7693cce2))
- [AutoTune] Support accuracy-aware tuning on framework extension API ([e97659](https://github.com/intel/neural-compressor/commit/e9765955f991e1270e3b65635285f6b6cb8fc38c), [7b8aec](https://github.com/intel/neural-compressor/commit/7b8aec00d0c09bd499076457b68903229e09b803), [5a0374](https://github.com/intel/neural-compressor/commit/5a0374e7db23cac209af78f1ace9b38d23bebbb0), [a4675c](https://github.com/intel/neural-compressor/commit/a4675c7490f66ab2c75912dd69f1d79368f69858), [3a254e](https://github.com/intel/neural-compressor/commit/3a254e99c0a361c0179b4176a256c69e46681352), [ac47d9](https://github.com/intel/neural-compressor/commit/ac47d9b97b597f809ab56f9f6cb1a86951e2e334), [b8d98e](https://github.com/intel/neural-compressor/commit/b8d98ebaddcf1c7ece1def04ba4d55b7e92593ee), [fb6142](https://github.com/intel/neural-compressor/commit/fb61428228bcdf9a18b02e5963c4df7a60c9a54b), [fa8e66](https://github.com/intel/neural-compressor/commit/fa8e66a1d95b52c8ebdea21f2dc60db0fdfedd6a), [d22df5](https://github.com/intel/neural-compressor/commit/d22df5364ba6d7c98fea8545a9a9e49e2ce5ebb0), [09eb5d](https://github.com/intel/neural-compressor/commit/09eb5ddd3c0eb2dae198837cbae76ca5bb4e90c8), [c6a8fa](https://github.com/intel/neural-compressor/commit/c6a8fa1606a4aea34e62af0f106ab05cdccacab6))
- [Benchmarking] Implement `incbench` command for ease-of-use benchmark ([2fc725](https://github.com/intel/neural-compressor/commit/2fc72555c987dc7bce8476b389720e1a29159a43))
**Improvements**
- [Quantization] Integrate AutoRound v0.3 ([bfa27e](https://github.com/intel/neural-compressor/commit/bfa27e422dc4760f6a9b1783eee7dae10fe5324f), [fd9685](https://github.com/intel/neural-compressor/commit/fd96851f7f8339ec8bfabd602cf494ac6c31d17b))
- [Quantization] Support auto_host2device on RTN and GPTQ([f75ff4](https://github.com/intel/neural-compressor/commit/f75ff4082bc7a22d9367d3e91a3ea2c7aaec2bd2))
- [Quantization] Support `transformers.Conv1D` WOQ quantization ([b6237c](https://github.com/intel/neural-compressor/commit/b6237cf4d4c8e86fe373cf48ffe5a6588ef537ca))
- [Quantization] support quant_lm_head argument in all WOQ configs ([4ae2e8](https://github.com/intel/neural-compressor/commit/4ae2e87d2f98eb34c2e523a76ffa6ff77bf767e1))
- [Quantization] Update fp4_e2m1 mapping list to fit neural_speed and qbits inference ([5fde50](https://github.com/intel/neural-compressor/commit/5fde50f2c0476dbc08d59481b742515f5a210de1))
- [Quantization] Enhance load_empty_model import ([29471d](https://github.com/intel/neural-compressor/commit/29471df05a9e2c36c4ad8083c0b0b285011748d8))
- [Common] Add common logger to the quantization process ([1cb844](https://github.com/intel/neural-compressor/commit/1cb844b3c0b581f670fef16aa87fef2a85e6122b), [482f87](https://github.com/intel/neural-compressor/commit/482f87c6161581f9f8ff09804b6c430553cf59a9), [83bc77](https://github.com/intel/neural-compressor/commit/83bc779a4e97d8886383025d324d8379f70cc8b7), [f50baf](https://github.com/intel/neural-compressor/commit/f50baf2e9107e29d96e267fe115dc488f96db6f0))
- [Common] Enhance the `set_local` for operator type ([a58638](https://github.com/intel/neural-compressor/commit/a58638c1298fdff808742d1625196153d24f5c9c))
- [Common] Port more helper classes from 2.x ([3b150d](https://github.com/intel/neural-compressor/commit/3b150d61313ca6ca19bc38ec9f608900b8355519))
- [Common] Refine base config for 3.x API ([be42d0](https://github.com/intel/neural-compressor/commit/be42d033b25c6dd3bcac0ead964699f25f939014), [efea08](https://github.com/intel/neural-compressor/commit/efea089e27613690c32d6f1745731a28ca90bf65))
- [Export] Migrate export feature to 2.x and 3.x from deprecated ([794b27](https://github.com/intel/neural-compressor/commit/794b2762c0bb2f076973e1fca5fdecd23efec774))
**Examples**
- Add save/load for PT2E example ([0e724a](https://github.com/intel/neural-compressor/commit/0e724a4d96ca0d6a170281688ca644b37fa340e0))
- Add IPEX XPU example for framework extension API ([6e1b1d](https://github.com/intel/neural-compressor/commit/6e1b1da712d20d9291e5932974bc3167b00dd214))
- Enable TensorFlow yolov5 example for framework extension API ([19024b](https://github.com/intel/neural-compressor/commit/19024b351372ca76934db33b0d230552c13bff39))
- Update example for framework extension IPEX SmoothQuant ([b35ff8](https://github.com/intel/neural-compressor/commit/b35ff8f0044bdf12da87647d0404b62ae5ff7d3d))
- Add SDXL model example for framework extension API ([000946](https://github.com/intel/neural-compressor/commit/000946fce147a02ad6662538e337570c0a56329d))
- Add PyTorch mixed precision example ([e106de](https://github.com/intel/neural-compressor/commit/e106dea73471ddecdb1cfc702e90fcb1a5d41452), [9077b3](https://github.com/intel/neural-compressor/commit/9077b382259e2e56ff5796084a1f4275e4387537))
- Add CV and LLM examples for PT2E quantization path ([b401b0](https://github.com/intel/neural-compressor/commit/b401b02db2cc7d7f4f8412a815fa435e66e330a0))
- Add Recommendation System examples for IPEX path ([e470f6](https://github.com/intel/neural-compressor/commit/e470f6cdfbbad32fcf17be56903e649a05059780))
- Add TensorFlow examples for framework extension API ([fb8577](https://github.com/intel/neural-compressor/commit/fb8577931c11c3bdc55868e01576b73372d9912b), [922b24](https://github.com/intel/neural-compressor/commit/922b2471e617cc4c56376866e991302d0beb0640))
- Add PyTorch Microscaling(MX) Quant examples ([6733da](https://github.com/intel/neural-compressor/commit/6733dabc4d48a6625e184e4a29a754949f415097))
- Add PyTorch SmoothQuant LLM examples for new framework extension API ([137fa3](https://github.com/intel/neural-compressor/commit/137fa3add2d8a0688dd0e76bd15e347b588d56a8))
- Add PyTorch GPTQ/RTN example for framework extension API ([813d93](https://github.com/intel/neural-compressor/commit/813d93051ab16b6bbac11bdf5986929330876e30))
- Add double quant example ([ccd0c9](https://github.com/intel/neural-compressor/commit/ccd0c9e6c112d84979504177b9390270b3d71b69))
**Bug Fixes**
- Fix ITREX qbits nf4/int8 training core dumped issue ([190e6b](https://github.com/intel/neural-compressor/commit/190e6b2be6b31158a1101729bcf621bc93e85531))
- Fix unused pkgs import ([437c8e](https://github.com/intel/neural-compressor/commit/437c8e75706cff1767dcde115e428654766b3f18))
- Remove Gelu Fusion for TensorFlow New API ([5592ac](https://github.com/intel/neural-compressor/commit/5592acc60562b7fccb308af0eaaba9cad53004a5))
- Fix GPTQ layer match issue ([90fb43](https://github.com/intel/neural-compressor/commit/90fb43135397a035968b5334eba21931c18a83c0))
- Fix static quant regression issue on IPEX path ([70a1d5](https://github.com/intel/neural-compressor/commit/70a1d501fdfee16a10e34385bca9f15eba4366b4))
- Fix config expansion with empty options ([6b2738](https://github.com/intel/neural-compressor/commit/6b2738390dfdab543de1ccd9242fe541c78b6a2e))
- Fix act_observer for IPEX SmoothQuant and static quantization ([263450](https://github.com/intel/neural-compressor/commit/2634501690f2396865011c2f79c0b8adba36cb07))
- Set automatic return_dict=False for GraphTrace ([53e7df](https://github.com/intel/neural-compressor/commit/53e7dfe57ef4ad1754f37343b3ad3850b64ae4f4))
- Fix WOQ Linear pack slow issue ([da1ada](https://github.com/intel/neural-compressor/commit/da1ada236eb867b69c663c58904e0a21ad9bcb88), [daa143](https://github.com/intel/neural-compressor/commit/daa1431b200f92ab9684a2c78e15602cb23d7c07))
- Fix dtype of unpacked tensor ([29fdec](https://github.com/intel/neural-compressor/commit/29fdecbbb44ceb8d19c12809af90dc23063becfc))
- Fix WeightOnlyLinear bits type when dtype="intx" ([19ff13](https://github.com/intel/neural-compressor/commit/19ff13e8a8963744349e46013ef522fcb3e8c3d8))
- Fix several issues for SmoothQuant and static quantization ([7120dd](https://github.com/intel/neural-compressor/commit/7120dd4909599b228692415732688b3d5e77206d))
- Fix IPEX examples failed with evaluate ([e82674](https://github.com/intel/neural-compressor/commit/e82674a75de564a632cea639db25fbe41fec100a))
- Fix HQQ issue for group size of -1 ([8dac9f](https://github.com/intel/neural-compressor/commit/8dac9f2c3d3f8411f27a2e327f3dbbc7c8de0829))
- Fix bug in GPTQ g_idx ([4f893c](https://github.com/intel/neural-compressor/commit/4f893ca9e4c44d12ea028e00a4881b5154ee54a8))
- Fix tune_cfg issue for static quant ([ba1650](https://github.com/intel/neural-compressor/commit/ba165047dbcf4671cf20e9c1d031577dade94348))
- Add non-str `op_name` match workaround for IPEX ([911ccd](https://github.com/intel/neural-compressor/commit/911ccd3a94b124e2287780a2ca219eaa01dc21d9))
- Fix opt GPTQ double quant example config ([62aa85](https://github.com/intel/neural-compressor/commit/62aa85df23ce3f5db353ce9a4bfb8cd88395c376))
- Fix GPTQ accuracy issue in framework extension API example ([c701ea](https://github.com/intel/neural-compressor/commit/c701eaff7d69c46a57172b0547bfe2fc05164a0c))
- Fix bf16 symbolic_trace bug ([3fe2fd](https://github.com/intel/neural-compressor/commit/3fe2fd9aadda4991552d65fef09a75ba5127b5db))
- Fix `opt_125m_woq_gptq_int4_dq_ggml` issue ([b99aba](https://github.com/intel/neural-compressor/commit/b99abae5d937380cf9df80c9050fce18bddfb72d))
**Documentations**
- Update new API installation document ([50eb6f](https://github.com/intel/neural-compressor/commit/50eb6fb6f5924054b38d8ed99e78e0ebdab51f50), [ff3740](https://github.com/intel/neural-compressor/commit/ff3740146a829e845d79266acf233b202843d3fd))
- Add new architecture diagram ([d56075](https://github.com/intel/neural-compressor/commit/d56075c7e9f6e3e85385abbff9f1b0d07d157a04), [2c3556](https://github.com/intel/neural-compressor/commit/2c3556d441de2f0963167db71ecdee7353bd76bb))
- Add new workflow diagram ([96538c](https://github.com/intel/neural-compressor/commit/96538c56fea8a42c3e487b4682c346e4832e3e97))
- Update documentation for framework extension API ([0a5423](https://github.com/intel/neural-compressor/commit/0a542397ac1ea8d6fe2edf04565d3cb673001b2c))
- Add documents for framework extension API for PyTorch ([ecffc2](https://github.com/intel/neural-compressor/commit/ecffc2eb29ada100d2b60574258d8a1b6548e449))
- Add documents for framework extension API for TensorFlow ([4dbf71](https://github.com/intel/neural-compressor/commit/4dbf71e412a370f09809db89db27a0b7c7b56d14))
- Add documents for autotune API ([853dc7](https://github.com/intel/neural-compressor/commit/853dc71eee292e93e38f91683ec8229eb14c25da), [de3e94](https://github.com/intel/neural-compressor/commit/de3e94f6d15f74bb3081366dd1c045d006adfa00))
- Update for API 3.0 online doc ([81a076](https://github.com/intel/neural-compressor/commit/81a076d7c59609be666ddddf64a574cacf1a5c36), [87f02c](https://github.com/intel/neural-compressor/commit/87f02c15a2f1047a8b4bcb5b7f443a4cecb4dfc7), [efcb29](https://github.com/intel/neural-compressor/commit/efcb2930be6b9d575b1fb8a6e86afdd6a09b5857))
- Add docstring for API modules ([aa42e5](https://github.com/intel/neural-compressor/commit/aa42e5edcd0b5196a21ee7bb68a7965125601fea), [5767ae](https://github.com/intel/neural-compressor/commit/5767aed4dbc9a400f65f74bdc9c09209f0a4c145), [296c5d](https://github.com/intel/neural-compressor/commit/296c5d4f1138e5bf33584fb75cea0f6ca5080122), [1ebf69](https://github.com/intel/neural-compressor/commit/1ebf6987bd054b926d3cdd5630ae058c8d3a66c2), [0c52e1](https://github.com/intel/neural-compressor/commit/0c52e1243b78734e95fc348834303bc3c3cfe369), [b78794](https://github.com/intel/neural-compressor/commit/b787940ea2868e1fc8a56a81b94d62d4ea3d8454), [6b3020](https://github.com/intel/neural-compressor/commit/6b30207d0a3b6d6d497ecf8f6bb5891765d798ba), [28578b](https://github.com/intel/neural-compressor/commit/28578b96bf6217fa2b79699838e5a4af30843de4))
- Add doc for client usage ([d254d5](https://github.com/intel/neural-compressor/commit/d254d508be9c6b14c474fd643ad448a4e261ca72), [305838](https://github.com/intel/neural-compressor/commit/30583882df76838ea3e4a719e25ddca7bb449b9b))
- Remove 1.x API documents ([705672](https://github.com/intel/neural-compressor/commit/7056720df96f17c706522bc6b0530df534d22ee7), [d32046](https://github.com/intel/neural-compressor/commit/d3204604aad007f3db67c46dcb0575aa8f5cd584))
- Add readme for framework extension API examples ([385da7](https://github.com/intel/neural-compressor/commit/385da7c7ed018a66fcba6e28658d1a5eea2e52e4))
- Add version mapping between INC and Gaudi SW Stack ([acd8f4](https://github.com/intel/neural-compressor/commit/acd8f4f182eaccf03b221f765ec0ddb451be3415))
**Validated Configurations**
- Centos 8.4 & Ubuntu 22.04 & Win 11 & MacOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- PyTorch/IPEX 2.1, 2.2, 2.3
- TensorFlow 2.14, 2.15, 2.16
- ONNX Runtime 1.16, 1.17, 1.18