Neural-compressor

Latest version: v3.1.1


2.4.1

- Improvement
- Bug Fixes
- Examples
- Validated Configurations

**Improvement**
- Narrow down the tuning space of SmoothQuant auto-tune ([9600e1](https://github.com/intel/neural-compressor/commit/9600e1d973bf17e962e55fb0f3817db76ce8fea9)) (see the sketch after this list)
- Support ONNXRT Weight-Only Quantization with different dtypes ([5119fc](https://github.com/intel/neural-compressor/commit/5119fcb091c3c26ffc3795781986f250d072408d))
- Add progress bar for ONNXRT Weight-Only Quantization and SmoothQuant ([4d26e3](https://github.com/intel/neural-compressor/commit/4d26e3aa94a2e1bd7481abdfe9226d31045b33db))
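
A minimal sketch of driving SmoothQuant auto-tune through the 2.x `PostTrainingQuantConfig` recipes; the toy model, the calibration data, and the `auto_alpha_args` key names are illustrative assumptions, so verify them against the SmoothQuant documentation for your release:

```python
import torch
from neural_compressor import PostTrainingQuantConfig, quantization

# Toy stand-ins: in practice these are your FP32 model and calibration data.
fp32_model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
calib_dataloader = torch.utils.data.DataLoader(
    [(torch.randn(4), 0) for _ in range(8)], batch_size=2
)

conf = PostTrainingQuantConfig(
    recipes={
        "smooth_quant": True,
        "smooth_quant_args": {
            "alpha": "auto",  # search alpha instead of fixing a single value
            # Assumed knobs for the narrowed alpha search space.
            "auto_alpha_args": {"alpha_min": 0.3, "alpha_max": 0.7, "alpha_step": 0.1},
        },
    },
)
q_model = quantization.fit(fp32_model, conf, calib_dataloader=calib_dataloader)
```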

**Bug Fixes**
- Fix SmoothQuant alpha-space generation ([33ece9](https://github.com/intel/neural-compressor/commit/33ece90016bce790ad00729b885b15b1b496f4d7))
- Fix input errors when passing example_inputs to SmoothQuant ([39f63a](https://github.com/intel/neural-compressor/commit/39f63a8d4b92b434bc9d964a17c112bd1ffc7115))
- Fix LLMs accuracy regression with IPEX 2.1.100 ([3cb6d3](https://github.com/intel/neural-compressor/commit/3cb6d38f3e5a74c5657b0614c012c207dae4d5b1))
- Fix quantizable add ops detection on IPEX backend ([4c004d](https://github.com/intel/neural-compressor/commit/4c004d777d28a96fcea7d80d3e26c06121acc476))
- Fix range step bug in ORTSmoothQuant ([40275c](https://github.com/intel/neural-compressor/commit/40275cbf95d2609d582d9c6bee7929a41dc7cb59))
- Fix unit test bugs and update CI versions ([6c78df](https://github.com/intel/neural-compressor/commit/6c78dfefccf03c246f8f2ac47777c90c7aa58cbd), [835805](https://github.com/intel/neural-compressor/commit/835805cfb89022822b9c3753f7b2f160da7ee4f0))
- Fix notebook issues ([08221e](https://github.com/intel/neural-compressor/commit/08221e1cc12f22da1e9749e33c123a72dc76a44d))


**Examples**
- Add verified LLMs list and recipes for SmoothQuant and Weight-Only Quantization ([f19cc9](https://github.com/intel/neural-compressor/commit/f19cc9db4f7727ff68cd250841d5442ec1cfcd7d))
- Add code-generation evaluation for Weight-Only Quantization GPTQ ([763440](https://github.com/intel/neural-compressor/commit/763440912bf872195993b92588e5134c0d367dce))


**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04
- Python 3.10
- TensorFlow 2.14
- ITEX 2.14.0.1
- PyTorch/IPEX 2.1.0
- ONNX Runtime 1.16.3

2.4

- Highlights
- Features
- Improvement
- Productivity
- Bug Fixes
- Examples
- Validated Configurations

**Highlights**
- Supported [layer-wise](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_layer_wise.md) quantization for PyTorch RTN/GPTQ Weight-Only Quantization and ONNX Runtime W8A8 quantization (a sketch follows this list).
- Supported Weight-Only Quantization tuning for ONNX Runtime backend.
- Supported GGML double quant on RTN/GPTQ Weight-Only Quantization with the FW extension API.
- Supported SmoothQuant of Big Saved Model for TensorFlow Backend.
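
As a sketch of the layer-wise path from the linked doc: the model is created with empty weights and quantized one layer at a time, so the full checkpoint never has to sit in memory. The model id and the `load_empty_model` import path follow that doc but may differ across releases:

```python
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.adaptor.torch_utils.layer_wise_quant import load_empty_model

# "facebook/opt-125m" is an example id; weights are materialized shard-by-shard
# during quantization instead of being loaded up front.
fp32_model = load_empty_model("facebook/opt-125m", torchscript=True)

conf = PostTrainingQuantConfig(
    approach="weight_only",  # RTN/GPTQ weight-only, per the highlight above
    recipes={"layer_wise_quant": True},
)
q_model = quantization.fit(fp32_model, conf)  # RTN itself needs no calibration data
```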

**Features**
- [Quantization] Support GGML double quant in Weight-Only Quantization for RTN and GPTQ ([05c15a](https://github.com/intel/neural-compressor/commit/05c15a4a66699a32ba30a47efbc457b5abb40983))
- [Quantization] Support Weight-Only Quantization tuning for ONNX Runtime backend ([6d4ea5](https://github.com/intel/neural-compressor/commit/6d4ea5b114d7af4030626702da7b515a3f7771a9), [934ba0](https://github.com/intel/neural-compressor/commit/934ba032386d73ca85e03b891ae05757fe4faac0), [4fcfdf](https://github.com/intel/neural-compressor/commit/4fcfdf8f2960b1e5409d3e5c835d166ec127089b)) (a tuning sketch follows this list)
- [Quantization] Support SmoothQuant block-wise alpha-tuning ([ee6bc2](https://github.com/intel/neural-compressor/commit/ee6bc284b1e65c37b2f46857da9cd315019b208a))
- [Quantization] Support SmoothQuant of Big Saved Model for TensorFlow Backend ([3b2925](https://github.com/intel/neural-compressor/commit/3b2925263845ea6e38df5b50912fbfd4bd6b85ee), [4f2c35](https://github.com/intel/neural-compressor/commit/4f2c35d1ffda4384245339ddf30f3a3dc0ed1772))
- [Quantization] Support PyTorch layer-wise quantization for GPTQ ([ee5450](https://github.com/intel/neural-compressor/commit/ee5450707bf16cceff9de8048a8fdddf968ca936))
- [Quantization] Support PyTorch layer-wise quantization for RTN ([ebd1e2](https://github.com/intel/neural-compressor/commit/ebd1e240d006566e373f2b4854d6f3157ae33c36))
- [Quantization] Support ONNX Runtime layer-wise W8A8 quantization ([6142e4](https://github.com/intel/neural-compressor/commit/6142e48dda5c5023a69df56008a73aad54cfea58), [5d33a5](https://github.com/intel/neural-compressor/commit/5d33a536440ec69e56da2ee3834ed21d6c077862))
- [Common] [Experimental] Implement FW extension API ([76b8b3](https://github.com/intel/neural-compressor/commit/76b8b3f718e0533b2be01c87987ba807da2a108a), [8447d7](https://github.com/intel/neural-compressor/commit/8447d7097fa33231b8a6e4a9e26e526d191787de), [258236](https://github.com/intel/neural-compressor/commit/2582361aee60be11517e7b2985293e2c8e75d004))
- [Quantization] [Experimental] FW extension API for PT backend support Weight-Only Quantization ([915018](https://github.com/intel/neural-compressor/commit/9150181bb2ab71201fbdb052fbcaa2aba18a090a), [dc9328](https://github.com/intel/neural-compressor/commit/dc9328c09b243d7df3bccc0a35a8a12feaabb40a))
- [Quantization] [Experimental] FW extension API for TF backend support Keras Quantization ([2627d3](https://github.com/intel/neural-compressor/commit/2627d33b9ff900697184972575969ecc55da8923))
- [Quantization] IPEX 2.1 XPU (CPU+GPU) support ([af0b50](https://github.com/intel/neural-compressor/commit/af0b50fd201c672c141c2c0508cdd4e4ae62684f), [cf847c](https://github.com/intel/neural-compressor/commit/cf847c1abf8f09ce4ae6e21f62aeb5ec7fa2f358))
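
A sketch of what Weight-Only Quantization tuning for the ONNX Runtime backend can look like with the 2.x accuracy-aware API; `model.onnx`, the dataloader, and `eval_fn` are user-supplied placeholders:

```python
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.config import AccuracyCriterion, TuningCriterion

calib_dataloader = ...  # user-supplied iterable yielding model inputs

def eval_fn(model):
    """User-defined: return an accuracy metric for a candidate model."""
    ...

conf = PostTrainingQuantConfig(
    approach="weight_only",
    tuning_criterion=TuningCriterion(max_trials=5),             # candidate WOQ configs to try
    accuracy_criterion=AccuracyCriterion(tolerable_loss=0.01),  # stay within 1% of FP32
)
q_model = quantization.fit(
    "model.onnx", conf, calib_dataloader=calib_dataloader, eval_func=eval_fn
)
```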


**Improvement**
- [Quantization] Add use_optimum_format for export_compressed_model in Weight-Only Quantization ([5179da](https://github.com/intel/neural-compressor/commit/5179da10dd6540660f320206723e2651b4d6a292), [0a0644](https://github.com/intel/neural-compressor/commit/0a06448384699aa1e51bd2cd986c226382b9cd94)) (see the export sketch after this list)
- [Quantization] Enhance ONNX Runtime quantization with DirectML EP ([db0fef](https://github.com/intel/neural-compressor/commit/db0fef72137e0ec6a5ffbdc0e470e84ea4cefeb1), [d13183](https://github.com/intel/neural-compressor/commit/d1318320632dea889a3eb5287dea07ca320673aa), [098401](https://github.com/intel/neural-compressor/commit/098401d029e33741cfe42011544bf10e64323e46), [6cad50](https://github.com/intel/neural-compressor/commit/6cad5008cc4e1880ba5d9e01538e4a9dd3be962e))
- [Quantization] Support restoring IPEX model from JSON ([c3214c](https://github.com/intel/neural-compressor/commit/c3214c9ec007301896c8a5df587be81fb047b541))
- [Quantization] ONNX Runtime add attr to MatMulNBits ([7057e3](https://github.com/intel/neural-compressor/commit/7057e3b2a3ae1f9911516fb173287e1832f77afd))
- [Quantization] Increase SmoothQuant auto alpha running speed ([173c18](https://github.com/intel/neural-compressor/commit/173c1881ae4b4a04868b66a2253fc44807d63349))
- [Quantization] Add SmoothQuant alpha search space as a config argument ([f9663d](https://github.com/intel/neural-compressor/commit/f9663d0e03867ba28af31b332bc42f614c994fd9))
- [Quantization] Add SmoothQuant weight_clipping as a default_on option ([1f4aec](https://github.com/intel/neural-compressor/commit/1f4aec3c7149d93568106326e98a91933826475e))
- [Quantization] Support SmoothQuant with MinMaxObserver ([45b496](https://github.com/intel/neural-compressor/commit/45b4966d3f4198d0fe799488a4ec4c9b844bc07d))
- [Quantization] Support Weight-Only Quantization with fp16 for PyTorch backend ([d5cb56](https://github.com/intel/neural-compressor/commit/d5cb567f0a86ecec3ef561ee69fc44c0cb6cb248))
- [Quantization] Support trace with dictionary type example_inputs ([afe315](https://github.com/intel/neural-compressor/commit/afe3159c4872db654465602c4b237d28cbbf7610))
- [Quantization] Support Falcon Weight-Only Quantization ([595d3a](https://github.com/intel/neural-compressor/commit/595d3a1987c77542cd8b904dd16dc1c95ba9bdc7))
- [Common] Add deprecation decorator in experimental fold ([aeb3ed](https://github.com/intel/neural-compressor/commit/aeb3ed292b30b941bd69ef40dd99b0031b2171e0))
- [Common] Remove 1.x API dependency ([ee617a](https://github.com/intel/neural-compressor/commit/ee617a44aaef3d2c6860d850a384db3c4abe80fa))
- [Mixed Precision] Support PyTorch eager mode BF16 MixedPrecision ([3bfb76](https://github.com/intel/neural-compressor/commit/3bfb76dca9271c48940ca4af26cabfa48f50ca56))
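
To illustrate the `use_optimum_format` option above, a hedged sketch: after a weight-only run, the compressed model is exported in the packed layout that Optimum-style loaders expect. It assumes `q_model` is the object returned by `quantization.fit` for a weight-only run; the exact signature may vary by release:

```python
import torch

# q_model: the result of a weight-only quantization.fit(...) run.
compressed = q_model.export_compressed_model(use_optimum_format=True)
torch.save(compressed.state_dict(), "woq_model.pt")  # reloadable outside INC
```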


**Productivity**
- Support quantization and benchmark on macOS ([16d6a0](https://github.com/intel/neural-compressor/commit/16d6a06c212fbc8908d443b16a354fdfd6337629))
- Support ONNX Runtime 1.16.0 ([d81732](https://github.com/intel/neural-compressor/commit/d817328dd9040f3896bed71c1bb1c181a3cc805c), [299af9](https://github.com/intel/neural-compressor/commit/299af9a162d1f7c4f2e6d4eaceab69163121bee8), [753783](https://github.com/intel/neural-compressor/commit/753783011d2f0fc71abe331e589aff94f63aeac3))
- Support TensorFlow new API for gnr-base ([8160c7](https://github.com/intel/neural-compressor/commit/8160c71b845d7c76f6ff2d66883c871450f57687))


**Bug Fixes**
- Fix `GraphModule object has no attribute bias` error ([7f53d1](https://github.com/intel/neural-compressor/commit/7f53d16914926cec9cf8e02be1ed3dcffd2ca730))
- Fix ONNX model export issue ([af0aea](https://github.com/intel/neural-compressor/commit/af0aea4b221c188e739fde205e22769f7532ffc1), [eaa57f](https://github.com/intel/neural-compressor/commit/eaa57f69ec0de8e821fe6e3297836b08a3aee3f1))
- Add clip for ONNX Runtime SmoothQuant ([cbb69b](https://github.com/intel/neural-compressor/commit/cbb69bf9c244d371833db1092aec9a66b304b7a4))
- Fix SmoothQuant minmax observer init ([b1db1c](https://github.com/intel/neural-compressor/commit/b1db1cf3d102a1a6234f82493f27ac3362c0e150))
- Fix SmoothQuant issue in get/set_module ([dffcfe](https://github.com/intel/neural-compressor/commit/dffcfe1e581a5bcfdd45c24147e934c180131e9d))
- Align sparsity with block-wise masks in progressive pruning ([fcdc29](https://github.com/intel/neural-compressor/commit/fcdc29ad8105ec3b6fc305dedc2edd6fc9c6cd87))


**Examples**
- Support peft model with SmoothQuant ([5e21b7](https://github.com/intel/neural-compressor/commit/5e21b7057b56a46f9299ee35c18ebf47082d1175))
- Enable two ONNX Runtime examples: table-transformer-detection ([550cee](https://github.com/intel/neural-compressor/commit/550cee2f1538db7bb542e6c24c3d599fc1d8ef4d)) and BEiT ([7265df](https://github.com/intel/neural-compressor/commit/7265dfee1050979ed6672b7efb9cfd4f9bd86acf))


**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04 & Windows 10 & macOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- TensorFlow 2.13, 2.14, 2.15
- ITEX 1.2.0, 2.13.0.0, 2.14.0.1
- PyTorch/IPEX 1.13.0+cpu, 2.0.1+cpu, 2.1.0
- ONNX Runtime 1.14.1, 1.15.1, 1.16.3
- MXNet 1.9.1

2.3.2

- Features
- Bug Fixes

**Features**
- Reduce memory consumption in ONNXRT adaptor ([f64833](https://github.com/intel/neural-compressor/commit/f64833d9912e74e734e17dd894baca5cde7a8481))
- Support MatMulFpQ4 for onnxruntime 1.16 ([1beb43](https://github.com/intel/neural-compressor/commit/1beb435c7f46a0292dbf0d49ad3882c4770c6200))
- Support MatMulNBits for onnxruntime 1.17 ([67a31b](https://github.com/intel/neural-compressor/commit/67a31ba897cca3d1769b074ecbdebc2fc9389428))

**Bug Fixes**
- Update ITREX version in ONNXRT WOQ example and fix bugs in HF models ([0ca51a](https://github.com/intel/neural-compressor/commit/0ca51a153dac26a794ebe3f03eb7fb7cd49c251e))
- Update ONNXRT WOQ example to llama-2-7b ([7f2063](https://github.com/intel/neural-compressor/commit/7f2063f79f190e1fdfe3edcae0504ef9025f2c7c))
- Fix ONNXRT WOQ failure when model_path is None ([cbd0a4](https://github.com/intel/neural-compressor/commit/cbd0a416f46229b539005a359eabbbd3230518f6))

**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04
- Python 3.10
- TensorFlow 2.13
- ITEX 2.13
- PyTorch/IPEX 2.0.1+cpu
- ONNX Runtime 1.15.1
- MXNet 1.9.1

2.3.1

- Bug Fixes
- Productivity

**Bug Fixes**
- Fix PyTorch SmoothQuant for auto alpha ([e9c14a](https://github.com/intel/neural-compressor/commit/e9c14a51b88b3cf73fcb0625cab708173615734f), [35def7](https://github.com/intel/neural-compressor/commit/35def7b49ae67a00218ebd89bf629456626cc590))
- Fix PyTorch SmoothQuant calibration memory overhead ([49e950](https://github.com/intel/neural-compressor/commit/49e950e889ec577b9191091bb575d804e2c022bb))
- Fix PyTorch SmoothQuant issue in get/set_module (Issue 1265) ([6de9ce](https://github.com/intel/neural-compressor/commit/6de9ce3d30fd7477367dcfd6bdfb583a16779b74))
- Support Falcon Weight-Only Quantization ([bf7b5c](https://github.com/intel/neural-compressor/commit/bf7b5cfb9ef0b5943ea38c41fd0c35f19c7a5fdb))
- Remove Conv2d in Weight-Only Quantization adaptor white list ([1a6526](https://github.com/intel/neural-compressor/commit/1a6526f24d3e00f932578a74fa4019a6ba90fb8b))
- Fix TensorFlow ssd_resnet50_v1 Example for TF New API ([c63fc5](https://github.com/intel/neural-compressor/commit/c63fc5f62b307548b82c4f5683aec415ba54ab12))

**Productivity**
- Adapt example for TensorFlow 2.14 AutoTrackable API change ([424cf3](https://github.com/intel/neural-compressor/commit/424cf3ab0418794aa8ade77d143a6951deef9055))

**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04
- Python 3.10
- TensorFlow 2.13, 2.14
- ITEX 2.13
- PyTorch/IPEX 2.0.1+cpu
- ONNX Runtime 1.15.1
- MXNet 1.9.1

2.3

- Highlights
- Features
- Improvement
- Productivity
- Bug Fixes
- Examples
- Validated Configurations


**Highlights**
- Integrate Intel Neural Compressor into MSFT ONNX Runtime ([#16288](https://github.com/microsoft/onnxruntime/pull/16288)) and Olive ([#411](https://github.com/microsoft/Olive/pull/411), [#412](https://github.com/microsoft/Olive/pull/412), [#469](https://github.com/microsoft/Olive/pull/469)).
- Supported low precision (INT4, NF4, FP4) and [Weight-Only](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md) Quantization algorithms including RTN, [AWQ](https://arxiv.org/abs/2306.00978), [GPTQ](https://arxiv.org/abs/2210.17323) and TEQ on ONNX Runtime and PyTorch for LLMs optimization.
- Supported SparseGPT pruner ([88adfc](https://github.com/intel/neural-compressor/commit/88adfc99f6b2edf0144c7344be9236b6e1030b54)).
- Supported quantization for ONNX Runtime DML EP and DNNL EP, and verified inference on Intel NPU (e.g., Meteor Lake) and Intel CPU (e.g., Sapphire Rapids).


**Features**
- [Quantization] Support ONNX Runtime quantization and inference for DNNL EP ([79be8b](https://github.com/intel/neural-compressor/commit/79be8b99c676f0b0b10ea56eff71868fc8696910))
- [Quantization] [Experimental] Support ONNX Runtime quantization and inference for DirectML EP ([750bb9](https://github.com/intel/neural-compressor/commit/750bb9bc28566d2c2189fde2e6edc8bf3ce3cbbb))
- [Quantization] Support low precision and Weight-Only Quantization (WOQ) algorithms, including RTN ([501440](https://github.com/intel/neural-compressor/commit/501440ab560056e2e3a1a75c922361ebf614fc04), [19ab16](https://github.com/intel/neural-compressor/commit/19ab16c1275aed3efea0267c384e203790f04c03), [859315](https://github.com/intel/neural-compressor/commit/85931587d6fb9fd10d16e5c750dc5fdc519bda73)), AWQ ([2562f2](https://github.com/intel/neural-compressor/commit/2562f29842e3eac4a28d11ca4502376375b893bf), [641d42](https://github.com/intel/neural-compressor/commit/641d42b2ebf873e87aa7d5bb0b2fcd518550022f)), GPTQ ([b5ac3c](https://github.com/intel/neural-compressor/commit/b5ac3c4492c7f21ea0e6910eba11a637b67405f1), [6ba783](https://github.com/intel/neural-compressor/commit/6ba78372cc846ab961b73b9b7007ec41e75341e8)) and TEQ ([d2f995](https://github.com/intel/neural-compressor/commit/d2f995bf00bf808eb318887e8bbbea6e0529740e), [9ff7f0](https://github.com/intel/neural-compressor/commit/9ff7f01c3ca9f5aba0aff01260d58ce3007a8f4c)) for PyTorch (see the config sketch after this list)
- [Quantization] Support NF4 and FP4 data type for PyTorch Weight-Only Quantization ([3d11b5](https://github.com/intel/neural-compressor/commit/3d11b5e78d7bddcdee56f354de9d1f78a3da2033))
- [Quantization] Support low precision and Weight-Only Quantization algorithms, including RTN, AWQ and GPTQ for ONNX Runtime ([da4c92](https://github.com/intel/neural-compressor/commit/da4c92cdcc1a16df2643a87ab35b49b277c2fb5b))
- [Quantization] Support layer-wise quantization ([d9d1fc](https://github.com/intel/neural-compressor/commit/d9d1fccf67ce32e545bc9986936edebce01c500a)) and enable with SmoothQuant ([ec9ae9](https://github.com/intel/neural-compressor/commit/ec9ae913abfffd138dec55d3915fde52a96f6445))
- [Pruning] Add SparseGPT pruner and refactor pruning class ([88adfc](https://github.com/intel/neural-compressor/commit/88adfc99f6b2edf0144c7344be9236b6e1030b54))
- [Pruning] Add Hyper-parameter Optimization algorithm for pruning ([6613cf](https://github.com/intel/neural-compressor/commit/6613cfa9c7b8a06b3b85f35e2cf3ba2663766fd3))
- [Model Export] Support PT2ONNX dynamic quantization export ([165532](https://github.com/intel/neural-compressor/commit/16553260b23fbe237dc66726d9e3a1637a6e0cb1))
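
A configuration sketch tying the WOQ features above together: bits, group size, data type (INT4/NF4/FP4), and algorithm (RTN/AWQ/GPTQ/TEQ) are all selected through `op_type_dict`. Key names follow the Weight-Only Quantization doc linked in the highlights; `fp32_model` and `calib_dataloader` are placeholders:

```python
from neural_compressor import PostTrainingQuantConfig, quantization

fp32_model = ...        # user-supplied PyTorch model
calib_dataloader = ...  # user-supplied calibration data (needed by AWQ/GPTQ/TEQ)

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # apply to every matching op type
            "weight": {
                "bits": 4,
                "group_size": 32,
                "scheme": "sym",
                "dtype": "nf4",      # or "int" (plain INT4 with bits=4) / "fp4"
                "algorithm": "RTN",  # or "AWQ" / "GPTQ" / "TEQ"
            },
        },
    },
)
q_model = quantization.fit(fp32_model, conf, calib_dataloader=calib_dataloader)
```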


**Improvement**
- [Common] Clean up dataloader usage in examples ([1044d8](https://github.com/intel/neural-compressor/commit/1044d8d4b722315cc62e9b4b80573e4cd7706465), [a2931e](https://github.com/intel/neural-compressor/commit/a2931eaa4052eec195be3c79a13f7bfa23e54473), [447cc7](https://github.com/intel/neural-compressor/commit/447cc7f2a70b15943c87494662aff32c740b62c8))
- [Common] Enhance ONNX Runtime backend check ([4ce9de](https://github.com/intel/neural-compressor/commit/4ce9de5feb472dbab57a3bb9369c8b7ba1c57305))
- [Strategy] Add block-wise distributed fallback in basic strategy ([ea309f](https://github.com/intel/neural-compressor/commit/ea309f51925be25d3cc0ecfb32922789e3b645cb))
- [Strategy] Enhance strategy exit policy ([d19b42](https://github.com/intel/neural-compressor/commit/d19b42f9193f455990a9b4bfdd47d2795e04b154))
- [Quantization] Add WeightOnlyLinear for Weight-Only approach to allow low memory inference ([00bbf8](https://github.com/intel/neural-compressor/commit/00bbf8413e863d1ac4b3ad3c35d95371c9bba023))
- [Quantization] Support more ONNX Runtime direct INT8 ops ([b9ce61](https://github.com/intel/neural-compressor/commit/b9ce61a860cc793123575e549c2c174474e93ef9))
- [Quantization] Support TensorFlow per-channel MatMul quantization ([cf5589](https://github.com/intel/neural-compressor/commit/cf55895b8d5c6c6280fe70d437db93bf76cd76d0))
- [Quantization] Implement a new method to perform alpha auto-tuning in SmoothQuant ([084eda](https://github.com/intel/neural-compressor/commit/084edad14a0235c529dc04ce65cd044c32a61047))
- [Quantization] Enhance ONNX SmoothQuant tuning structure ([f0d51c](https://github.com/intel/neural-compressor/commit/f0d51c2cd35b94972a7db2caea2f2d0fd39dc61b))
- [Quantization] Enhance PyTorch SmoothQuant tuning structure ([81da40](https://github.com/intel/neural-compressor/commit/81da4039f47f671fc670df95482aa97caecf4afd))
- [Quantization] Update PyTorch examples dataloader to support transformers 4.31.x ([59371f](https://github.com/intel/neural-compressor/commit/59371feeea7f63bf60c4386f90bcf70569b69284))
- [Quantization] Enhance ONNX Runtime backend setting for GPU EP support ([295535](https://github.com/intel/neural-compressor/commit/295535ac8b0f957deda236f4b06e5565b43974fd))
- [Pruning] Refactor pruning ([92d14d](https://github.com/intel/neural-compressor/commit/92d14d7f8409451c0d8dfc4fc4ab1a0352de7248))
- [Mixed Precision] Update the list of supported layers for Keras mixed precision ([692c8b](https://github.com/intel/neural-compressor/commit/692c8bbc16fb7c4913ebb8ec699ce35757067b41))
- [Mixed Precision] Introduce quant_level into mixed precision ([0dc6a9](https://github.com/intel/neural-compressor/commit/0dc6a92f07b8cad14a0d1967095476a5db7815e3))


**Productivity**
- [Ecosystem] Integrate SmoothQuant and 3 LLM examples into MSFT Olive ([#411](https://github.com/microsoft/Olive/pull/411), [#412](https://github.com/microsoft/Olive/pull/412), [#469](https://github.com/microsoft/Olive/pull/469))
- [Ecosystem] Integrate SmoothQuant static quantization into MSFT ONNX Runtime ([#16288](https://github.com/microsoft/onnxruntime/pull/16288))
- [Neural Insights] Support PyTorch FX inspect tensor and integrate with Neural Insights ([775def](https://github.com/intel/neural-compressor/commit/775deff8e10187a793b902f2dbe248961824d8a0), [74a785](https://github.com/intel/neural-compressor/commit/74a785ef2ad3d494b452680c577959a934b2fcb0))
- [Neural Insights] Add step-by-step diagnosis cases ([99c3b0](https://github.com/intel/neural-compressor/commit/99c3b06b3a90a33434b4a387035459dfe0607e34))
- [Neural Solution] Resource management and user-facing API enhancement ([fbba10](https://github.com/intel/neural-compressor/commit/fbba10cf10d4ee8540e22d2e7ef0b70d4e6e0583))
- [Auto CI] Integrate auto CI code scan bug fix tools ([f77a2c](https://github.com/intel/neural-compressor/commit/f77a2c7606cdd2a0dec39c61d5ab95325272bcf2), [06cc38](https://github.com/intel/neural-compressor/commit/06cc3829eb1fa38db8404272999ee1cc11fa4dff))


**Bug Fixes**
- Fix bugs in PyTorch SmoothQuant ([0349b9](https://github.com/intel/neural-compressor/commit/0349b9ae2e0399900725eb9ec6f7013ae9df3eda), [8f3645](https://github.com/intel/neural-compressor/commit/8f3645289998b28f4206e9fb48c2f4f2123527c1))
- Fix PyTorch dataloader batch size issue ([6a98d0](https://github.com/intel/neural-compressor/commit/6a98d0ba7bacd238782f85928d84b5d1ff720d12))
- Fix bugs for ONNX Runtime CUDA EP ([a1b566](https://github.com/intel/neural-compressor/commit/a1b566fb5607c3a8e508d0d24350a36f3c8c0b0a), [d1f315](https://github.com/intel/neural-compressor/commit/d1f315f359440382d713a0a20c7927c7c0d252a1))
- Fix bug in ONNX Runtime adapter where _rename_node function fails with model size > 2 GB ([1f6b1a](https://github.com/intel/neural-compressor/commit/1f6b1adc09a3fb5ae43cd0e721bc4430b636f596))
- Fix ONNX Runtime diagnosis bug ([f10e26](https://github.com/intel/neural-compressor/commit/f10e26390da84c4d3ef68c4f23c11c62b31cfa1a))
- Update Neural Solution example and fix gRPC port issue ([528868](https://github.com/intel/neural-compressor/commit/5288684ba89fd50c325a72abdd4899c568b33dbd))
- Fix the objective initialization issue ([9d7546](https://github.com/intel/neural-compressor/commit/9d7546fd5dc2a4cced6238f940ddb1ad1a4f893f))
- Fix reshape issue for bayesian strategy ([77cb83](https://github.com/intel/neural-compressor/commit/77cb836060e83082ff71c1ac862e7b8aceda08e1))
- Fix CVEs ([d86922](https://github.com/intel/neural-compressor/commit/d869227695a544dfc8f26a1306c386c8858ffc16), [2bbfcd](https://github.com/intel/neural-compressor/commit/2bbfcd38ab4faba656847d7cc7df9b34d18c079d), [fc71fa](https://github.com/intel/neural-compressor/commit/fc71fac7dc6e51b2b259e35ed054d926e91a96fe))


**Examples**
- Add Weight-Only LLM examples for PyTorch ([4b24be](https://github.com/intel/neural-compressor/commit/4b24be1ec31bf9838c6052752f2530aa4814a630), [66f7c1](https://github.com/intel/neural-compressor/commit/66f7c10d566a6217395c2a6c34dea0c32d5a0ad3), [aa457a](https://github.com/intel/neural-compressor/commit/aa457a3f966a4f8dcdacd599834d0b05a38170bf))
- Add Weight-Only LLM examples for ONNX Runtime ([10c133](https://github.com/intel/neural-compressor/commit/10c133162e725c8d96f514a9b7e986730d594c02))
- Enable 3 ONNX Runtime examples: CodeBert ([5e584e](https://github.com/intel/neural-compressor/commit/5e584e6e74e039cfc269dfc1972e9f6a1687d41e)), LayoutLMv2 FUNSD ([5f0b17](https://github.com/intel/neural-compressor/commit/5f0b17e977b095bf6c2ba9e907b004025511369e)), Table Transformer ([eb8a95](https://github.com/intel/neural-compressor/commit/eb8a956dc056075d96a309831c177a63b81a76ee))
- Add ONNX Runtime LLM SmoothQuant example Llama-7B ([7fbcf5](https://github.com/intel/neural-compressor/commit/7fbcf54d9f331c0f4cb38767de1224e3bf3f0db9))
- Enable 2 TensorFlow examples: ViT ([94df99](https://github.com/intel/neural-compressor/commit/94df9977513ae10af3139d2462f5e8ff8ca4329c)), GraphSage ([29ec82](https://github.com/intel/neural-compressor/commit/29ec821fad39f5780d8b0f6be41460c98295e227))
- Add easy get started notebooks ([d7b608](https://github.com/intel/neural-compressor/commit/d7b608b341013e5669c2ab90a6ba59b663aa63a7), [6ee846](https://github.com/intel/neural-compressor/commit/6ee8466d17bc3a3fee9e7722d13e1e5e9e2d63cd))
- Add multi-cards magnitude pruning use case ([909618](https://github.com/intel/neural-compressor/commit/9096188ef18901ab56416601cd965b974800545c))
- Unify ONNX Runtime prepare model scripts ([5ecb13](https://github.com/intel/neural-compressor/commit/5ecb134988eef0d62bc43858ae7321b52ecc8590))


**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04
- Python 3.7, 3.8, 3.9, 3.10, 3.11
- TensorFlow 2.11, 2.12, 2.13
- ITEX 1.1.0, 1.2.0, 2.13.0.0
- PyTorch/IPEX 1.12.1+cpu, 1.13.0+cpu, 2.0.1+cpu
- ONNX Runtime 1.13.1, 1.14.1, 1.15.1
- MXNet 1.9.1

2.2

- Highlights
- Features
- Improvement
- Productivity
- Bug Fixes
- Examples
- External Contributions


**Highlights**
- Expanded SmoothQuant support across mainstream frameworks including PyTorch/IPEX, TensorFlow/ITEX, and ONNX Runtime, and validated popular large language models (LLMs) such as GPT-J, LLaMA, OPT, BLOOM, Dolly, MPT, LaMini-LM and RedPajama-INCITE.
- Introduced two productivity components: [Neural Solution](https://github.com/intel/neural-compressor/blob/master/neural_solution/README.md) for distributed quantization and [Neural Insights](https://github.com/intel/neural-compressor/blob/master/neural_insights/README.md) for quantization accuracy debugging.
- Integrated Intel Neural Compressor into MSFT Olive ([#157](https://github.com/microsoft/Olive/pull/157)) and DeepSpeed ([#3300](https://github.com/microsoft/DeepSpeed/pull/3300)).


**Features**
- [Quantization] Support TensorFlow SmoothQuant ([1f4127](https://github.com/intel/neural-compressor/commit/1f4127f173790f9ac5a13c05ec0260df15c00c67))
- [Quantization] Support ITEX SmoothQuant ([1f4127](https://github.com/intel/neural-compressor/commit/1f4127f173790f9ac5a13c05ec0260df15c00c67))
- [Quantization] Support PyTorch FX SmoothQuant ([6a39f6](https://github.com/intel/neural-compressor/commit/6a39f641b127584a7e11163723a8dfb2499dd7f8), [603811](https://github.com/intel/neural-compressor/commit/603811e0be240196dbdcdbf2cab9f215e74144a0))
- [Quantization] Support ONNX Runtime SmoothQuant ([3df647](https://github.com/intel/neural-compressor/commit/3df6478686d5405ba962d1e0f5c3c46ccec810c6), [1e1d70](https://github.com/intel/neural-compressor/commit/1e1d706a2ab3518a260d814d6bef11ca599d4f6a))
- [Quantization] Support dictionary inputs for IPEX quantization ([4ba233](https://github.com/intel/neural-compressor/commit/4ba2332f9f825f69f0119193f2d916cc04de3c44))
- [Quantization] Enable calibration algorithm Entropy/KL & Percentile for ONNX Runtime ([dae494](https://github.com/intel/neural-compressor/commit/dae49453022c93ef27337c2d2b8c32463ab173c5))
- [MixedPrecision] Support mixed precision op name/type dict option ([a9c2cb](https://github.com/intel/neural-compressor/commit/a9c2cb68a5e36a790c2db6eb8b91957099b888d7)) (a config sketch follows this list)
- [Strategy] Support block-wise tuning ([9c26ed](https://github.com/intel/neural-compressor/commit/9c26ed7fb438a8d0b3605599ecc0f4d2d886f555))
- [Strategy] Enable mse_v2 for ONNX Runtime ([62122d](https://github.com/intel/neural-compressor/commit/62122dd4037c15bec8ec1462c976baa5e63730f5))
- [Pruning] Support retrain-free sparsity ([d29aa0](https://github.com/intel/neural-compressor/commit/d29aa0ff3383e5c93740b82b8ddaab02d45d2e97))
- [Pruning] Support TensorFlow pruning with 2.x API ([072c13](https://github.com/intel/neural-compressor/commit/072c1391356ffefa56d729909919ff11a32005f6))
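
A sketch of the mixed precision op name/type dict option mentioned above, under the assumption that `MixedPrecisionConfig` accepts the same op_name_dict shape as the quantization configs (verify the key names against the docs); `fp32_model` is a placeholder:

```python
from neural_compressor import mix_precision
from neural_compressor.config import MixedPrecisionConfig

fp32_model = ...  # user-supplied FP32 model

# Assumed shape of the op name/type dict option: keep ops whose names match
# "classifier.*" in FP32 while converting the rest to BF16.
conf = MixedPrecisionConfig(
    precisions="bf16",
    op_name_dict={"classifier.*": {"activation": {"dtype": ["fp32"]}}},
)
bf16_model = mix_precision.fit(fp32_model, conf)
```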


**Improvement**
- [Quantization] Enhance Keras functional model quantization with Keras model in, quantized Keras model out ([699751](https://github.com/intel/neural-compressor/commit/6997518f081dff83a4bc9c161a314ef4bc6d71c1))
- [Quantization] Enhance MatMul and Gather quantization for ONNX Runtime ([1f9c4f](https://github.com/intel/neural-compressor/commit/1f9c4fc4731adc11b75231662d091d1f3f2743f3))
- [Quantization] Add new recipe for ONNX Runtime NLP models ([10d82c](https://github.com/intel/neural-compressor/commit/10d82c56cba61d148fe63d5cced8b6868c705454))
- [MixedPrecision] Add more FP16 OPs support for ONNX Runtime ([15d551](https://github.com/intel/neural-compressor/commit/15d5518ef1682bfd815a663e30b2e99ddd511b45))
- [MixedPrecision] Add more BF16 OPs support for TensorFlow ([369b9d](https://github.com/intel/neural-compressor/commit/369b9d0ea4019736a802dc172b0a7d7b5c97cd9d))
- [Pruning] Enhance multihead-attention slim ([f3de50](https://github.com/intel/neural-compressor/commit/f3de501d7b266d64b26f3725f468da20363ec0d3))
- [Pruning] Enable progressive pruning in N:M pattern ([483e80](https://github.com/intel/neural-compressor/commit/483e80ad4ecd6f5c9504af4732e9c61e261db538)) (see the pruning sketch after this list)
- [Model Export] Refine PT2ONNX export ([877adb](https://github.com/intel/neural-compressor/commit/877adbd80d5b2e217c69ba4d5a791981aeabd8a0))
- Remove redundant classes for quantization, benchmark and mixed precision ([c51096](https://github.com/intel/neural-compressor/commit/c510969e9d9d76434b8a8938e19dddaab29495ec))
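
A sketch of the 2.x training-time pruning flow with an N:M pattern and a progressive pruning type, per the items above; `model` and `train_loader` are user-supplied, and the exact callback set may differ by release:

```python
from neural_compressor.training import WeightPruningConfig, prepare_compression

model = ...         # user-supplied torch.nn.Module
train_loader = ...  # user-supplied training dataloader

config = WeightPruningConfig(
    [{
        "op_names": ["*"],
        "pattern": "2:4",                         # N:M structured sparsity
        "pruning_type": "magnitude_progressive",  # assumed progressive variant
        "target_sparsity": 0.5,
    }],
    start_step=0,
    end_step=1000,
)
compression_manager = prepare_compression(model, config)
compression_manager.callbacks.on_train_begin()
for step, batch in enumerate(train_loader):
    compression_manager.callbacks.on_step_begin(step)
    ...  # forward, loss, backward, optimizer step as usual
    compression_manager.callbacks.on_step_end()
compression_manager.callbacks.on_train_end()
```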


**Productivity**
- [Neural Solution] Support multi-node distributed tuning with model-level parallelism ([ee049c](https://github.com/intel/neural-compressor/commit/ee049c749f0926c563683a74463354770a75afca))
- [Neural Insights] Support quantization and benchmark diagnosis with GUI ([5dc9ea](https://github.com/intel/neural-compressor/commit/5dc9eaf84198bf5f98bb8d16eb957be7218b23b5), [3bde2e](https://github.com/intel/neural-compressor/commit/3bde2ea9218d7b9e3bec5265d02bdad1f1f0cdc9), [898344](https://github.com/intel/neural-compressor/commit/898344622b588a6a5d99f92f20fcbb759b657736))
- [Neural Coder] Migrate Neural Coder support into 2.x API ([113ca1](https://github.com/intel/neural-compressor/commit/113ca1e15da5ab5e3ea584793efbcb006e4dfeee), [e74a8a](https://github.com/intel/neural-compressor/commit/e74a8a0a8ef2e4ea0f001c7c5f5004b6b41d8676))
- [Ecosystem] MSFT Olive integration ([#157](https://github.com/microsoft/Olive/pull/157))
- [Ecosystem] MSFT DeepSpeed integration ([#3300](https://github.com/microsoft/DeepSpeed/pull/3300))
- Support ITEX 1.2 ([5519e2](https://github.com/intel/neural-compressor/commit/5519e2c97b1889ed3f4c7259bc51016444a4131b))
- Support Python 3.11 ([6fa053](https://github.com/intel/neural-compressor/commit/6fa05322a0f065cd14c885f72cee640428b6aa53))
- Enhance documentation for mixed precision, diagnosis, dataloader, metric, etc.


**Bug Fixes**
- Fix ONNX Runtime SmoothQuant issues ([85c6a0](https://github.com/intel/neural-compressor/commit/85c6a0fbf841d9178663d18962f8975a18651bdd), [1b26c0](https://github.com/intel/neural-compressor/commit/1b26c0dc28cc0bc995c8532d82791a74a7aebe13))
- Fix bug in IPEX fallback ([b4f9c7](https://github.com/intel/neural-compressor/commit/b4f9c72cfb0940e0461d4c40b0ef4c21b4c9c791))
- Fix ITEX quantize/dequantize before BN u8 issue ([5519e2](https://github.com/intel/neural-compressor/commit/5519e2c97b1889ed3f4c7259bc51016444a4131b))
- Fix example_inputs issue for IPEX SmoothQuant ([c8b753](https://github.com/intel/neural-compressor/commit/c8b753387d26fc2778c1ae9c10dfe5d49329217e))
- Fix IPEX mixed precision ([d1e734](https://github.com/intel/neural-compressor/commit/d1e7340295216c098e3178ab3c503556cc5e6bbe))
- Fix inspect tensor ([8f5f5d](https://github.com/intel/neural-compressor/commit/8f5f5deadf280fa7a1cbd76025eed9f726d65354))
- Fix accuracy issues in PyTorch models [peleenet](https://github.com/intel/neural-compressor/commit/6290cffb2b28760e22b5d8d3905f18fe6abab16f) and [3dunet](https://github.com/intel/neural-compressor/commit/af104953ff062421b9830e29a5c113b1d4b43eb6) after migration to the 2.x API
- Fix CVEs ([04c482](https://github.com/intel/neural-compressor/commit/04c482a24aaae847d976edb052e4dd7c7d739496), [efcd98](https://github.com/intel/neural-compressor/commit/efcd987c84ecd454e53e36d68876255df2308750), [6e9f7b](https://github.com/intel/neural-compressor/commit/6e9f7b7342181cbb1ae0cae9078e4426404dce23), [7abe32](https://github.com/intel/neural-compressor/commit/7abe32e194a274a4c5e13be52dc6846860b54d03))


**Examples**
- Enable 4 ONNX Runtime examples: [layoutlmv3, layoutlmft](https://github.com/intel/neural-compressor/commit/3c1e894fc28a7c599adb5a1c1dcfa6830609f1ff), [deberta-v3](https://github.com/intel/neural-compressor/commit/abac54e1320b2dba1c5d9e9d7a76c042a3c08c47), [GPTJ-6B](https://github.com/intel/neural-compressor/commit/ac5a67149948d25b41174fe0ae49833612a4191d).
- Enable 2 TensorFlow LLMs with SmoothQuant: [facebook-opt-125m, gpt2-medium](https://github.com/intel/neural-compressor/commit/1f4127f173790f9ac5a13c05ec0260df15c00c67).


**External Contributions**
- Add a mathematical check for SmoothQuant transform ([5c04ac](https://github.com/intel/neural-compressor/commit/5c04acddc4df21f8a8384b51ed1e25e3fc8493e3))
- Fix mismatch absorb layers due to tracing and named modules for SmoothQuant ([bccc89](https://github.com/intel/neural-compressor/commit/bccc89f4d2b56a87248161b0069ed03c311fe73f))
- Fix trace issue when input is dictionary for SmoothQuant ([6a3c64](https://github.com/intel/neural-compressor/commit/6a3c64318ecf0edd8d7e9fa25d1c61791362aefa))
- Allow dictionary model inputs for ONNX export ([17b642](https://github.com/intel/neural-compressor/commit/17b6425a4c243b2ee34b81b684c212dcd0de4a53))


**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04
- Python 3.7, 3.8, 3.9, 3.10, 3.11
- TensorFlow 2.10.0, 2.11.0, 2.12.0
- ITEX 1.1.0, 1.2.0
- PyTorch/IPEX 1.12.1+cpu, 1.13.0+cpu, 2.0.1+cpu
- ONNX Runtime 1.13.1, 1.14.1, 1.15.0
- MXNet 1.9.1
