- Highlights
- Features
- Improvements
- Productivity
- Bug Fixes
- Examples
- Validated Configurations
**Highlights**
- Integrated Intel Neural Compressor into MSFT ONNX Runtime ([#16288](https://github.com/microsoft/onnxruntime/pull/16288)) and Olive ([#411](https://github.com/microsoft/Olive/pull/411), [#412](https://github.com/microsoft/Olive/pull/412), [#469](https://github.com/microsoft/Olive/pull/469)).
- Supported low-precision data types (INT4, NF4, FP4) and [Weight-Only](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_weight_only.md) Quantization algorithms, including RTN, [AWQ](https://arxiv.org/abs/2306.00978), [GPTQ](https://arxiv.org/abs/2210.17323), and TEQ, on ONNX Runtime and PyTorch for LLM optimization (see the sketch after this list).
- Supported the SparseGPT pruner ([88adfc](https://github.com/intel/neural-compressor/commit/88adfc99f6b2edf0144c7344be9236b6e1030b54)).
- Supported quantization for ONNX Runtime DML EP and DNNL EP, and verified inference on Intel NPU (e.g., Meteor Lake) and Intel CPU (e.g., Sapphire Rapids).
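
A minimal sketch of the new Weight-Only Quantization path on the PyTorch backend, following the Weight-Only Quantization doc linked above. The toy model here is a placeholder for a real LLM (e.g., a Hugging Face causal LM):

```python
# Hedged sketch: INT4 RTN Weight-Only Quantization with the PyTorch backend.
# The toy model is a placeholder for a real LLM.
import torch
from neural_compressor import PostTrainingQuantConfig, quantization

float_model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
)

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        ".*": {  # regex matched against supported op types
            "weight": {
                "bits": 4,         # INT4 weights
                "group_size": 32,  # group-wise scales; -1 means per-channel
                "scheme": "sym",   # symmetric quantization
                "algorithm": "RTN",  # AWQ/GPTQ/TEQ additionally need a calibration dataloader
            },
        },
    },
)
q_model = quantization.fit(float_model, conf)
q_model.save("./saved_results")
```
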
**Features**
- [Quantization] Support ONNX Runtime quantization and inference for DNNL EP ([79be8b](https://github.com/intel/neural-compressor/commit/79be8b99c676f0b0b10ea56eff71868fc8696910))
- [Quantization] [Experimental] Support ONNX Runtime quantization and inference for DirectML EP ([750bb9](https://github.com/intel/neural-compressor/commit/750bb9bc28566d2c2189fde2e6edc8bf3ce3cbbb))
- [Quantization] Support low-precision and Weight-Only Quantization (WOQ) algorithms, including RTN ([501440](https://github.com/intel/neural-compressor/commit/501440ab560056e2e3a1a75c922361ebf614fc04), [19ab16](https://github.com/intel/neural-compressor/commit/19ab16c1275aed3efea0267c384e203790f04c03), [859315](https://github.com/intel/neural-compressor/commit/85931587d6fb9fd10d16e5c750dc5fdc519bda73)), AWQ ([2562f2](https://github.com/intel/neural-compressor/commit/2562f29842e3eac4a28d11ca4502376375b893bf), [641d42](https://github.com/intel/neural-compressor/commit/641d42b2ebf873e87aa7d5bb0b2fcd518550022f)), GPTQ ([b5ac3c](https://github.com/intel/neural-compressor/commit/b5ac3c4492c7f21ea0e6910eba11a637b67405f1), [6ba783](https://github.com/intel/neural-compressor/commit/6ba78372cc846ab961b73b9b7007ec41e75341e8)), and TEQ ([d2f995](https://github.com/intel/neural-compressor/commit/d2f995bf00bf808eb318887e8bbbea6e0529740e), [9ff7f0](https://github.com/intel/neural-compressor/commit/9ff7f01c3ca9f5aba0aff01260d58ce3007a8f4c)) for PyTorch
- [Quantization] Support NF4 and FP4 data types for PyTorch Weight-Only Quantization ([3d11b5](https://github.com/intel/neural-compressor/commit/3d11b5e78d7bddcdee56f354de9d1f78a3da2033))
- [Quantization] Support low-precision and Weight-Only Quantization algorithms, including RTN, AWQ, and GPTQ for ONNX Runtime ([da4c92](https://github.com/intel/neural-compressor/commit/da4c92cdcc1a16df2643a87ab35b49b277c2fb5b))
- [Quantization] Support layer-wise quantization ([d9d1fc](https://github.com/intel/neural-compressor/commit/d9d1fccf67ce32e545bc9986936edebce01c500a)) and enable it with SmoothQuant ([ec9ae9](https://github.com/intel/neural-compressor/commit/ec9ae913abfffd138dec55d3915fde52a96f6445))
- [Pruning] Add SparseGPT pruner and refactor the pruning class ([88adfc](https://github.com/intel/neural-compressor/commit/88adfc99f6b2edf0144c7344be9236b6e1030b54))
- [Pruning] Add Hyper-parameter Optimization algorithm for pruning ([6613cf](https://github.com/intel/neural-compressor/commit/6613cfa9c7b8a06b3b85f35e2cf3ba2663766fd3))
- [Model Export] Support PyTorch-to-ONNX (PT2ONNX) dynamic quantization export ([165532](https://github.com/intel/neural-compressor/commit/16553260b23fbe237dc66726d9e3a1637a6e0cb1)) (see the export sketch after this list)
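
A sketch of the PT2ONNX dynamic quantization export flow, under the assumption that it goes through the `Torch2ONNXConfig` API described in the export documentation; the toy model and tensor names are placeholders:

```python
# Hedged sketch: quantize dynamically, then export PyTorch -> ONNX.
# Assumes the Torch2ONNXConfig export path (see docs/source/export.md);
# the toy model and I/O names are placeholders.
import torch
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.config import Torch2ONNXConfig

float_model = torch.nn.Sequential(
    torch.nn.Linear(64, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
)
q_model = quantization.fit(float_model, PostTrainingQuantConfig(approach="dynamic"))

export_conf = Torch2ONNXConfig(
    dtype="int8",
    opset_version=14,
    example_inputs=torch.randn(1, 64),  # match your model's input shape
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}},
)
q_model.export("int8-model.onnx", export_conf)
```
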
**Improvements**
- [Common] Clean up dataloader usage in examples ([1044d8](https://github.com/intel/neural-compressor/commit/1044d8d4b722315cc62e9b4b80573e4cd7706465), [a2931e](https://github.com/intel/neural-compressor/commit/a2931eaa4052eec195be3c79a13f7bfa23e54473), [447cc7](https://github.com/intel/neural-compressor/commit/447cc7f2a70b15943c87494662aff32c740b62c8))
- [Common] Enhance ONNX Runtime backend check ([4ce9de](https://github.com/intel/neural-compressor/commit/4ce9de5feb472dbab57a3bb9369c8b7ba1c57305))
- [Strategy] Add block-wise distributed fallback in basic strategy ([ea309f](https://github.com/intel/neural-compressor/commit/ea309f51925be25d3cc0ecfb32922789e3b645cb))
- [Strategy] Enhance strategy exit policy ([d19b42](https://github.com/intel/neural-compressor/commit/d19b42f9193f455990a9b4bfdd47d2795e04b154))
- [Quantization] Add WeightOnlyLinear for the Weight-Only approach to enable low-memory inference ([00bbf8](https://github.com/intel/neural-compressor/commit/00bbf8413e863d1ac4b3ad3c35d95371c9bba023))
- [Quantization] Support more ONNX Runtime direct INT8 ops ([b9ce61](https://github.com/intel/neural-compressor/commit/b9ce61a860cc793123575e549c2c174474e93ef9))
- [Quantization] Support TensorFlow per-channel MatMul quantization ([cf5589](https://github.com/intel/neural-compressor/commit/cf55895b8d5c6c6280fe70d437db93bf76cd76d0))
- [Quantization] Implement a new method to perform alpha auto-tuning in SmoothQuant ([084eda](https://github.com/intel/neural-compressor/commit/084edad14a0235c529dc04ce65cd044c32a61047)) (see the SmoothQuant sketch after this list)
- [Quantization] Enhance ONNX SmoothQuant tuning structure ([f0d51c](https://github.com/intel/neural-compressor/commit/f0d51c2cd35b94972a7db2caea2f2d0fd39dc61b))
- [Quantization] Enhance PyTorch SmoothQuant tuning structure ([81da40](https://github.com/intel/neural-compressor/commit/81da4039f47f671fc670df95482aa97caecf4afd))
- [Quantization] Update PyTorch examples dataloader to support transformers 4.31.x ([59371f](https://github.com/intel/neural-compressor/commit/59371feeea7f63bf60c4386f90bcf70569b69284))
- [Quantization] Enhance ONNX Runtime backend setting for GPU EP support ([295535](https://github.com/intel/neural-compressor/commit/295535ac8b0f957deda236f4b06e5565b43974fd))
- [Pruning] Refactor pruning ([92d14d](https://github.com/intel/neural-compressor/commit/92d14d7f8409451c0d8dfc4fc4ab1a0352de7248))
- [Mixed Precision] Update the list of supported layers for Keras mixed precision ([692c8b](https://github.com/intel/neural-compressor/commit/692c8bbc16fb7c4913ebb8ec699ce35757067b41))
- [Mixed Precision] Introduce quant_level into mixed precision ([0dc6a9](https://github.com/intel/neural-compressor/commit/0dc6a92f07b8cad14a0d1967095476a5db7815e3))
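
A sketch of enabling SmoothQuant with the new alpha auto-tuning via the `smooth_quant` recipe; `float_model` and `calib_dataloader` are placeholders for a user model and calibration data:

```python
# Hedged sketch: PyTorch SmoothQuant with automatic alpha tuning.
# `float_model` and `calib_dataloader` are user-supplied placeholders.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    recipes={
        "smooth_quant": True,
        "smooth_quant_args": {"alpha": "auto"},  # or a fixed float such as 0.5
    },
)
q_model = quantization.fit(float_model, conf, calib_dataloader=calib_dataloader)
```
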
**Productivity**
- [Ecosystem] MSFT Olive integrates SmoothQuant and 3 LLM examples ([#411](https://github.com/microsoft/Olive/pull/411), [#412](https://github.com/microsoft/Olive/pull/412), [#469](https://github.com/microsoft/Olive/pull/469))
- [Ecosystem] MSFT ONNX Runtime integrates SmoothQuant static quantization ([#16288](https://github.com/microsoft/onnxruntime/pull/16288))
- [Neural Insights] Support PyTorch FX tensor inspection and integrate it with Neural Insights ([775def](https://github.com/intel/neural-compressor/commit/775deff8e10187a793b902f2dbe248961824d8a0), [74a785](https://github.com/intel/neural-compressor/commit/74a785ef2ad3d494b452680c577959a934b2fcb0))
- [Neural Insights] Add step-by-step diagnosis cases ([99c3b0](https://github.com/intel/neural-compressor/commit/99c3b06b3a90a33434b4a387035459dfe0607e34))
- [Neural Solution] Resource management and user-facing API enhancement ([fbba10](https://github.com/intel/neural-compressor/commit/fbba10cf10d4ee8540e22d2e7ef0b70d4e6e0583))
- [Auto CI] Integrate automatic code-scan and bug-fix tools into CI ([f77a2c](https://github.com/intel/neural-compressor/commit/f77a2c7606cdd2a0dec39c61d5ab95325272bcf2), [06cc38](https://github.com/intel/neural-compressor/commit/06cc3829eb1fa38db8404272999ee1cc11fa4dff))
**Bug Fixes**
- Fix bugs in PyTorch SmoothQuant ([0349b9](https://github.com/intel/neural-compressor/commit/0349b9ae2e0399900725eb9ec6f7013ae9df3eda), [8f3645](https://github.com/intel/neural-compressor/commit/8f3645289998b28f4206e9fb48c2f4f2123527c1))
- Fix PyTorch dataloader batch size issue ([6a98d0](https://github.com/intel/neural-compressor/commit/6a98d0ba7bacd238782f85928d84b5d1ff720d12))
- Fix bugs for ONNX Runtime CUDA EP ([a1b566](https://github.com/intel/neural-compressor/commit/a1b566fb5607c3a8e508d0d24350a36f3c8c0b0a), [d1f315](https://github.com/intel/neural-compressor/commit/d1f315f359440382d713a0a20c7927c7c0d252a1))
- Fix a bug in the ONNX Runtime adapter where the _rename_node function fails for models larger than 2 GB ([1f6b1a](https://github.com/intel/neural-compressor/commit/1f6b1adc09a3fb5ae43cd0e721bc4430b636f596))
- Fix ONNX Runtime diagnosis bug ([f10e26](https://github.com/intel/neural-compressor/commit/f10e26390da84c4d3ef68c4f23c11c62b31cfa1a))
- Update Neural Solution example and fix gRPC port issue ([528868](https://github.com/intel/neural-compressor/commit/5288684ba89fd50c325a72abdd4899c568b33dbd))
- Fix the objective initialization issue ([9d7546](https://github.com/intel/neural-compressor/commit/9d7546fd5dc2a4cced6238f940ddb1ad1a4f893f))
- Fix reshape issue in the Bayesian strategy ([77cb83](https://github.com/intel/neural-compressor/commit/77cb836060e83082ff71c1ac862e7b8aceda08e1))
- Fix CVEs ([d86922](https://github.com/intel/neural-compressor/commit/d869227695a544dfc8f26a1306c386c8858ffc16), [2bbfcd](https://github.com/intel/neural-compressor/commit/2bbfcd38ab4faba656847d7cc7df9b34d18c079d), [fc71fa](https://github.com/intel/neural-compressor/commit/fc71fac7dc6e51b2b259e35ed054d926e91a96fe))
**Examples**
- Add Weight-Only LLM examples for PyTorch ([4b24be](https://github.com/intel/neural-compressor/commit/4b24be1ec31bf9838c6052752f2530aa4814a630), [66f7c1](https://github.com/intel/neural-compressor/commit/66f7c10d566a6217395c2a6c34dea0c32d5a0ad3), [aa457a](https://github.com/intel/neural-compressor/commit/aa457a3f966a4f8dcdacd599834d0b05a38170bf))
- Add Weight-Only LLM examples for ONNX Runtime ([10c133](https://github.com/intel/neural-compressor/commit/10c133162e725c8d96f514a9b7e986730d594c02)) (see the configuration sketch after this list)
- Enable 3 ONNX Runtime examples: CodeBERT ([5e584e](https://github.com/intel/neural-compressor/commit/5e584e6e74e039cfc269dfc1972e9f6a1687d41e)), LayoutLMv2 FUNSD ([5f0b17](https://github.com/intel/neural-compressor/commit/5f0b17e977b095bf6c2ba9e907b004025511369e)), and Table Transformer ([eb8a95](https://github.com/intel/neural-compressor/commit/eb8a956dc056075d96a309831c177a63b81a76ee))
- Add ONNX Runtime LLM SmoothQuant example for Llama-7B ([7fbcf5](https://github.com/intel/neural-compressor/commit/7fbcf54d9f331c0f4cb38767de1224e3bf3f0db9))
- Enable 2 TensorFlow examples: ViT ([94df99](https://github.com/intel/neural-compressor/commit/94df9977513ae10af3139d2462f5e8ff8ca4329c)) and GraphSAGE ([29ec82](https://github.com/intel/neural-compressor/commit/29ec821fad39f5780d8b0f6be41460c98295e227))
- Add easy get-started notebooks ([d7b608](https://github.com/intel/neural-compressor/commit/d7b608b341013e5669c2ab90a6ba59b663aa63a7), [6ee846](https://github.com/intel/neural-compressor/commit/6ee8466d17bc3a3fee9e7722d13e1e5e9e2d63cd))
- Add a multi-card magnitude pruning use case ([909618](https://github.com/intel/neural-compressor/commit/9096188ef18901ab56416601cd965b974800545c))
- Unify ONNX Runtime prepare model scripts ([5ecb13](https://github.com/intel/neural-compressor/commit/5ecb134988eef0d62bc43858ae7321b52ecc8590))
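
For the ONNX Runtime Weight-Only examples, a hedged configuration sketch; the model path and the MatMul-only `op_type_dict` scope are illustrative assumptions, not part of this release:

```python
# Hedged sketch: INT4 RTN Weight-Only Quantization of an ONNX LLM.
# "llama-7b/model.onnx" is a placeholder path; limiting the scope to
# MatMul ops is an illustrative choice.
from neural_compressor import PostTrainingQuantConfig, quantization

conf = PostTrainingQuantConfig(
    approach="weight_only",
    op_type_dict={
        "MatMul": {"weight": {"bits": 4, "group_size": 32, "algorithm": "RTN"}},
    },
)
q_model = quantization.fit("llama-7b/model.onnx", conf)
q_model.save("llama-7b-int4.onnx")
```
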
**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04
- Python 3.7, 3.8, 3.9, 3.10, 3.11
- TensorFlow 2.11, 2.12, 2.13
- ITEX 1.1.0, 1.2.0, 2.13.0.0
- PyTorch/IPEX 1.12.1+cpu, 1.13.0+cpu, 2.0.1+cpu
- ONNX Runtime 1.13.1, 1.14.1, 1.15.1
- MXNet 1.9.1