## OpenVINO

### Export CLI
* Add OpenVINO export CLI by echarlaix in https://github.com/huggingface/optimum-intel/pull/437
```bash
optimum-cli export openvino --model gpt2 ov_model
```
### New architectures

#### LCMs
* Enable Latent Consistency models OpenVINO export and inference by echarlaix in https://github.com/huggingface/optimum-intel/pull/463
```python
from optimum.intel import OVLatentConsistencyModelPipeline

pipe = OVLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
```
#### Pix2Struct
* Add support for export and inference for pix2struct models by eaidova in https://github.com/huggingface/optimum-intel/pull/450
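A minimal inference sketch using the `OVModelForPix2Struct` class this PR introduces; the checkpoint, image path, and question below are illustrative:

```python
from PIL import Image
from transformers import Pix2StructProcessor
from optimum.intel import OVModelForPix2Struct

model_id = "google/pix2struct-docvqa-base"  # illustrative checkpoint
processor = Pix2StructProcessor.from_pretrained(model_id)
# export=True converts the PyTorch checkpoint to OpenVINO on the fly
model = OVModelForPix2Struct.from_pretrained(model_id, export=True)

image = Image.open("document.png")  # illustrative local image
inputs = processor(images=image, text="What is the title of this document?", return_tensors="pt")
predictions = model.generate(**inputs)
print(processor.decode(predictions[0], skip_special_tokens=True))
```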
#### GPTBigCode
* Add support for export and inference for GPTBigCode models by echarlaix in https://github.com/huggingface/optimum-intel/pull/459
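A hedged sketch of GPTBigCode inference through the existing `OVModelForCausalLM` API; the StarCoder checkpoint and prompt are illustrative:

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel import OVModelForCausalLM

model_id = "bigcode/starcoderbase-1b"  # illustrative GPTBigCode checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

# OVModels plug into the standard transformers pipeline API
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("def fibonacci(n):", max_new_tokens=32)[0]["generated_text"])
```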
### Changes and bugfixes
* Move VAE execution to fp32 precision on GPU by eaidova in https://github.com/huggingface/optimum-intel/pull/432
* Enable OpenVINO export without ONNX export step by eaidova in https://github.com/huggingface/optimum-intel/pull/397
* Enable 8-bit weight compression for OpenVINO model by l-bat in https://github.com/huggingface/optimum-intel/pull/415
* Add image reshaping for statically reshaped OpenVINO SD models by echarlaix in https://github.com/huggingface/optimum-intel/pull/428
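A sketch of the static-shape workflow this fix applies to, assuming an img2img pipeline (model ID, shapes, and prompt illustrative); after reshaping, input images are resized to the pipeline's static shape automatically:

```python
from PIL import Image
from optimum.intel import OVStableDiffusionImg2ImgPipeline

pipe = OVStableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)
# Fix all input shapes ahead of time for faster inference
pipe.reshape(batch_size=1, height=512, width=512, num_images_per_prompt=1)
pipe.compile()

init_image = Image.open("sketch.png")  # illustrative; resized to 512x512 automatically
image = pipe(prompt="watercolor painting of a lighthouse", image=init_image).images[0]
```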
* OpenVINO device updates by helena-intel in https://github.com/huggingface/optimum-intel/pull/434
* Fix decoder model without cache by echarlaix in https://github.com/huggingface/optimum-intel/pull/438
* Fix export by echarlaix in https://github.com/huggingface/optimum-intel/pull/439
* Add 8-bit weight compression by default for decoders larger than 1B parameters by AlexKoff88 in https://github.com/huggingface/optimum-intel/pull/444
* Add fp16 and int8 conversion to OVModels and export CLI by echarlaix in https://github.com/huggingface/optimum-intel/pull/443
```python
from optimum.intel import OVModelForCausalLM

model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
```
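For the CLI path, a hedged sketch assuming the `--int8` flag this PR adds; check `optimum-cli export openvino --help` on your version for the exact option name:

```bash
# Assumes the int8 weight compression flag from PR #443
optimum-cli export openvino --model gpt2 --int8 ov_model
```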
* Create default attention mask when needed but not provided by eaidova in https://github.com/huggingface/optimum-intel/pull/457
* Do not automatically cache models when exporting a model in a temporary directory by helena-intel in https://github.com/huggingface/optimum-intel/pull/462
## Neural Compressor
* Integrate INC weight-only quantization by mengniwang95 in https://github.com/huggingface/optimum-intel/pull/417
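A hedged sketch of a weight-only round through `INCQuantizer`, assuming neural-compressor's `PostTrainingQuantConfig(approach="weight_only")` (RTN-style, so no calibration data needed); exact knobs vary across neural-compressor versions:

```python
from transformers import AutoModelForCausalLM
from neural_compressor import PostTrainingQuantConfig
from optimum.intel import INCQuantizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model
quantizer = INCQuantizer.from_pretrained(model)
# approach="weight_only" quantizes weights only, leaving activations in float
quantization_config = PostTrainingQuantConfig(approach="weight_only")
quantizer.quantize(quantization_config=quantization_config, save_directory="quantized_model")
```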
* Support num_key_value_heads by jiqing-feng in https://github.com/huggingface/optimum-intel/pull/447
* Enable ORT model support to INC quantizer by echarlaix in https://github.com/huggingface/optimum-intel/pull/436
* Fix INC model loading by echarlaix in https://github.com/huggingface/optimum-intel/pull/452
* Fix INC modeling by echarlaix in https://github.com/huggingface/optimum-intel/pull/453
* Add StarCoder past-kv shape for TSModelForCausalLM class by changwangss in https://github.com/huggingface/optimum-intel/pull/371
* Fix transformers v4.35.0 compatibility by echarlaix in https://github.com/huggingface/optimum-intel/pull/471
* Fix compatibility for optimum next release by echarlaix in https://github.com/huggingface/optimum-intel/pull/460
**Full Changelog**: https://github.com/huggingface/optimum-intel/commits/v1.12.0