Optimum


1.12.1

* Fix causal language models export by eaidova in https://github.com/huggingface/optimum-intel/pull/477

1.12

- Switch to SynapseAI v1.12.0 453 regisss


Various model optimizations

- Fix graph compilation error from Falcon when batch size > 1 356 schoi-habana
- Add mpt optimization for gaudi 363 sywangyi
- Improve MPT inference performance 377 schoi-habana
- Allocate KV cache in contiguous memory for HPU performance 394 puneeshkhanna
- Add support for attention softmax in BF16 such as for llama 396 puneeshkhanna
- Add trim logit logic to reduce maximum memory usage for Llama inference 395 BaihuiJin
- Skip hpugraph usage for first token to save memory 397 polisettyvarma
- Llama inference: add reuse_cache to save memory 409 regisss
- GPT2 contiguous fix 421 ZhaiFeiyue
- Improve performance and memory usage with reuse_cache by slicing inputs up to token_idx for first-token generation 422 puneeshkhanna
- GPT-J/NeoX contiguous 454 BaihuiJin


TGI

- Fix gptj incorrect output issue in TGI 340 sywangyi
- Enable hpu graph 330 htang2012
- Upgrade to TGI v1.0.3 373 regisss
- Accelerate the inference when input prompt length changes in TGI 386 sywangyi
- Support static shape in concatenate and filter in TGI 389 sywangyi
- Fix bloom concatenate and filter issue 401 sywangyi
- Fix error in logits process in hpu graph 404 sywangyi
- Fix first token 408 regisss
- Temporary fix in TGI for max total tokens 443 hsubramony


Check min version in examples

A utility method was added to ensure that the minimum required version of Optimum Habana is installed before running the examples; a usage sketch follows below.

- Add check_optimum_habana_min_version 335 regisss
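
A minimal sketch of how an example script can call this check; the import path is assumed from optimum-habana's utilities module.

```python
# Hedged sketch: assumes check_optimum_habana_min_version is exposed by
# optimum.habana.utils, as added in PR 335.
from optimum.habana.utils import check_optimum_habana_min_version

# Raises an informative error if the installed Optimum Habana is older than
# the version the example was written for (the version string is illustrative).
check_optimum_habana_min_version("1.8.0")
```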


Others

- Add support for autocast custom ops in GaudiTrainer 308 regisss
- Add warmup arg and move stats printing to the end 390 polisettyvarma
- Add a configurable max input tokens parameter 426 puneeshkhanna
- Add transformers model tests for gaudi 427 ankurneog
- Modify loraft llama falcon 415 libinta
- Option to not crop in dataset run 444 ssarkar2
- Enable auto tensor parallelism for Falcon 451 mandy-li


Various fixes

- Fixes for streaming dataset mode 324 MohitIntel
- Fix beam search output 360 puneeshkhanna
- Fix DDP for LoRA 368 sywangyi
- Load llama ckpt to meta to work around OOM issue on CPU 359 mandy-li
- Fix gradient checkpointing in LoRA example 398 regisss
- No need to wrap DDP when using Fast DDP 430 ikurtchen
- Fix falcon-40b error when DeepSpeed enabled 434 schoi-habana
- Revert "Fix T5 DeepSpeed ZeRO-3 (393)" 466 sywangyi


Regression tests for this release are available here: https://github.com/huggingface/optimum-habana/actions/runs/6580186897

1.12.0

OpenVINO


Export CLI

* Add OpenVINO export CLI by echarlaix in https://github.com/huggingface/optimum-intel/pull/437

```bash
optimum-cli export openvino --model gpt2 ov_model
```
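
The exported model can then be reloaded with the matching `OVModel` class; a minimal sketch (the checkpoint and output folder names match the command above):

```python
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

# Load the OpenVINO model produced by the CLI command above
model = OVModelForCausalLM.from_pretrained("ov_model")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```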



New architectures

LCMs

* Enable Latent Consistency models OpenVINO export and inference by echarlaix in https://github.com/huggingface/optimum-intel/pull/463

```python
from optimum.intel import OVLatentConsistencyModelPipeline

# export=True converts the PyTorch checkpoint to OpenVINO on the fly
pipe = OVLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
```


Pix2Struct

* Add support for export and inference for pix2struct models by eaidova in https://github.com/huggingface/optimum-intel/pull/450
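
A minimal sketch of Pix2Struct inference through OpenVINO; `OVModelForPix2Struct` is the assumed entry point added by this PR, and the checkpoint and question are illustrative.

```python
from PIL import Image
from transformers import Pix2StructProcessor
from optimum.intel import OVModelForPix2Struct

model_id = "google/pix2struct-docvqa-base"  # illustrative checkpoint
model = OVModelForPix2Struct.from_pretrained(model_id, export=True)  # export on the fly
processor = Pix2StructProcessor.from_pretrained(model_id)

image = Image.open("document.png")  # any document image
inputs = processor(images=image, text="What is the total amount?", return_tensors="pt")
outputs = model.generate(**inputs)
print(processor.decode(outputs[0], skip_special_tokens=True))
```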

GPTBigCode

* Add support for export and inference for GPTBigCode models by echarlaix in https://github.com/huggingface/optimum-intel/pull/459
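
GPTBigCode checkpoints load through the regular `OVModelForCausalLM` class; a minimal sketch, with an illustrative checkpoint name.

```python
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "bigcode/tiny_starcoder_py"  # illustrative GPTBigCode checkpoint
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```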

Changes and bugfixes

* Move VAE execution to fp32 precision on GPU by eaidova in https://github.com/huggingface/optimum-intel/pull/432
* Enable OpenVINO export without ONNX export step by eaidova in https://github.com/huggingface/optimum-intel/pull/397
* Enable 8-bit weight compression for OpenVINO model by l-bat in https://github.com/huggingface/optimum-intel/pull/415
* Add image reshaping for statically reshaped OpenVINO SD models by echarlaix in https://github.com/huggingface/optimum-intel/pull/428
* OpenVINO device updates by helena-intel in https://github.com/huggingface/optimum-intel/pull/434
* Fix decoder model without cache by echarlaix in https://github.com/huggingface/optimum-intel/pull/438
* Fix export by echarlaix in https://github.com/huggingface/optimum-intel/pull/439
* Added 8 bit weights compression by default for decoders larger than 1B by AlexKoff88 in https://github.com/huggingface/optimum-intel/pull/444
* Add fp16 and int8 conversion to OVModels and export CLI by echarlaix in https://github.com/huggingface/optimum-intel/pull/443
```python
from optimum.intel import OVModelForCausalLM

model_id = "gpt2"  # any supported checkpoint
# load_in_8bit applies 8-bit weight compression when loading/converting the model
model = OVModelForCausalLM.from_pretrained(model_id, load_in_8bit=True)
```

* Create default attention mask when needed but not provided by eaidova in https://github.com/huggingface/optimum-intel/pull/457
* Do not automatically cache models when exporting a model in a temporary directory by helena-intel in https://github.com/huggingface/optimum-intel/pull/462


Neural Compressor

* Integrate INC weight-only quantization by mengniwang95 in https://github.com/huggingface/optimum-intel/pull/417 (a usage sketch of the quantization flow follows this list)
* Support num_key_value_heads by jiqing-feng in https://github.com/huggingface/optimum-intel/pull/447
* Enable ORT model support to INC quantizer by echarlaix in https://github.com/huggingface/optimum-intel/pull/436
* fix INC model loading by echarlaix in https://github.com/huggingface/optimum-intel/pull/452
* Fix INC modeling by echarlaix in https://github.com/huggingface/optimum-intel/pull/453
* Add starcoder past-kv shape for TSModelForCausal class by changwangss in https://github.com/huggingface/optimum-intel/pull/371
* Fix transformers v4.35.0 compatibility by echarlaix in https://github.com/huggingface/optimum-intel/pull/471
* Fix compatibility for optimum next release by echarlaix in https://github.com/huggingface/optimum-intel/pull/460
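
The Neural Compressor path goes through the `INCQuantizer` API; a minimal sketch, shown with dynamic post-training quantization for brevity (the weight-only configuration from PR 417 is assumed to plug into the same `quantize()` entry point, and the checkpoint is illustrative).

```python
from transformers import AutoModelForSequenceClassification
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
# Dynamic quantization needs no calibration dataset
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(quantization_config=quantization_config, save_directory="inc_quantized_model")
```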




**Full Changelog**: https://github.com/huggingface/optimum-intel/commits/v1.12.0

1.11.2

Remove the Transformers version constraint on `optimum[habana]`.

- Remove Transformers version constraint on Optimum Habana 1290 by regisss

**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.11.1...v1.11.2

1.11.1

* Fix compatibility with `optimum` by echarlaix in https://github.com/huggingface/optimum-intel/commit/b4663b4d7e7139643623cc2d335d39b3c46a5a2c

**Full Changelog**: https://github.com/huggingface/optimum-intel/compare/v1.11.0...v1.11.1

1.11

SynapseAI v1.11 (latest stable release) is fully supported.

- Upgrade to Synapse 1.11 333 regisss


Optimizations for Llama 2, Falcon, StarCoder, OPT, GPT-NeoX, CodeGen

- Added support for OPT-66B 285 ZhaiFeiyue
- Llama 296 yeonsily
- Improve Llama2 and gpt_neox performance with Habana fused RoPE and RMSNorm 321 mandy-li
- Enable Falcon-7b 326 schoi-habana
- Fix inference with Llama-2-70B 342 regisss
- Add model optimizations for codegen and gpt_bigcode 322 PhillipHoward


Torch Autocast

:warning: **Habana Mixed Precision is deprecated and will be removed in SynapseAI v1.12.**
Torch Autocast is becoming the default for managing mixed-precision runs.

- Fix autocast for BERT-like models 287 ANSHUMAN87
- Add support for autocast in gradient checkpointing 307 regisss
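
With Torch Autocast, mixed-precision regions follow the standard `torch.autocast` pattern targeting the HPU device; a minimal sketch, assuming a Gaudi machine with `habana_frameworks` installed (GaudiTrainer sets this up automatically when autocast is enabled).

```python
import torch
import habana_frameworks.torch.core  # registers the "hpu" device (Gaudi machines only)

model = torch.nn.Linear(16, 4).to("hpu")
inputs = torch.randn(2, 16, device="hpu")

# BF16 autocast region on HPU; outside the context, ops run in FP32
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    outputs = model(inputs)
```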


Improved text-generation example

- Added constrained beam search 281 vivekgoe
- Fix padding error 282 sywangyi
- Various improvements for faster checkpoint downloading 284 286 294 regisss
- Add deepspeed TP policy for llama 303 sywangyi
- Add token and model_revision args for the text-generation example 331 regisss


LoRA examples

Two new LoRA examples were added for [fine-tuning](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling#peft) and [inference](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation#use-peft-models-for-generation); a minimal PEFT setup is sketched after the list below.

- Add lora example for clm and text generation 305 sywangyi
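
A minimal PEFT setup of the kind these examples use; hyperparameters, target modules, and the checkpoint are illustrative rather than the exact values from the scripts.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections for Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```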


LDM3D

A new Stable Diffusion pipeline that generates both an image and its depth map from a text prompt.

- Support for Ldm3d 304 estelleafl
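
A heavily hedged sketch of how such a pipeline can be used; the class name `GaudiStableDiffusionLDM3DPipeline`, its arguments, and the checkpoint are assumptions based on the optimum-habana diffusers integration, not confirmed by this changelog.

```python
# Assumed API: the class name and Gaudi-specific from_pretrained arguments
# below may differ in your installed version.
from optimum.habana.diffusers import GaudiStableDiffusionLDM3DPipeline

pipe = GaudiStableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c",                       # illustrative LDM3D checkpoint
    use_habana=True,                        # run on HPU
    use_hpu_graphs=True,                    # capture HPU graphs for faster inference
    gaudi_config="Habana/stable-diffusion",
)
output = pipe(prompt="a photo of an astronaut riding a horse on mars")
rgb_image, depth_map = output.rgb[0], output.depth[0]  # image + matching depth map
```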


Added support for Text Generation Inference (TGI)

[TGI](https://github.com/huggingface/text-generation-inference) is now supported on Gaudi.

- Add support for TGI on Gaudi 297 regisss


`GaudiGenerationConfig`

Transformers' `GenerationConfig` has been extended to be fully compatible with Gaudi. It adds two fields to better control generation with static shapes.

- Add GaudiGenerationConfig 293 regisss
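
A minimal sketch of the extended config; the field names below (`static_shapes`, `ignore_eos`) are assumptions based on the optimum-habana generation API.

```python
# Hedged sketch: field names are assumed and may differ in your installed version.
from optimum.habana.transformers.generation import GaudiGenerationConfig

generation_config = GaudiGenerationConfig(
    max_new_tokens=128,
    static_shapes=True,  # pad inputs so compiled graph shapes stay constant
    ignore_eos=True,     # generate up to max length, useful with static shapes
)
```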


Various fixes and improvements

- Fix generation sampling when using `repetition_penalty` 301 sywangyi
- Remove KV cache workaround 302 ZhaiFeiyue
- Fix T5 inference performance regression 310 libinta
- Fix gptj HCCL issue occurring in DDP 318 sywangyi
- Partially revert "Enable/Optimize flan t5 xxl on deepspeed z3" 320 hsubramony
- Modify flan-t5 deepspeed configuration 328 yeonsily
- Add commands for gptj and gptneox 325 ankurhabana
- Disable FusedRMSNorm for training 343 hsubramony
- Enable hpu rms fused kernel for t5 344 ZhaiFeiyue
- Remove two workarounds on esmfold 334 bzhu-habana
