Optimum-habana


1.18

- Upgrade to SynapseAI 1.18.0 1418 regisss


Qwen2-MoE

- Added Qwen2-MoE model, optimizing its performance on Gaudi 1316 gyou2021


Text-to-video generation

- Enabling Text to Video Diffusion Model Generation 1109 pi314ever
- Porting Stable Video Diffusion ControlNet to HPU 1037 wenbinc-Bin


Depth-to-image generation

- Depth to Image Generation 1175 pi314ever


Model optimizations

- Enable FusedSDPA for Mpt 1101 Jianhong-Zhang
- Mixtral fp8 1269 imangohari1
- Prevent Graph break in Llama when using flash attention 1301 pramodkumar-habanalabs
- Boost SDXL speed with initialized schedule step reset 1284 dsocek
- Improve MPT fp8 1256 atakaha
- Add Whisper static generation 1275 Spycsh
- Gemma: enabled HPU Graphs and Flash Attention 1173 dsmertin
- Recommend jemalloc for gpt-neox-20b 8x 1350 hsubramony
- Optimized inference of GPT-NEO model on HPU 1319 XinyuYe-Intel
- Fix graph breaks for BART in torch.compile mode. 1379 astachowiczhabana
- Gpt_bigcode: added internal_bucketing support 1218 mgonchar
- refine bucket_internal for mpt 1194 Jing1Ling
- Qwen finetuning bucketing 1130 ssarkar2
- Enable FusedSDPA fp8 in Llama FT 1388 pbielak
- Added gemma specific fp8 quantization file 1445 yeonsily


Intel Neural Compressor

- Enable INC for llava models and change softmax to torch.nn.functional.softmax, which is a module supported by INC 1325 tthakkal
- Load INC GPTQ checkpoint & rename params 1364 HolyFalafel
- Fix INC load weights compile error due to Transformers 4.45 upgrade 1421 jiminha


Vera/LN-tuning

- Add Vera/ln_tuning support and a test case 1294 sywangyi


Other

- Add callable workflow to post comments when code quality check failed 1263 regisss
- Fix failed code quality check comment workflow 1264 regisss
- Accelerate Diffusers CI 1265 regisss
- Add profiler to SD3 1267 atakaha
- Fix profiling step with device finish execution for text-generation 1283 libinta
- Update FusedSDPA calling method as Gaudi documentation 1285 yeonsily
- Switch failed code quality check comment to workflow_run 1297 regisss
- Potential fix for the failed code quality check comment workflow 1299 regisss
- Fix text-generation example lm_eval evaluation 1308 changwangss
- Add section to README about Transformers development branch 1307 regisss
- Fix eager mode in run_generation by removing graph logs 1231 Vasud-ha
- Fix bug when running google/paligemma-3b-mix-224 1279 kaixuanliu
- Use native checkpointing under compile mode 1313 xinyu-intel
- fixed fused_qkv object AttributeError due to 'LlamaConfig' 1203 rkumar2patel
- Image to Image Generation Enabling 1196 pi314ever
- Diffusers timing 1277 imangohari1
- Fix eos issue in finetune/generation 1253 sywangyi
- Update CI, tests and examples 1315 regisss
- Fix Sentence Transformer HPU graphs for training with PEFT model 1320 nngokhale
- Fix ZeroDivisionError in constrained beam search with static shapes 1317 skavulya
- Update esmfold model not to use param_buffer_assignment 1324 jiminha
- Falcon inference crash fix for falcon-40b model 1161 yeonsily
- Add --use_kv_cache to image-to-text pipeline 1292 KimBioInfoStudio
- Trl upgrade 1245 sywangyi
- Fix uint4 url typo. 1340 kding1
- Use eager attention for wav2vec2 1333 skaulintel
- Add _reorder_cache back to Llama for HPU 1233 jiminha
- SDXL CI script throughput 1296 imangohari1
- Add image so that transformers tests can run 1338 skaulintel
- Fixes the no attribute error with the falcon multicard test 1344 mounikamandava
- Add profiler to sdxl mlperf pipeline 1339 Jianhong-Zhang
- Fix decoder only generation 948 tjs-intel
- Upgrade gradient checkpointing 1347 yafshar
- Run_generation example: fixed graph compilation statistics reporting 1352 mgonchar
- Fix DeepSpeed crash with Sentence Transformer Trainer 1328 nngokhale
- fea(ci): reduced slow test_diffusers timing. minor fixes 1330 imangohari1
- Flash attn args for GaudiGemmaForCausalLM 1356 kkoryun
- Transformer models generation supports user-provided input embeddings 1276 zongwave
- Fixed the expected values after for img2img slice 1332 imangohari1
- Gpt_big_code: make flash attention impl quantization friendly 1282 mgonchar
- Fix OOM when inference with llama-3.1-70b 1302 harborn
- Fix the conditional 1362 yafshar
- Revert "use native checkpointing under compile mode" 1365 xinyu-intel
- Remove repetitive pip install commands 1367 MohitIntel
- Minor UX enhancement 1373 MohitIntel
- Fix bug when running image-to-text example 1371 kaixuanliu
- Gpt_bigcode: fixed wrong indentation 1376 mgonchar
- Support for transformers without self.model to torch.compile 1380 astachowiczhabana
- Only pass the use_kv_cache True to generator 1366 yafshar
- Clean up the code and remove unnecessary class 1382 yafshar
- Add the diffusers examples of inference Tech 1244 yuanwu2017
- Enhance transformers test suite in Optimum-habana-4.43.4 (auto PR 07654de) 1387 rkumar2patel
- Enhance transformers test suite in Optimum-habana-4.43.4 (auto PR 8926a4b) 1386 rkumar2patel
- Add README.md for Sentence transformer examples with HPU device 1355 ZhengHongming888
- Change Falcon/GPT-Neox rotary embedding function to use seq_len for 1368 yeonsily
- Enhance Optimum-habana as per transformers-4.43.4 1381 rkumar2patel
- CI fix - Install stable-diffusion reqs 1389 vidyasiv
- Fix error caused by uninitialized attn_weights 1391 hsubramony
- Replace flash attention flag 1393 skaulintel
- Fix DeepSpeed CI on Gaudi2 1395 regisss
- Truncate the cached max seq len 1394 astachowiczhabana
- Fix gpt-neox training accuracy issue. 1397 yeonsily
- Simplify HQT config files 1219 Tiefen-boop
- unify_measurements.py script support to unify PCQ 70B 8x 1322 Yantom1
- Add misc. training args 1346 SanityRemnants
- Add quantization config for low bs case 1377 ulivne
- Remove HQT from OHF 1257 Yantom1
- Valid sequence length for sdpa 1183 ssarkar2
- Multiple fixes (dynamo graph break, qwen-moe, multicard) 1410 ssarkar2
- Change the image path for transformers tests back to the correct location 1401 skaulintel
- Fix Gaudi2 regression tests 1403 regisss
- Reverting some of transformer pytest funcs/values 1399 imangohari1
- Fix StarCoder2 inference 1405 regisss
- Change the order for test_diffusers 1406 hsubramony
- Fix llama model text generation error 1402 zongwave
- Downgrade datasets version to 2.21.0 1413 hsubramony
- Update ci sentence_transformer.sh 1424 ZhengHongming888
- Update language-modeling README.md, add trust_remote_code for flan-t5-xl 1422 hsubramony
- Update unify_measurements.py support info 1425 shepark
- Fix GPT_neox incorrect output with batch query 1358 Jianhong-Zhang
- Fix text-to-image example 1429 regisss
- Add flag to run inference with partial dataset 1420 pramodkumar-habanalabs
- Add peft generation example 1427 sywangyi
- Added missing allocate_kv_cache() call in CausalLM class 1431 yeonsily
- Fix merge error and update text-to-speech readme 1436 hsubramony
- Fix OOM error for code llama 1437 jiminha
- Fix error on 4bit checkpoint load with run_lm_eval on TF4.45.2 1439 jiminha
- GPT2 torch.compile fix 1434 dsmertin
- Update text-gen README.md to add auto-gptq fork install steps 1442 hsubramony
- Fix scoped linear all-reduce for starcoder model 1432 skavulya
- Fixed recursion error in SentenceTransformer 1428 yafshar
- Fix Llama 3.1 generation 1444 regisss
- Remove cache folder from image data folder 1446 shepark

1.17

- Upgrade SynapseAI version to 1.17.0 1217

1.16

- Upgrade to SynapseAI v1.16 1043 regisss

1.15

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.15.0.

- Upgrade to SynapseAI 1.15.0 831 regisss


SDXL fine-tuning

- SDXL fine tuning 667 dsocek
- Mediapipe sdxl 787 ssarkar2


Whisper

- Support speech recognition with whisper models and seq2seq 704 emascarenhas


Phi

- Enable phi series models 732 lkk12014402


ControlNet

- Controlnet training 650 vidyasiv

1.14.1

- Enable DeepSpeed for image-to-text example 1455 schoi-habana
- Fix bug when loading 4bit checkpoint quantized in INC 1447 xin3he
- Fixes 'Tokenizer does not have padding token' introduced by 1444 for Llama3.1 1457 MohitIntel

**Full Changelog**: https://github.com/huggingface/optimum-habana/compare/v1.14.0...v1.14.1

1.14

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.14.0.

- Upgrade to SynapseAI 1.14 https://github.com/huggingface/optimum-habana/pull/664 regisss


Stable Diffusion XL

SDXL is now supported and optimized for Gaudi; a usage sketch follows the list below.

- Stable Diffusion XL for Gaudi 619 dsocek
- Update for SDXL Turbo support 634 atakaha
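
A minimal inference sketch with the new Gaudi SDXL pipeline (assuming a Gaudi device and `optimum-habana` installed; the prompt and output file are illustrative):

```python
from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline

pipeline = GaudiStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_habana=True,       # place the pipeline on HPU
    use_hpu_graphs=True,   # capture HPU graphs to reduce host overhead
    gaudi_config="Habana/stable-diffusion",  # Gaudi config hosted on the Hub
)
image = pipeline(prompt="An astronaut riding a green horse").images[0]
image.save("sdxl_gaudi.png")
```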


Textual inversion fine-tuning

An example of textual-inversion fine-tuning has been added.

- Add Textual Inversion fine-tuning script 243 regisss


TRL

The 🤗 [TRL library](https://github.com/huggingface/trl) is now supported on Gaudi for performing DPO and SFT; see the sketch after the list below.

- Add DPO and SFT of TRL support in Gaudi and example 601
- Restructure example/trl/stack_llama_2 for generic DPO 635 libinta
- Add DPO of TRL in README.md 652 libinta
- Add seed in DPO to reproduce the training result 646 sywangyi
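
A minimal SFT sketch, assuming the Gaudi wrappers mirror TRL's `SFTTrainer` API; the model, dataset, and `GaudiConfig` settings are placeholders, and DPO is analogous via `GaudiDPOTrainer`:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from optimum.habana import GaudiConfig, GaudiTrainingArguments
from optimum.habana.trl import GaudiSFTTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

training_args = GaudiTrainingArguments(
    output_dir="./sft-output",
    use_habana=True,     # run on HPU
    use_lazy_mode=True,  # Gaudi lazy execution mode
)
trainer = GaudiSFTTrainer(
    model=model,
    args=training_args,
    gaudi_config=GaudiConfig(use_fused_adam=True, use_fused_clip_norm=True),
    train_dataset=dataset,
    dataset_text_field="text",  # TRL-style arguments (assumed)
    max_seq_length=512,
)
trainer.train()
```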


Full bf16 evaluation

Full bf16 evaluation inside the trainer can now be performed, as in Transformers; see the sketch below.

- Adding support for bf16_full_eval 610 bhargaveede
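
A minimal sketch of enabling it through `GaudiTrainingArguments` (`bf16_full_eval` is inherited from Transformers' `TrainingArguments`; the other values are illustrative):

```python
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    output_dir="./eval-output",
    use_habana=True,
    use_lazy_mode=True,
    do_eval=True,
    bf16_full_eval=True,  # run the whole evaluation loop with the model cast to bf16
)
```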


Text-generation pipeline

A text-generation pipeline fully optimized for Gaudi has been added; a sketch of the underlying optimization follows the list below.

- Text-Generation Pipeline Example 526 sjagtap1803
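
The sketch below illustrates the kind of optimization involved rather than the example's exact entry point: wrapping a model's forward pass in HPU graphs through the `habana_frameworks` PyTorch bridge (assumed available; model and prompt are illustrative):

```python
import habana_frameworks.torch.core as htcore  # registers the HPU backend (assumed installed)
from habana_frameworks.torch.hpu import wrap_in_hpu_graph
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval().to("hpu")
model = wrap_in_hpu_graph(model)  # replay captured HPU graphs on subsequent calls

inputs = tokenizer("Here is my prompt:", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```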


Model optimizations

- Enhances llama performance by removing the 'cast_f32_to_bf16' operation 564 kalyanjk
- Refactoring Llama Attention and mlp layers 589 bgoldberg-habana
- Support for FlashAttention in Llama2 584 wszczurekhabana
- Integrate Habana flash attention to Llama2-70B finetune 596 mandy-li
- Enabling T5ForConditionalGeneration Inference using static shapes 425 bhargaveede
- Avoid falcon perf drop from PR607 when BS=1 schoi-habana
- Enable fused rmsnorm in bf16 for llama 621 puneeshkhanna
- Flash attention enhancement of repeatKV 626 puneeshkhanna
- Update repeat KV llama logic for better TP-4 performance 639 puneeshkhanna
- Falcon changes for v1.14.0 release 654 schoi-habana


TGI

TGI on Gaudi has been moved to a dedicated repo: https://github.com/huggingface/tgi-gaudi

- Update tokenizer for tgi 572 hsubramony
- Remove redundant requirements 575 hsubramony
- Change next_token_chooser to HeterogeneousNextTokenChooser for TGI 574 yeonsily
- Remove TGI folder from Optimum Habana 597 regisss


Various fixes

- Fix messed up README for llama2-70b 571 mandy-li
- Fix Diffusers tests 570 ssarkar2
- Fix fp8 command in text-generation README 586 regisss
- Fix wav2vec inference bug 588 skaulintel
- Fix hash_with_views error 587 bgoldberg-habana
- Add dataset disposal of b-mc2/sql-create-context for codegen and fix zero3 lora save issue 552 sywangyi
- Fix gptj training issue 594 BaihuiJin
- Fix DataLoaderDispatcher issue in Gaudi 600 sywangyi
- Fix for Falcon error from PR 587 608 schoi-habana
- Fix Falcon graph compilation error when bs>1 607 regisss
- Fix crash if gaudi_config is not passed to GaudiTrainer 613 sywangyi
- Fix flash attention output for llama for padded batched inputs 623 puneeshkhanna
- Fix backward error in DDP when running reward model finetune in RLHF 507 sywangyi
- Fix dpo graph compile error in evaluation 630 sywangyi
- Fix error in run_image_classification.py 631 regisss
- Fix RLHF llama rewarding modeling backward issue 612 sywangyi
- Fix SD example so that custom bf16 ops can be used 642 regisss
- Fix SD2 test 647 regisss
- Fix typo in README 656 yeonsily
- Fix error in PR654 661 schoi-habana
- Fix compile error for torch.compile for llama 662 jiminha
- Fix SDXL test 666 regisss


Others

- Remove red crosses in model table 577 regisss
- Misc changes for transformers tests 581 ankurneog
- Remove delete_doc_comment workflows 582 regisss
- Pin PEFT for the language-modeling example 591 regisss
- Remove workarounds to have causal_mask in uint8 for GPT2, GPT-J and CodeGen 592 regisss
- Change Synapse validated version in README 603 regisss
- Dyn prompt after refactor 543 ssarkar2
- In peft, only the trainable parameters need to be saved 576 sywangyi
- Add inheritance in Diffusers pipelines 611 regisss
- Update generation config to enable flash attention for inference 609 puneeshkhanna
- Remove setting of PT_HPU_LAZY_MODE=2 in training_args.py 625 vivekgoe
- Remove hpu:X notation until fully supported by bridge 637 hsubramony
- Add use_flash_attention to Llama2-70B finetuning command in README 640 mandy-li
- Enable master_port selecting for DeepSpeed and MPI 641 yangulei
- Enabling Graphs in Wav2Vec AC training 622 bhargaveede
- Add changes to support FSDP 598 vivekgoe
- Run Llama2 with torch.compile on Gaudi2 616 kausikmaiti
- Hqt 648 bgoldberg-habana

