Optimum-habana


1.18

- Upgrade to SynapseAI 1.18.0 1418 regisss


Qwen2-MoE

- Added Qwen2-MoE model, optimizing its performance on Gaudi 1316 gyou2021


Text-to-video generation

- Enabling Text to Video Diffusion Model Generation 1109 pi314ever
- Porting Stable Video Diffusion ControlNet to HPU 1037 wenbinc-Bin


Depth-to-image generation

- Depth to Image Generation 1175 pi314ever


Model optimizations

- Enable FusedSDPA for Mpt 1101 Jianhong-Zhang
- Mixtral fp8 1269 imangohari1
- Prevent Graph break in Llama when using flash attention 1301 pramodkumar-habanalabs
- Boost SDXL speed with initialized schedule step reset 1284 dsocek
- Improve MPT fp8 1256 atakaha
- Add Whisper static generation 1275 Spycsh
- Gemma: enabled HPU Graphs and Flash Attention 1173 dsmertin
- Recommend jemalloc for gpt-neox-20b 8x 1350 hsubramony
- Optimized inference of GPT-NEO model on HPU 1319 XinyuYe-Intel
- Fix graph breaks for BART in torch.compile mode. 1379 astachowiczhabana
- Gpt_bigcode: added internal_bucketing support 1218 mgonchar
- refine bucket_internal for mpt 1194 Jing1Ling
- Qwen finetuning bucketing 1130 ssarkar2
- Enable FusedSDPA fp8 in Llama FT 1388 pbielak
- Added gemma specific fp8 quantization file 1445 yeonsily


Intel Neural Compressor

- Enable INC for llava models and change softmax to torch.nn.functional.softmax, which is a module supported by INC 1325 tthakkal
- Load INC GPTQ checkpoint & rename params 1364 HolyFalafel
- Fix INC load weights compile error due to Transformers 4.45 upgrade 1421 jiminha


Vera/LN-tuning

- Add Vera/ln_tuning support and a test case 1294 sywangyi


Other

- Add callable workflow to post comments when code quality check failed 1263 regisss
- Fix failed code quality check comment workflow 1264 regisss
- Accelerate Diffusers CI 1265 regisss
- Add profiler to SD3 1267 atakaha
- Fix profiling step with device finish execution for text-generation 1283 libinta
- Update FusedSDPA calling method as Gaudi documentation 1285 yeonsily
- Switch failed code quality check comment to workflow_run 1297 regisss
- Potential fix for the failed code quality check comment workflow 1299 regisss
- Fix text-generation example lm_eval evaluation 1308 changwangss
- Add section to README about Transformers development branch 1307 regisss
- Fix eager mode in run_generation by removing graph logs 1231 Vasud-ha
- Fix bug when running google/paligemma-3b-mix-224 1279 kaixuanliu
- Use native checkpointing under compile mode 1313 xinyu-intel
- fixed fused_qkv object AttributeError due to 'LlamaConfig' 1203 rkumar2patel
- Image to Image Generation Enabling 1196 pi314ever
- Diffusers timing 1277 imangohari1
- Fix eos issue in finetune/generation 1253 sywangyi
- Update CI, tests and examples 1315 regisss
- Fix Sentence Transformer HPU graphs for training with PEFT model 1320 nngokhale
- Fix ZeroDivisionError in constrained beam search with static shapes 1317 skavulya
- Update esmfold model not to use param_buffer_assignment 1324 jiminha
- Falcon inference crash fix for falcon-40b model 1161 yeonsily
- Add --use_kv_cache to image-to-text pipeline 1292 KimBioInfoStudio
- Trl upgrade 1245 sywangyi
- Fix uint4 url typo. 1340 kding1
- Use eager attention for wav2vec2 1333 skaulintel
- Add _reorder_cache back to Llama for HPU 1233 jiminha
- SDXL CI script throughput 1296 imangohari1
- Add image so that transformers tests can run 1338 skaulintel
- Fixes the no attribute error with the falcon multicard test 1344 mounikamandava
- Add profiler to sdxl mlperf pipeline 1339 Jianhong-Zhang
- Fix decoder only generation 948 tjs-intel
- Upgrade gradient checkpointing 1347 yafshar
- Run_generation example: fixed graph compilation statistics reporting 1352 mgonchar
- Fix DeepSpeed crash with Sentence Transformer Trainer 1328 nngokhale
- fea(ci): reduced slow test_diffusers timing. minor fixes 1330 imangohari1
- Flash attn args for GaudiGemmaForCausalLM 1356 kkoryun
- Transformer models generation supports user-provided input embeddings 1276 zongwave
- Fixed the expected values after for img2img slice 1332 imangohari1
- Gpt_big_code: make flash attention impl quantization friendly 1282 mgonchar
- Fix OOM when inference with llama-3.1-70b 1302 harborn
- Fix the conditional 1362 yafshar
- Revert "use native checkpointing under compile mode" 1365 xinyu-intel
- Remove repetitive pip install commands 1367 MohitIntel
- Minor UX enhancement 1373 MohitIntel
- Fix bug when running image-to-text example 1371 kaixuanliu
- Gpt_bigcode: fixed wrong indentation 1376 mgonchar
- Support for transformers without self.model to torch.compile 1380 astachowiczhabana
- Only pass the use_kv_cache True to generator 1366 yafshar
- Clean up the code and remove unnecessary class 1382 yafshar
- Add the diffusers examples of inference Tech 1244 yuanwu2017
- Enhance transformers test suite in Optimum-habana-4.43.4 (auto PR 07654de) 1387 rkumar2patel
- Enhance transformers test suite in Optimum-habana-4.43.4 (auto PR 8926a4b) 1386 rkumar2patel
- Add README.md for Sentence transformer examples with HPU device 1355 ZhengHongming888
- Change Falcon/GPT-Neox rotary embedding function to use seq_len for 1368 yeonsily
- Enhance Optimum-habana as per transformers-4.43.4 1381 rkumar2patel
- CI fix - Install stable-diffusion reqs 1389 vidyasiv
- Fix error caused by uninitialized attn_weights 1391 hsubramony
- Replace flash attention flag 1393 skaulintel
- Fix DeepSpeed CI on Gaudi2 1395 regisss
- Truncate the cached max seq len 1394 astachowiczhabana
- Fix gpt-neox training accuracy issue. 1397 yeonsily
- Simplify HQT config files 1219 Tiefen-boop
- unify_measurements.py script support to unify PCQ 70B 8x 1322 Yantom1
- Add misc. training args 1346 SanityRemnants
- Add quantization config for low bs case 1377 ulivne
- Remove HQT from OHF 1257 Yantom1
- Valid sequence length for sdpa 1183 ssarkar2
- Multiple fixes (dynamo graph break, qwen-moe, multicard) 1410 ssarkar2
- Change the image path for transformers tests back to the correct location 1401 skaulintel
- Fix Gaudi2 regression tests 1403 regisss
- Reverting some of transformer pytest funcs/values 1399 imangohari1
- Fix StarCoder2 inference 1405 regisss
- Change the order for test_diffusers 1406 hsubramony
- Fix llama model text generation error 1402 zongwave
- Downgrade datasets version to 2.21.0 1413 hsubramony
- Update ci sentence_transformer.sh 1424 ZhengHongming888
- Update language-modeling README.md, add trust_remote_code for flan-t5-xl 1422 hsubramony
- Update unify_measurements.py support info 1425 shepark
- Fix GPT_neox incorrect output with batch query 1358 Jianhong-Zhang
- Fix text-to-image example 1429 regisss
- Add flag to run inference with partial dataset 1420 pramodkumar-habanalabs
- Add peft generation example 1427 sywangyi
- Added missing allocate_kv_cache() call in CausalLM class 1431 yeonsily
- Fix merge error and update text-to-speech readme 1436 hsubramony
- Fix OOM error for code llama 1437 jiminha
- Fix error on 4bit checkpoint load with run_lm_eval on TF4.45.2 1439 jiminha
- GPT2 torch.compile fix 1434 dsmertin
- Update text-gen README.md to add auto-gptq fork install steps 1442 hsubramony
- Fix scoped linear all-reduce for starcoder model 1432 skavulya
- Fixed recursion error in SentenceTransformer 1428 yafshar
- Fix Llama 3.1 generation 1444 regisss
- Remove cache folder from image data folder 1446 shepark

1.17

- Upgrade SynapseAI version to 1.17.0 1217

1.16

- Upgrade to SynapseAI v1.16 1043 regisss

1.15

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.15.0.

- Upgrade to SynapseAI 1.15.0 831 regisss


SDXL fine-tuning

- SDXL fine tuning 667 dsocek
- Mediapipe sdxl 787 ssarkar2


Whisper

- Support speech recognition with whisper models and seq2seq 704 emascarenhas


Phi

- Enable phi series models 732 lkk12014402


ControlNet

- Controlnet training 650 vidyasiv

1.14.1

- Enable DeepSpeed for image-to-text example 1455 schoi-habana
- Fix bug when loading 4bit checkpoint quantized in INC 1447 xin3he
- Fixes 'Tokenizer does not have padding token' introduced by 1444 for Llama3.1 1457 MohitIntel

**Full Changelog**: https://github.com/huggingface/optimum-habana/compare/v1.14.0...v1.14.1

1.14

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.14.0.

- Upgrade to SynapseAI 1.14 https://github.com/huggingface/optimum-habana/pull/664 regisss


Stable Diffusion XL

SDXL is now supported and optimized for Gaudi; a usage sketch follows the list below.

- Stable Diffusion XL for Gaudi 619 dsocek
- Update for SDXL Turbo support 634 atakaha
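
A minimal inference sketch with the new Gaudi SDXL pipeline (assuming a Gaudi device and `optimum-habana` installed; the prompt and output file are illustrative):

```python
from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline

pipeline = GaudiStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_habana=True,       # place the pipeline on HPU
    use_hpu_graphs=True,   # capture HPU graphs to reduce host overhead
    gaudi_config="Habana/stable-diffusion",  # Gaudi config hosted on the Hub
)
image = pipeline(prompt="An astronaut riding a green horse").images[0]
image.save("sdxl_gaudi.png")
```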


Textual inversion fine-tuning

An example of textual-inversion fine-tuning has been added.

- Add Textual Inversion fine-tuning script 243 regisss


TRL

The 🤗 [TRL library](https://github.com/huggingface/trl) is now supported on Gaudi for performing DPO and SFT; see the sketch after the list below.

- Add DPO and SFT of TRL support in Gaudi and example 601
- Restructure example/trl/stack_llama_2 for generic DPO 635 libinta
- Add DPO of TRL in README.md 652 libinta
- Add seed in DPO to reproduce the training result 646 sywangyi
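
A minimal SFT sketch, assuming the Gaudi wrappers mirror TRL's `SFTTrainer` API; the model, dataset, and `GaudiConfig` settings are placeholders, and DPO is analogous via `GaudiDPOTrainer`:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from optimum.habana import GaudiConfig, GaudiTrainingArguments
from optimum.habana.trl import GaudiSFTTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")

training_args = GaudiTrainingArguments(
    output_dir="./sft-output",
    use_habana=True,     # run on HPU
    use_lazy_mode=True,  # Gaudi lazy execution mode
)
trainer = GaudiSFTTrainer(
    model=model,
    args=training_args,
    gaudi_config=GaudiConfig(use_fused_adam=True, use_fused_clip_norm=True),
    train_dataset=dataset,
    dataset_text_field="text",  # TRL-style arguments (assumed)
    max_seq_length=512,
)
trainer.train()
```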


Full bf16 evaluation

Full bf16 evaluation inside the trainer can now be performed, as in Transformers; see the sketch below.

- Adding support for bf16_full_eval 610 bhargaveede
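
A minimal sketch of enabling it through `GaudiTrainingArguments` (`bf16_full_eval` is inherited from Transformers' `TrainingArguments`; the other values are illustrative):

```python
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    output_dir="./eval-output",
    use_habana=True,
    use_lazy_mode=True,
    do_eval=True,
    bf16_full_eval=True,  # run the whole evaluation loop with the model cast to bf16
)
```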


Text-generation pipeline

A text-generation pipeline fully optimized for Gaudi has been added; a sketch of the underlying optimization follows the list below.

- Text-Generation Pipeline Example 526 sjagtap1803
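
The sketch below illustrates the kind of optimization involved rather than the example's exact entry point: wrapping a model's forward pass in HPU graphs through the `habana_frameworks` PyTorch bridge (assumed available; model and prompt are illustrative):

```python
import habana_frameworks.torch.core as htcore  # registers the HPU backend (assumed installed)
from habana_frameworks.torch.hpu import wrap_in_hpu_graph
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval().to("hpu")
model = wrap_in_hpu_graph(model)  # replay captured HPU graphs on subsequent calls

inputs = tokenizer("Here is my prompt:", return_tensors="pt").to("hpu")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```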


Model optimizations

- Enhances llama performance by removing the 'cast_f32_to_bf16' operation 564 kalyanjk
- Refactoring Llama Attention and mlp layers 589 bgoldberg-habana
- Support for FlashAttention in Llama2 584 wszczurekhabana
- Integrate Habana flash attention to Llama2-70B finetune 596 mandy-li
- Enabling T5ForConditionalGeneration Inference using static shapes 425 bhargaveede
- Avoid falcon perf drop from PR607 when BS=1 schoi-habana
- Enable fused rmsnorm in bf16 for llama 621 puneeshkhanna
- Flash attention enhancement of repeatKV 626 puneeshkhanna
- Update repeat KV llama logic for better TP-4 performance 639 puneeshkhanna
- Falcon changes for v1.14.0 release 654 schoi-habana


TGI

TGI on Gaudi has been moved to a dedicated repo: https://github.com/huggingface/tgi-gaudi

- Update tokenizer for tgi 572 hsubramony
- Remove redundant requirements 575 hsubramony
- Change next_token_chooser to HeterogeneousNextTokenChooser for TGI 574 yeonsily
- Remove TGI folder from Optimum Habana 597 regisss


Various fixes

- Fix messed up README for llama2-70b 571 mandy-li
- Fix Diffusers tests 570 ssarkar2
- Fix fp8 command in text-generation README 586 regisss
- Fix wav2vec inference bug 588 skaulintel
- Fix hash_with_views error 587 bgoldberg-habana
- Add dataset disposal of b-mc2/sql-create-context for codegen and fix zero3 lora save issue 552 sywangyi
- Fix gptj training issue 594 BaihuiJin
- Fix DataLoaderDispatcher issue in Gaudi 600 sywangyi
- Fix for Falcon error from PR 587 608 schoi-habana
- Fix Falcon graph compilation error when bs>1 607 regisss
- Fix crash if gaudi_config is not passed to GaudiTrainer 613 sywangyi
- Fix flash attention output for llama for padded batched inputs 623 puneeshkhanna
- Fix backward error in DDP when running reward model finetune in RLHF 507 sywangyi
- Fix dpo graph compile error in evaluation 630 sywangyi
- Fix error in run_image_classification.py 631 regisss
- Fix RLHF llama rewarding modeling backward issue 612 sywangyi
- Fix SD example so that custom bf16 ops can be used 642 regisss
- Fix SD2 test 647 regisss
- Fix typo in README 656 yeonsily
- Fix error in PR654 661 schoi-habana
- Fix compile error for torch.compile for llama 662 jiminha
- Fix SDXL test 666 regisss


Others

- Remove red crosses in model table 577 regisss
- Misc changes for transformers tests 581 ankurneog
- Remove delete_doc_comment workflows 582 regisss
- Pin PEFT for the language-modeling example 591 regisss
- Remove workarounds to have causal_mask in uint8 for GPT2, GPT-J and CodeGen 592 regisss
- Change Synapse validated version in README 603 regisss
- Dyn prompt after refactor 543 ssarkar2
- In peft, only the trainable parameters need to be saved 576 sywangyi
- Add inheritance in Diffusers pipelines 611 regisss
- Update generation config to enable flash attention for inference 609 puneeshkhanna
- Remove setting of PT_HPU_LAZY_MODE=2 in training_args.py 625 vivekgoe
- Remove hpu:X notation until fully supported by bridge 637 hsubramony
- Add use_flash_attention to Llama2-70B finetuning command in README 640 mandy-li
- Enable master_port selecting for DeepSpeed and MPI 641 yangulei
- Enabling Graphs in Wav2Vec AC training 622 bhargaveede
- Add changes to support FSDP 598 vivekgoe
- Run Llama2 with torch.compile on Gaudi2 616 kausikmaiti
- Hqt 648 bgoldberg-habana

