- Upgrade to SynapseAI 1.19 1667 regisss
FLUX
- FLUX with diffusers 0.31.0 1450 dsocek
- FLUX Fine-Tuning for Gaudi 1482 dsocek
- Flux Image-To-Image pipeline 1524 dsocek
New models
- Optimized inference of Cohere model on HPU 1329 XinyuYe-Intel
- Idefics2 1270 sywangyi
- Optimized inference of XGLM model on HPU 1323 XinyuYe-Intel
- Add mllama support 1419 sywangyi
- Enable paligemma model for image-to-text example 1407 kaixuanliu
- Enable Gemma2 Inference on Gaudi 1504 Luca-Calabria
- Minicpm enabling 1342 pi314ever
- Enable Falcon-mamba 1480 yuanwu2017
- Add support for Baichuan2 1479 xhaihao
- Enable DeepSeek-V2 1475 yao-matrix
- Add chatglm 1478 mengker33
- Falcon Model Support 1612 alekseyfa
Various model optimizations
- Enable flash attention for gemma 1454 atakaha
- Support loading 4 bit Qwen2 1476 mengniwang95
- Fixed Gemma FP8 flash_attention lower throughput issue 1510 kplau1128
- Disable default sdpa in Albert (22) 1517 astachowiczhabana
- Implement fused sdpa for wav2vec2 (18) 1520 astachowiczhabana
- Memory optimization for gpt_bigcode 1513 astachowiczhabana
- Support beam search with reuse_cache and bucket_internal 1472 Wei-Lin-Intel
- Add mixtral trl sft 1349 lkk12014402
- Enable tiiuae/falcon-11B-vlm in image_to_text example 1490 sywangyi
- Enable fusedsdpa kernel for vision part of mllama 1531 sywangyi
- Enable dynamic compile for mpi(training) 1509 chaojun-zhang
- Add DynamicMoE support for Mixtral 1511 kwisniewski98
- Implemented fusedSDPA for stable diffusion (36) 1545 astachowiczhabana
- Fix Accuracy Calculation Issue in GPT-NeoX 1591 yafshar
Sentence Transformers
- Update sentence transformer to v3.2.1 1470 ZhengHongming888
Textual Inversion XL
- Add textual inversion XL for Gaudi 868 dsocek
TIMM
- Enable PyTorch-IMage-Models (TIMM) with HPUs 1459 ZhengHongming888
Context Parallelism
- Adding support for Context Parallelism using DeepSpeed's DistributedAttention 1501 bhargaveede
- Move parallel_state.py to the distributed folder [a6ee7c2044e6ddf7d19ae3ad663149e51d6f89e7](https://github.com/huggingface/optimum-habana/commit/a6ee7c2044e6ddf7d19ae3ad663149e51d6f89e7) regisss
CI improvements
- Tests for text gen output text 1411 vidyasiv
- Add split runners to CI (2 devices per runner for fast tests) [72df37df46d1d2a2665c5d1be43b13704b7c8ada](https://github.com/huggingface/optimum-habana/commit/72df37df46d1d2a2665c5d1be43b13704b7c8ada) regisss
- Fix fast CI to work with split runners 1534 regisss
- Add Llama 3.1 ft to CI 1529 MohitIntel
Documentation
- Optimum-Habana docs re-org 1488 dsocek
Other
- Fix facebook/hf-seamless-m4t-medium crash 1433 sywangyi
- Fix bias update in scoped all reduce 1456 skavulya
- fea(pytests): Added skip for unsupported tests for mistral/mixtral 1462 imangohari1
- Remove deprecated Mixed precision flags 1471 vivekgoe
- Readme: replace tabs with spaces 1485 mgonchar
- Move fast tests to Gaudi2 1498 regisss
- Remove torch req from LM example 1491 astachowiczhabana
- Remove keep_input_mutations 1492 astachowiczhabana
- Fix trust_remote_code 1493 astachowiczhabana
- Upgrade ViT README with torch.compile 1494 astachowiczhabana
- Corrected Throughput measure for GaudiDDPMPipeline 1460 deepak-gowda-narayana
- [SW-196761] Add G3 in T5-L README 1523 astachowiczhabana
- Fix tuple object error 1354 SupreetSinghPalne
- Add warmup time and compile time log for the eval/prediction. 1489 jiminha
- Add support for MLPERF optimized pipeline from example 1465 ANSHUMAN87
- Add check_neural_compressor_min_version for 4 bit behavior 1500 xin3he
- Pass "lazy_mode" arg to GaudiLlamaModel GaudiTrainer 1515 astachowiczhabana
- Removed workaround for NaN bug causing graph break. 1516 astachowiczhabana
- text_generation: improve parameters check 1527 mgonchar
- transformers: fixed some typos 1528 mgonchar
- Make the with_stack option of the profiler configurable 1497 ranzhejiang
- Fix dtype issue with valid sequence length in torch.compile bs=1 1532 wszczurekhabana
- Migrate OH CLIP (roberta-clip) training to torch.compile 1507 chaojun-zhang
- test_text_generation: fix non-Gaudi2 case 1530 mgonchar
- text-generation: improve output printing 1486 mgonchar
- Text-generation, model set-up: torch.compile for attributes instead of models' types 1452 dsmertin
- Fix bridgetower example 1481 astachowiczhabana
- Migrate OH Wav2Vec-AC training to torch.compile - README update 1537 astachowiczhabana
- Migrate OH T5-large training to torch.compile 1506 chaojun-zhang
- trainer: fixed spelling 1538 mgonchar
- Create CI Eager/Lazy for Language Modeling 1448 Luca-Calabria
- Fixes for llava-next test failures in 1.19 1535 tthakkal
- Refactor Qwen2 Family 1541 Wei-Lin-Intel
- Add support for optimized SDXL pipeline 1519 sushildubey171
- Add the checkout parameters of falcon-mamba pytest 1540 yuanwu2017
- Avoid negative values in eval metrics 1533 deepak-gowda-narayana
- Fix lm_eval script for starcoder and gemma 1463 skavulya
- Add option to use bf16 in PT sdp (5) 1514 astachowiczhabana
- Fix tests.test_peft_inference failure 1543 sywangyi
- Update lm_eval version 1473 alexey-belyakov
- Fix bad import in Baichuan code 1547 regisss
- Restore performance in generate 1546 ugolowic
- Fix for llava models not generating text with test failures in 1.19 1548 tthakkal
- Refactor KV cache, RoPE, reduce common code 1148 abhilash1910
- Adjust Qwen2-7B test case 1551 Wei-Lin-Intel
- [run_lm_eval.py] Fixed excessive printing of dumped JSON info 1553 FocusLuo
- Fix for single_card llama7b and falcon40b CI errors 1549 MohitIntel
- Apply --sdp_on_bf16 to image-to-text examples 1557 schoi-habana
- Fix accuracy regression in Gemma 1556 skavulya
- Fix FusedSDPA wrapper from TransformerEngine 1562 pbielak
- Run albert-xxlarge-v1 CI as torch.compile mode 1563 yeonsily
- Update README commands for the models to use --sdp_on_bf16 1566 yeonsily
- Minicpm patch 1567 pi314ever
- Updated gemma_2b_it CI 1561 Luca-Calabria
- Fixed Adalora Test for OH 1.15 1564 npiroozan
- Fixed LORACP Test for OH 1.15 1568 npiroozan
- Fix prefix llama ci failure 1570 sywangyi
- Fix mllama test 1569 sywangyi
- Fix lazy_mode assignment 1558 vidyasiv
- Generation utils update (minor) 1468 yafshar
- Style: removed tabs 1577 mgonchar
- Enable num_return_sequences in beam search 1536 mengker33
- gpt_bigcode: added internal bucketing fix 1526 mgonchar
- Update the Gaudi trainer with transformers 4.45.2 1398 yafshar
- Revert "add check_neural_compressor_min_version for 4 bit behavior" 1578 xin3he
- Revert PR 1473 1582 regisss
- Fixed spelling 1576 mgonchar
- Update docs for baichuan2 training 1586 xhaihao
- Add WA flag for falcon-180b to resolve text-gen critical reset error during tests 1590 hchauhan123
- Update transformers tests generation util v4.45.2 1441 malkomes
- Limit position embeddings in inference 1598 bhargaveede
- Verify model output is provided when check_output is enabled 1597 vidyasiv
- Update README.md 1595 skaulintel
- Fix scikit-learn to 1.5.2 to fix f1 evaluation crash in 1.6.0 1596 sywangyi
- Update language-modeling README file 1599 vivekgoe
- Revert common KVCache not to check token_idx 1594 jiminha
- Revert LlamaKVCache due to memory increase 1605 jiminha
- Replace the UNET custom attention processors 1608 yafshar
- Fix run_generation test commands for TRL out usage example 1621 shepark
- Update sdp_on_bf16 option for ST example 1615 ZhengHongming888
- Update save lora weights for diffusers with text_encoder_2 layers 1626 skavulya
- Fix save_lora_weights in pipeline_utils.py 1643 regisss
- Check rope_scaling attr 1609 jiminha
- Skip certain tests for G1 with empty param list 1613 hsubramony
- Revert "Update transformers tests generation util v4.45.2 (1441)" 1614 yeonsily
- Audio classification readme update 1604 hsubramony
- Fix readme cmds for clip-roberta 1603 hsubramony
- Add arbitrary scales 1625 jiminha
- Modify Qwen2 TRL command to avoid OOM. 1630 jiminha
- Fix distributed issue for ST Trainer 1649 ZhengHongming888
- Fix distributed issue for timm 1653 ZhengHongming888
- Refactor mixtral moe block. 1635 lkk12014402
- Speech-recognition: downgrade datasets version 1646 hsubramony
- Add sdp_on_bf16 to controlnet 1631 skaulintel
- Quick fix for quantization/custom op list loading 1657 dsocek
- Fix bug for GaudiMixtralAttentionLongSequence forward 1650 kaixuanliu