Optimum-neuron

Latest version: v0.0.22




```python
from optimum.neuron import pipeline

clf = pipeline("question-answering")
clf({"context": "This is a sample context", "question": "What is the context here?"})
# {'score': 0.4972594678401947, 'start': 8, 'end': 16, 'answer': 'a sample'}
```

Or with precompiled models as follows:

```python
from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForQuestionAnswering, pipeline

tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Loading the PyTorch checkpoint and converting to the neuron format by providing export=True
model = NeuronModelForQuestionAnswering.from_pretrained(
    "deepset/roberta-base-squad2",
    export=True
)

neuron_qa = pipeline("question-answering", model=model, tokenizer=tokenizer)
question = "What's my name?"
context = "My name is Philipp and I live in Nuremberg."

pred = neuron_qa(question=question, context=context)
```
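Both question-answering examples return a dict shaped like the output shown for the first pipeline. A minimal sketch of how its fields relate, assuming (as in the Transformers question-answering pipeline) that `start` and `end` are character offsets into `context` with `end` exclusive; `answer_from_offsets` is a hypothetical helper, not part of optimum-neuron:

```python
def answer_from_offsets(context: str, pred: dict) -> str:
    """Recover the answer text from a QA pipeline prediction (hypothetical helper).

    Assumes pred["start"] and pred["end"] are character offsets into `context`,
    with `end` exclusive, i.e. Python slice semantics.
    """
    return context[pred["start"]:pred["end"]]


# Output reproduced from the first question-answering example
context = "This is a sample context"
pred = {"score": 0.4972594678401947, "start": 8, "end": 16, "answer": "a sample"}

span = answer_from_offsets(context, pred)
print(span)  # a sample
assert span == pred["answer"]
```

Slicing the context this way is a quick sanity check that a compiled model and its tokenizer agree on offsets.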

*Relevant PR: 107*

Cache repo fix

The cache repo system was broken starting from Neuron 2.11.
*This release fixes that; the relevant PR is 119.*

0.0.22

What's Changed

Training
* Integrate new API for saving and loading with `neuronx_distributed` by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/560

Inference

* Add support for Mixtral by dacorvo in https://github.com/huggingface/optimum-neuron/pull/569
* Improve Llama models performance by dacorvo in https://github.com/huggingface/optimum-neuron/pull/587
* Make Stable Diffusion pipelines compatible with compel by JingyaHuang and neo in https://github.com/huggingface/optimum-neuron/pull/581 (with tests inspired by the snippets sent from Suprhimp)
* Add `SentenceTransformers` support to `pipeline` for `feature-extraction` by philschmid in https://github.com/huggingface/optimum-neuron/pull/583
* Allow download subfolder for caching models with subfolder by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/566
* Do not split decoder checkpoint files by dacorvo in https://github.com/huggingface/optimum-neuron/pull/567

TGI

* Set up TGI environment values with the ones used to build the model by oOraph in https://github.com/huggingface/optimum-neuron/pull/529
* TGI benchmark with llmperf by dacorvo in https://github.com/huggingface/optimum-neuron/pull/564
* Improve tgi env wrapper for neuron by oOraph in https://github.com/huggingface/optimum-neuron/pull/589

Caveat

Currently, models traced with `inline_weights_to_neff=False` have higher than expected latency during inference. This is because the weights are not automatically moved to the Neuron devices. The issue will be fixed in PR 584; please avoid setting `inline_weights_to_neff=False` in this release.
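Since PR 590 (below) makes `True` the default, the problem should only arise when the option is set explicitly. A minimal sketch of a guard that follows this advice, assuming export options are collected in a plain dict before compiling; `check_inline_weights` is a hypothetical helper, not an optimum-neuron API, and overriding the value (rather than only warning) is a design choice of this sketch:

```python
import warnings


def check_inline_weights(export_kwargs: dict) -> dict:
    """Hypothetical guard: warn about and override inline_weights_to_neff=False.

    In optimum-neuron 0.0.22, models traced with inline_weights_to_neff=False
    show higher than expected inference latency because the weights are not
    automatically moved to the Neuron devices.
    """
    kwargs = dict(export_kwargs)  # copy so the caller's dict is not mutated
    if kwargs.get("inline_weights_to_neff") is False:
        warnings.warn(
            "inline_weights_to_neff=False has higher than expected latency in "
            "optimum-neuron 0.0.22; using inline_weights_to_neff=True instead.",
            UserWarning,
        )
        kwargs["inline_weights_to_neff"] = True
    return kwargs
```

Options that do not mention `inline_weights_to_neff` are passed through untouched, so the library default applies.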

Other changes
* Improve installation guide by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/559
* upgrade optimum and then install optimum-neuron by shub-kris in https://github.com/huggingface/optimum-neuron/pull/533
* Cleanup obsolete code by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/555
* Extend TGI integration tests by dacorvo in https://github.com/huggingface/optimum-neuron/pull/561
* Modify benchmarks by dacorvo in https://github.com/huggingface/optimum-neuron/pull/563
* Bump PyTorch to 2.1 by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/502
* fix(decoder): specify libraryname to suppress warning by dacorvo in https://github.com/huggingface/optimum-neuron/pull/570
* missing \ in quickstart inference guide by yahavb in https://github.com/huggingface/optimum-neuron/pull/574
* Use AWS 2.18.0 AMI as base by dacorvo in https://github.com/huggingface/optimum-neuron/pull/572
* Update TGI router version to 2.0.1 by dacorvo in https://github.com/huggingface/optimum-neuron/pull/577
* Add guide for LoRA adapters by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/582
* eos_token_id can be a list in configs by dacorvo in https://github.com/huggingface/optimum-neuron/pull/580
* Ease the tests when there is no hf token by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/585
* Change inline weights to Neff default value to True by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/590
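PR 580 accounts for model configurations where `eos_token_id` is a list of ids rather than a single integer. A minimal sketch of the normalization this implies, assuming generation stops as soon as the last generated token matches any end-of-sequence id; both helper names are hypothetical and not taken from the optimum-neuron code:

```python
from typing import List, Optional, Union

EosId = Optional[Union[int, List[int]]]


def normalize_eos_ids(eos_token_id: EosId) -> List[int]:
    """Return eos_token_id as a list, whether the config holds None, an int or a list."""
    if eos_token_id is None:
        return []
    if isinstance(eos_token_id, int):
        return [eos_token_id]
    return list(eos_token_id)


def is_finished(last_token: int, eos_token_id: EosId) -> bool:
    """True when the last generated token is one of the end-of-sequence ids."""
    return last_token in normalize_eos_ids(eos_token_id)
```

Normalizing once means the stopping check is written for the list case only.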

New Contributors
* yahavb made their first contribution in https://github.com/huggingface/optimum-neuron/pull/574

**Full Changelog**: https://github.com/huggingface/optimum-neuron/compare/v0.0.21...v0.0.22

0.0.21

What's Changed

Training

* Add GQA optimization for Tensor Parallel training to support the case `tp_size > num_key_value_heads` by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/498
* Mixed-precision training with either `torch_xla` or `torch.autocast` by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/523
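With grouped-query attention (GQA), several query heads share one key/value head. When `tp_size > num_key_value_heads` there are not enough KV heads to give each tensor-parallel rank its own, so a common approach is to replicate each KV head across several ranks. A minimal sketch of that assignment, assuming `tp_size` is a multiple of `num_key_value_heads` and that query heads are split contiguously and evenly across ranks; it illustrates the general idea, not the implementation in PR 498:

```python
from typing import List


def kv_head_for_rank(rank: int, tp_size: int, num_key_value_heads: int) -> int:
    """Index of the KV head a rank must hold when tp_size > num_key_value_heads.

    Each KV head is replicated on tp_size // num_key_value_heads consecutive ranks.
    """
    if tp_size % num_key_value_heads != 0:
        raise ValueError("tp_size must be a multiple of num_key_value_heads")
    replicas = tp_size // num_key_value_heads
    return rank // replicas


def query_heads_for_rank(rank: int, tp_size: int, num_attention_heads: int) -> List[int]:
    """Contiguous block of query heads assigned to a rank."""
    per_rank = num_attention_heads // tp_size
    return list(range(rank * per_rank, (rank + 1) * per_rank))


# Example: 32 query heads, 8 KV heads, tensor parallelism over 16 ranks
num_attention_heads, num_key_value_heads, tp_size = 32, 8, 16
group = num_attention_heads // num_key_value_heads  # query heads sharing one KV head
for rank in range(tp_size):
    kv = kv_head_for_rank(rank, tp_size, num_key_value_heads)
    # every query head on this rank uses the KV head replicated on it
    assert all(h // group == kv for h in query_heads_for_rank(rank, tp_size, num_attention_heads))
```

Replication costs extra memory for the duplicated KV projections but keeps attention local to each rank.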

Inference

* Add caching support for traced TorchScript models (e.g. encoders, stable diffusion models) by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/510
* Support phi model on feature-extraction, text-classification, token-classification tasks by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/509

TGI

* TGI improvements by dacorvo in https://github.com/huggingface/optimum-neuron/pull/522

Caveat

AWS Neuron SDK 2.18 does not support compiling SDXL's UNet with weights/NEFF separation, so `inline_weights_to_neff=True` is forced for it by:
* Disable weights / neff separation of SDXL's UNET for neuron sdk 2.18 by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/554

Other changes

* Fix/ami authorized keys by shub-kris in https://github.com/huggingface/optimum-neuron/pull/517
* Skip weight load during parallel compile by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/524
* fixing format in getting-started.ipynb by jimburtoft in https://github.com/huggingface/optimum-neuron/pull/526
* Removing colab links in notebooks.mdx by jimburtoft in https://github.com/huggingface/optimum-neuron/pull/525
* ADD stale bot by philschmid in https://github.com/huggingface/optimum-neuron/pull/530
* Bump optimum version by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/534
* Fix style by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/538
* Fix GQA permutation computation and sequential weight initialization / loading when doing TP by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/531
* Add setup runtime step for K8S by glegendre01 in https://github.com/huggingface/optimum-neuron/pull/541
* Disable logging during precompilation by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/539
* Do not use deprecated list_files_info by Wauplin in https://github.com/huggingface/optimum-neuron/pull/536
* Adding link to existing Fine-tuning example in Notebooks by jimburtoft in https://github.com/huggingface/optimum-neuron/pull/527
* Add missing notebooks to doc by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/543
* fix: bug in get_available_cores within container by oOraph in https://github.com/huggingface/optimum-neuron/pull/546
* Init on the `xla` device by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/521
* Adding CodeLlama-7B inference and compilation example notebook by jimburtoft in https://github.com/huggingface/optimum-neuron/pull/549
* Add tools for auto filling traced models cache by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/537
* Remove print that should not be there by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/552
* Use AWS Neuron sdk 2.18 by dacorvo in https://github.com/huggingface/optimum-neuron/pull/547
* Cache utils related cleanup by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/553

New Contributors
* glegendre01 made their first contribution in https://github.com/huggingface/optimum-neuron/pull/541
* Wauplin made their first contribution in https://github.com/huggingface/optimum-neuron/pull/536

**Full Changelog**: https://github.com/huggingface/optimum-neuron/compare/v0.0.20...v0.0.21

0.0.20

What's Changed

Training

- Multi-node training support by michaelbenayoun (440)

TGI

- optimize continuous batching and improve export (506)

Inference

- Add Lora support to stable diffusion by JingyaHuang (483)
- Support sentence transformers clip by JingyaHuang (495)
- Inference compile cache script by philschmid and dacorvo (496, 504)
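A LoRA adapter expresses a fine-tuning update as the product of two small matrices, giving merged weights W' = W + (alpha / r) B A, where A is r × d_in, B is d_out × r and r is the adapter rank. One common way to apply an adapter before compiling a static graph is to fold it into the base weights. A minimal sketch of that merge with plain Python lists standing in for tensors; it follows the standard LoRA scaling and is not the code added in PR 483:

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python product of x (n x k) and y (k x m)."""
    return [
        [sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where A is r x d_in and B is d_out x r."""
    rank = len(lora_a)
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # d_out x d_in low-rank update
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

For example, merging a rank-1 adapter with `alpha=1.0` into a 2 × 2 identity only changes the entries touched by B A.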

Doc

- Update Inference supported models list by JingyaHuang (501)

Bug fixes

- inference cache: omit irrelevant config parameters in lookup by dacorvo (494)
- Optimize disk usage when fetching model checkpoints by dacorvo (505)
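An inference cache lookup matches a compiled artifact to a model configuration, so configuration entries that do not affect compilation should not change the lookup key. A minimal sketch of such a key, assuming the configuration is a JSON-serializable dict; `cache_key` is hypothetical and the set of ignored entries below is purely illustrative, not the list used by optimum-neuron:

```python
import hashlib
import json

# Illustrative only: config entries assumed not to affect the compiled artifact
IGNORED_KEYS = {"_name_or_path", "transformers_version"}


def cache_key(config: dict, ignored=IGNORED_KEYS) -> str:
    """Hash the config after dropping entries that are irrelevant to compilation."""
    relevant = {k: v for k, v in config.items() if k not in ignored}
    # sort_keys makes the serialization, and hence the hash, independent of key order
    payload = json.dumps(relevant, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With this scheme, two checkpoints that differ only in their local path share a cache entry, while a change to an architectural value such as `hidden_size` produces a new key.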

**Full Changelog**: https://github.com/huggingface/optimum-neuron/compare/v0.0.19...v0.0.20

0.0.19

What's Changed

Training

* Integrate new cache system for training by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/472

TGI

* Support higher batch sizes using transformers-neuronx continuous batching by dacorvo in https://github.com/huggingface/optimum-neuron/pull/488
* Lift max-concurrent-request limitation using TGI 1.4.1 by dacorvo in https://github.com/huggingface/optimum-neuron/pull/488
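Continuous batching keeps a fixed number of batch slots busy: when a sequence finishes, a waiting request takes its slot at the next decoding step instead of waiting for the whole batch to drain. A toy, pure-Python simulation of that scheduling idea, assuming each request needs a known number of decoding steps; `continuous_batching` is hypothetical and does not use or reproduce the transformers-neuronx implementation:

```python
from collections import deque
from typing import Dict, List


def continuous_batching(steps_needed: Dict[str, int], batch_size: int) -> Dict[str, int]:
    """Return the decoding step at which each request finishes.

    steps_needed maps a request id to the number of tokens it has to generate
    (at least 1); at most batch_size requests are decoded together per step.
    """
    if any(n < 1 for n in steps_needed.values()):
        raise ValueError("each request must need at least one decoding step")
    waiting = deque(steps_needed)  # request ids in arrival order
    remaining = dict(steps_needed)
    active: List[str] = []
    finished_at: Dict[str, int] = {}
    step = 0
    while waiting or active:
        # Fill every free slot with a waiting request before the next step
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        step += 1
        for req in active:
            remaining[req] -= 1  # one token generated per active request
        for req in [r for r in active if remaining[r] == 0]:
            finished_at[req] = step
            active.remove(req)  # the slot is freed immediately
    return finished_at
```

With `{"a": 1, "b": 3, "c": 1}` and two slots, `c` starts as soon as `a` finishes and completes at step 2; with static batching it would wait for `b` and finish at step 4.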

AMI

* Add packer support for building AWS AMI by shub-kris in https://github.com/huggingface/optimum-neuron/pull/441
* [AMI] Updates base ami to new id by philschmid in https://github.com/huggingface/optimum-neuron/pull/482

Major bugfixes

* Fix sdxl inpaint pipeline for diffusers 0.26.* by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/458
* TGI: update to controller version 1.4.0 & bug fixes by dacorvo in https://github.com/huggingface/optimum-neuron/pull/470
* Fix optimum-cli export for inf1 by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/474

Other changes
* Add TGI tests and CI workflow by dacorvo in https://github.com/huggingface/optimum-neuron/pull/355
* Bump to optimum 1.17 - Adapt to optimum exporter refactoring by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/414
* [Training] Support for Transformers 4.37 by michaelbenayoun in https://github.com/huggingface/optimum-neuron/pull/459
* Add contribution guide for Neuron exporter by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/461
* Fix path, update versions by shub-kris in https://github.com/huggingface/optimum-neuron/pull/462
* Add issue and PR templates & build optimum env cli for Neuron by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/463
* Fix trigger for actions by philschmid in https://github.com/huggingface/optimum-neuron/pull/468
* TGI: bump rust version by dacorvo in https://github.com/huggingface/optimum-neuron/pull/477
* [documentation] Add Container overview page. by philschmid in https://github.com/huggingface/optimum-neuron/pull/481
* Bump to Neuron sdk 2.17.0 by JingyaHuang in https://github.com/huggingface/optimum-neuron/pull/487

New Contributors
* shub-kris made their first contribution in https://github.com/huggingface/optimum-neuron/pull/441

**Full Changelog**: https://github.com/huggingface/optimum-neuron/compare/v0.0.18...v0.0.19

0.0.18

What's Changed

AWS SDK

* Use AWS Neuron SDK 2.16.1 (449)

Inference

* Preliminary support for neff/weights decoupling by JingyaHuang (402)
* Allow exporting decoder models using optimum-cli by dacorvo (422)
* Add Neuron X cache registry by dacorvo (442)
* Add StoppingCriteria to generate() of NeuronModelForCausalLM by dacorvo (454)

Training

* Initial support for pipeline parallelism by michaelbenayoun (279)
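Pipeline parallelism splits a model's layers into consecutive stages, each placed on its own device, with micro-batches flowing from one stage to the next. A minimal sketch of a balanced split, assuming all layers cost roughly the same; `partition_layers` is a hypothetical, generic illustration, not how optimum-neuron assigns layers to stages:

```python
from typing import List


def partition_layers(num_layers: int, num_stages: int) -> List[range]:
    """Split layer indices into num_stages contiguous, near-equal ranges.

    The first num_layers % num_stages stages receive one extra layer.
    """
    if not 0 < num_stages <= num_layers:
        raise ValueError("need 1 <= num_stages <= num_layers")
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)
        stages.append(range(start, start + size))
        start += size
    return stages
```

For example, 10 layers over 4 stages gives stage sizes 3, 3, 2, 2; keeping stages contiguous means each stage only sends activations to its immediate successor.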

TGI

* TGI: support vanilla transformer models whose configuration is cached by dacorvo (445)

Tutorials and doc improvement

* Various fixes by jimburtoft, michaelbenayoun, JingyaHuang (428, 429, 432)
* Improve Stable Diffusion Notebooks by JingyaHuang (431)
* Add Sentence Transformers Guide and Notebook by philschmid (434)
* Add benchmark section by dacorvo (435)

Major bugfixes

* TGI: correctly identify special tokens during generation by dacorvo (438)
* TGI: do not include the input_text in generated text by dacorvo (454)
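For decoder-only models, the token sequence returned by `generate()` in Transformers contains the prompt ids followed by the newly generated ids. A minimal sketch of returning only the generated part, assuming token ids are plain lists and that the output begins with an exact copy of the prompt; `generated_only` is a hypothetical helper:

```python
from typing import List


def generated_only(prompt_ids: List[int], output_ids: List[int]) -> List[int]:
    """Drop the echoed prompt from a decoder output sequence."""
    if output_ids[: len(prompt_ids)] != prompt_ids:
        raise ValueError("output does not start with the prompt ids")
    return output_ids[len(prompt_ids):]
```

Raising on a mismatch, rather than slicing blindly, surfaces cases where padding or truncation changed the prompt.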

Other changes

* API change to be compatible with Optimum by JingyaHuang (421)

New Contributors

* jimburtoft made their first contribution in 432

**Full Changelog**: https://github.com/huggingface/optimum-neuron/compare/v0.0.17...v0.0.18
