Optimum-tpu

Latest version: v0.2.3

0.2.3

Holiday season release! 🎄
This Optimum TPU release brings broader model support, in particular newer Llamas 🦙 for serving and fine-tuning, as well as initial support for the recently released TPU `v6e` and a few fixes here and there.

What's Changed
* fix(ci): correct TGI_VERSION definition in workflow by tengomucho in https://github.com/huggingface/optimum-tpu/pull/122
* Fix nightlies again by tengomucho in https://github.com/huggingface/optimum-tpu/pull/123
* ⚙️ Fix Integration Test for TGI by baptistecolle in https://github.com/huggingface/optimum-tpu/pull/124
* 🔂 Fix repetition penalty by tengomucho in https://github.com/huggingface/optimum-tpu/pull/125
* Allow sharding fine tuned misaligned models by tengomucho in https://github.com/huggingface/optimum-tpu/pull/126
* 🦙 Newer Llamas support by tengomucho in https://github.com/huggingface/optimum-tpu/pull/129
* 🦙 Add llama fine-tuning notebook example by baptistecolle in https://github.com/huggingface/optimum-tpu/pull/130
* doc(v6e): mention initial v6e support by tengomucho in https://github.com/huggingface/optimum-tpu/pull/131
* ⚙️ Refactor TGI Dockerfile to support Google-Cloud-Containers as a target by baptistecolle in https://github.com/huggingface/optimum-tpu/pull/127
* 🐛 Fix the convergence of loss function for the llama fine tuning example by baptistecolle in https://github.com/huggingface/optimum-tpu/pull/132
* chore: update version to v0.2.3 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/133

**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.2.1...v0.2.3

0.2.1

This release further simplifies TGI usage with Jetstream by making it the default backend and by correcting the usage of an environment variable. Dependencies are also updated so that we use the latest features of the frameworks we build on.

What's Changed
* fix(test): correct nightly tests with random sampling by tengomucho in https://github.com/huggingface/optimum-tpu/pull/117
* Jetstream by default by tengomucho in https://github.com/huggingface/optimum-tpu/pull/118
* 🧹 Cleanup of the batch size environment variables by baptistecolle in https://github.com/huggingface/optimum-tpu/pull/121
* ⬆️ Update dependencies by tengomucho in https://github.com/huggingface/optimum-tpu/pull/120

**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.2.0...v0.2.1

0.2.0

This is the first release of Optimum TPU that includes support for the Jetstream Pytorch engine as a backend for Text Generation Inference (TGI).
[JetStream](https://github.com/AI-Hypercomputer/JetStream) is a throughput and memory optimized engine for LLM inference on TPUs, and its [Pytorch implementation](https://github.com/AI-Hypercomputer/jetstream-pytorch) allows for a seamless integration in the TGI code. The supported models are, for now, Llama 2, Llama 3, Gemma 1 and Mixtral. Serving these models has yielded close to 10x the tokens/sec of the previously used backend (Pytorch XLA/transformers).
On top of that, it is possible to use quantization to serve with even fewer resources while maintaining similar throughput and quality.
Details follow.

What's Changed
* Update colab examples by wenxindongwork in https://github.com/huggingface/optimum-tpu/pull/86
* ci(docker): update torch-xla to 2.4.0 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/89
* ✈️ Introduce Jetstream/Pytorch in TGI by tengomucho in https://github.com/huggingface/optimum-tpu/pull/88
* 🦙 Llama3 on TGI - Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/90
* ☝️ Update Jetstream Pytorch revision by tengomucho in https://github.com/huggingface/optimum-tpu/pull/91
* Correct extra token, start preparing docker image for TGI/Jetstream Pt by tengomucho in https://github.com/huggingface/optimum-tpu/pull/93
* Fix generation using Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/94
* Fix slow tests by tengomucho in https://github.com/huggingface/optimum-tpu/pull/95
* 🧹 Cleanup and fixes for TGI by tengomucho in https://github.com/huggingface/optimum-tpu/pull/96
* Small TGI enhancements by tengomucho in https://github.com/huggingface/optimum-tpu/pull/97
* fix(TGI Jetstream Pt): prefill should be done with max input size by tengomucho in https://github.com/huggingface/optimum-tpu/pull/98
* 💎 Gemma on TGI Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/99
* Fix ci nightly jetstream by tengomucho in https://github.com/huggingface/optimum-tpu/pull/101
* CI ephemeral TPUs by tengomucho in https://github.com/huggingface/optimum-tpu/pull/102
* 🍃 Added Mixtral on TGI / Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/103
* Add CLI to install dependencies by tengomucho in https://github.com/huggingface/optimum-tpu/pull/104
* ⛰ CI: mount hub cache and fix issues with cli by tengomucho in https://github.com/huggingface/optimum-tpu/pull/106
* fix(docker): correct jetstream installation in TGI docker image by tengomucho in https://github.com/huggingface/optimum-tpu/pull/107
* ✏️ docs: Add training guide and improve documentation consistency by baptistecolle in https://github.com/huggingface/optimum-tpu/pull/110
* Quantization Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/111
* fix: graceful shutdown was not working with entrypoint, exec launcher by co42 in https://github.com/huggingface/optimum-tpu/pull/112
* fix(doc): correct link to deploy page by tengomucho in https://github.com/huggingface/optimum-tpu/pull/115
* More Jetstream Pytorch fixes, prepare for release by tengomucho in https://github.com/huggingface/optimum-tpu/pull/116

New Contributors
* wenxindongwork made their first contribution in https://github.com/huggingface/optimum-tpu/pull/86
* baptistecolle made their first contribution in https://github.com/huggingface/optimum-tpu/pull/110
* co42 made their first contribution in https://github.com/huggingface/optimum-tpu/pull/112

**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.1.5...v0.2.0

0.1.5

This release is essentially the same as the previous one (v0.1.4), but it allows correct PyPI package publication.

0.1.4

These changes focus on improving support for instruct models and solve an issue that appeared when using those models through the web UI with invalid settings.

What's Changed
* Fix secret leak workflow by tengomucho in https://github.com/huggingface/optimum-tpu/pull/72
* Handle selector exception by tengomucho in https://github.com/huggingface/optimum-tpu/pull/73
* chore(tgi): update TGI base image by tengomucho in https://github.com/huggingface/optimum-tpu/pull/75
* Fix instruct models UI issue by tengomucho in https://github.com/huggingface/optimum-tpu/pull/78

**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.1.3...v0.1.4

0.1.3

Cleanup of previous fixes and a lower batch size to prevent memory issues on Inference Endpoints with some models.

What's Changed
* Few more Inference Endpoints fixes by tengomucho in https://github.com/huggingface/optimum-tpu/pull/69
* feat(cache): use optimized StaticCache class for XLA by tengomucho in https://github.com/huggingface/optimum-tpu/pull/70
* Lower TGI IE batch size by tengomucho in https://github.com/huggingface/optimum-tpu/pull/71

**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.1.2...v0.1.3
