Optimum-tpu

Latest version: v0.2.0

0.2.0

This is the first release of Optimum TPU that supports the Jetstream PyTorch engine as a backend for Text Generation Inference (TGI).
[JetStream](https://github.com/AI-Hypercomputer/JetStream) is a throughput- and memory-optimized engine for LLM inference on TPUs, and its [PyTorch implementation](https://github.com/AI-Hypercomputer/jetstream-pytorch) integrates seamlessly into the TGI code. The supported models are, for now, Llama 2, Llama 3, Gemma 1 and Mixtral. Serving inference on these models gave close to 10x the throughput, in tokens/sec, of the previously used backend (PyTorch XLA/transformers).
On top of that, quantization can be used to serve with even fewer resources while maintaining similar throughput and quality.
Details follow.
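
As a rough illustration of what serving through TGI looks like from the client side, the sketch below sends a prompt to TGI's `/generate` HTTP endpoint. It is a minimal sketch, not part of this release: the base URL `http://localhost:8080`, the helper names `build_generate_request` and `generate`, and the default token budget are assumptions chosen for illustration, and the server is assumed to be already running on a TPU host.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=32, base_url="http://localhost:8080"):
    """Build a POST request for TGI's /generate endpoint.

    base_url is an assumption: adjust it to wherever the TGI server listens.
    """
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the prompt to the server and return the generated text."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["generated_text"]


if __name__ == "__main__":
    # Requires a reachable TGI server; fails with URLError otherwise.
    print(generate("What is a TPU?", max_new_tokens=16))
```

The client code is the same whichever backend serves the model, so it can be reused to compare the Jetstream PyTorch and PyTorch XLA backends on identical prompts.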

What's Changed
* Update colab examples by wenxindongwork in https://github.com/huggingface/optimum-tpu/pull/86
* ci(docker): update torch-xla to 2.4.0 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/89
* ✈️ Introduce Jetstream/Pytorch in TGI by tengomucho in https://github.com/huggingface/optimum-tpu/pull/88
* 🦙 Llama3 on TGI - Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/90
* Update Jetstream Pytorch revision by tengomucho in https://github.com/huggingface/optimum-tpu/pull/91
* Correct extra token, start preparing docker image for TGI/Jetstream Pt by tengomucho in https://github.com/huggingface/optimum-tpu/pull/93
* Fix generation using Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/94
* Fix slow tests by tengomucho in https://github.com/huggingface/optimum-tpu/pull/95
* 🧹 Cleanup and fixes for TGI by tengomucho in https://github.com/huggingface/optimum-tpu/pull/96
* Small TGI enhancements by tengomucho in https://github.com/huggingface/optimum-tpu/pull/97
* fix(TGI Jetstream Pt): prefill should be done with max input size by tengomucho in https://github.com/huggingface/optimum-tpu/pull/98
* 💎 Gemma on TGI Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/99
* Fix ci nightly jetstream by tengomucho in https://github.com/huggingface/optimum-tpu/pull/101
* CI ephemeral TPUs by tengomucho in https://github.com/huggingface/optimum-tpu/pull/102
* Added Mixtral on TGI / Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/103
* Add CLI to install dependencies by tengomucho in https://github.com/huggingface/optimum-tpu/pull/104
* ⛰ CI: mount hub cache and fix issues with cli by tengomucho in https://github.com/huggingface/optimum-tpu/pull/106
* fix(docker): correct jetstream installation in TGI docker image by tengomucho in https://github.com/huggingface/optimum-tpu/pull/107
* docs: Add training guide and improve documentation consistency by baptistecolle in https://github.com/huggingface/optimum-tpu/pull/110
* Quantization Jetstream Pytorch by tengomucho in https://github.com/huggingface/optimum-tpu/pull/111
* fix: graceful shutdown was not working with entrypoint, exec launcher by co42 in https://github.com/huggingface/optimum-tpu/pull/112
* fix(doc): correct link to deploy page by tengomucho in https://github.com/huggingface/optimum-tpu/pull/115
* More Jetstream Pytorch fixes, prepare for release by tengomucho in https://github.com/huggingface/optimum-tpu/pull/116

New Contributors
* wenxindongwork made their first contribution in https://github.com/huggingface/optimum-tpu/pull/86
* baptistecolle made their first contribution in https://github.com/huggingface/optimum-tpu/pull/110
* co42 made their first contribution in https://github.com/huggingface/optimum-tpu/pull/112

**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.1.5...v0.2.0

0.1.5

This release is essentially the same as the previous one (v0.1.4), but it allows correct PyPI package publication.

0.1.4

These changes focus on improving support for instruct models and fix an issue that appeared when using those models through the web UI with invalid settings.

What's Changed
* Fix secret leak workflow by tengomucho in https://github.com/huggingface/optimum-tpu/pull/72
* Handle selector exception by tengomucho in https://github.com/huggingface/optimum-tpu/pull/73
* chore(tgi): update TGI base image by tengomucho in https://github.com/huggingface/optimum-tpu/pull/75
* Fix instruct models UI issue by tengomucho in https://github.com/huggingface/optimum-tpu/pull/78


**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.1.3...v0.1.4

0.1.3

Cleanup of previous fixes, and a lower batch size to prevent memory issues on Inference Endpoints with some models.

What's Changed
* Few more Inference Endpoints fixes by tengomucho in https://github.com/huggingface/optimum-tpu/pull/69
* feat(cache): use optimized StaticCache class for XLA by tengomucho in https://github.com/huggingface/optimum-tpu/pull/70
* Lower TGI IE batch size by tengomucho in https://github.com/huggingface/optimum-tpu/pull/71


**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.1.2...v0.1.3

0.1.2

What's Changed

This release contains only a few small fixes, mainly for Inference Endpoints.

* Several Inference Endpoint fixes by tengomucho in https://github.com/huggingface/optimum-tpu/pull/66
* More Inference Endpoints features and fixes by tengomucho in https://github.com/huggingface/optimum-tpu/pull/68


**Full Changelog**: https://github.com/huggingface/optimum-tpu/compare/v0.1.1...v0.1.2

0.1.1

First TPU release, making TPU Text Generation Inference and Inference Endpoints container images available.

What's Changed
* Basic TGI server on XLA by tengomucho in https://github.com/huggingface/optimum-tpu/pull/1
* Enable CI/CD by tengomucho in https://github.com/huggingface/optimum-tpu/pull/2
* Fix TGI Dockerfile by shub-kris in https://github.com/huggingface/optimum-tpu/pull/3
* Add static KV cache and test on Gemma-2B by tengomucho in https://github.com/huggingface/optimum-tpu/pull/4
* Small optimizations by tengomucho in https://github.com/huggingface/optimum-tpu/pull/5
* Enable compilation by tengomucho in https://github.com/huggingface/optimum-tpu/pull/6
* Revert "fix: attention mask should be 1 or 0" by tengomucho in https://github.com/huggingface/optimum-tpu/pull/8
* feat: use dynamic batching when generating by tengomucho in https://github.com/huggingface/optimum-tpu/pull/9
* Repo layout by tengomucho in https://github.com/huggingface/optimum-tpu/pull/10
* Add PyPI release workflow by regisss in https://github.com/huggingface/optimum-tpu/pull/11
* Xla parallel proxy by tengomucho in https://github.com/huggingface/optimum-tpu/pull/12
* Add documentation to the repository by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/13
* Adopt naming convention of transformers API by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/14
* Fix main doc build workflow by regisss in https://github.com/huggingface/optimum-tpu/pull/15
* Improve readme by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/16
* Fix layout in README by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/17
* Fix rule and instructions for TGI by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/18
* Fix typo in index.mdx by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/19
* Added some links to Cloud TPU documentation by mikegre-google in https://github.com/huggingface/optimum-tpu/pull/20
* Parallel sharding by tengomucho in https://github.com/huggingface/optimum-tpu/pull/21
* Bump version to 0.1.0.dev1 by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/24
* Bump version to 0.1.0.dev2 by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/25
* Fix TGI missing import by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/27
* Forward arguments from TGI launcher to the model by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/28
* Fix optimum-tpu pip install instructions by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/29
* Fix tests with do_sample=True by tengomucho in https://github.com/huggingface/optimum-tpu/pull/30
* Sharding in tgi by tengomucho in https://github.com/huggingface/optimum-tpu/pull/31
* Fix missing '=' to assign environment variables in the default case w… by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/33
* Include two different stages for building TGI image: by mfuntowicz in https://github.com/huggingface/optimum-tpu/pull/34
* Llama support by tengomucho in https://github.com/huggingface/optimum-tpu/pull/32
* chore(ci): added workflow for nightly tests by tengomucho in https://github.com/huggingface/optimum-tpu/pull/35
* fix(build): setup.py removed from build_dist dependencies by tengomucho in https://github.com/huggingface/optimum-tpu/pull/36
* Try again to fix nightly builds by tengomucho in https://github.com/huggingface/optimum-tpu/pull/37
* Basic Llama2 Tuning by tengomucho in https://github.com/huggingface/optimum-tpu/pull/39
* Bug doc builder by pagezyhf in https://github.com/huggingface/optimum-tpu/pull/40
* Fix typo ; Update llama_tuning.md by furkanakkurt1335 in https://github.com/huggingface/optimum-tpu/pull/42
* Update to Pytorch 2.3.0 and transformers v4.40.2 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/41
* Fine tuning with FSDP v2 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/44
* Minor fix for mispelled stage in TGI dockerfile. by thealmightygrant in https://github.com/huggingface/optimum-tpu/pull/46
* Align to Transformers 4.41.1 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/45
* chore(training): Allow training on torch xla > 2.3.0, add warning by tengomucho in https://github.com/huggingface/optimum-tpu/pull/48
* fix(build): add missing setuptools_scm section by tengomucho in https://github.com/huggingface/optimum-tpu/pull/49
* fix(logging): correct logging usage by tengomucho in https://github.com/huggingface/optimum-tpu/pull/50
* fix(tests): fix decode sample expected outputs again by tengomucho in https://github.com/huggingface/optimum-tpu/pull/52
* fix(doc): update server and port when serving TGI by tengomucho in https://github.com/huggingface/optimum-tpu/pull/53
* fix(ci): correct secrets leak workflow check by tengomucho in https://github.com/huggingface/optimum-tpu/pull/55
* Add Mistral support 💨 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/54
* Mistral nits by tengomucho in https://github.com/huggingface/optimum-tpu/pull/57
* chore: bump to version v0.1.0a1 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/60
* feat(TGI): add release docker image build and push to registry workflow by tengomucho in https://github.com/huggingface/optimum-tpu/pull/62
* chore: bump to version v0.1.1 by tengomucho in https://github.com/huggingface/optimum-tpu/pull/63

New Contributors
* tengomucho made their first contribution in https://github.com/huggingface/optimum-tpu/pull/1
* shub-kris made their first contribution in https://github.com/huggingface/optimum-tpu/pull/3
* regisss made their first contribution in https://github.com/huggingface/optimum-tpu/pull/11
* mfuntowicz made their first contribution in https://github.com/huggingface/optimum-tpu/pull/13
* mikegre-google made their first contribution in https://github.com/huggingface/optimum-tpu/pull/20
* pagezyhf made their first contribution in https://github.com/huggingface/optimum-tpu/pull/40
* furkanakkurt1335 made their first contribution in https://github.com/huggingface/optimum-tpu/pull/42
* thealmightygrant made their first contribution in https://github.com/huggingface/optimum-tpu/pull/46

**Full Changelog**: https://github.com/huggingface/optimum-tpu/commits/v0.1.1
