Text Generation Inference (TGI) release notes


1.0.3

What's Changed

This release adds support for Code Llama; a usage sketch follows the changelog below.

* Upgrade version number in docs. by Narsil in https://github.com/huggingface/text-generation-inference/pull/910
* Added gradio example to docs by merveenoyan in https://github.com/huggingface/text-generation-inference/pull/867
* Supporting code llama. by Narsil in https://github.com/huggingface/text-generation-inference/pull/918
* Fixing the lora adaptation on docker. by Narsil in https://github.com/huggingface/text-generation-inference/pull/935
* Rebased #617 by Narsil in https://github.com/huggingface/text-generation-inference/pull/868
* New release. by Narsil in https://github.com/huggingface/text-generation-inference/pull/941


**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v1.0.2...v1.0.3
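
For illustration, here is a minimal sketch of querying a served Code Llama model with the `text_generation` Python client. The endpoint URL and model choice are assumptions, not part of the release; any Code Llama checkpoint served by TGI is queried the same way.

```python
# Minimal sketch: querying a TGI server that is serving a Code Llama
# checkpoint. The endpoint URL below is a hypothetical local deployment.
from text_generation import Client

client = Client("http://127.0.0.1:8080")  # hypothetical local endpoint

response = client.generate(
    "def fibonacci(n: int) -> int:",  # a code-completion style prompt
    max_new_tokens=64,
    temperature=0.2,
)
print(response.generated_text)
```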

1.0.2

What's Changed
* Have snippets in Python/JavaScript in quicktour by osanseviero in https://github.com/huggingface/text-generation-inference/pull/809
* Added two more features in readme.md file by sawanjr in https://github.com/huggingface/text-generation-inference/pull/831
* Fix rope dynamic + factor by Narsil in https://github.com/huggingface/text-generation-inference/pull/822
* fix: LlamaTokenizerFast to AutoTokenizer at flash_llama.py by dongs0104 in https://github.com/huggingface/text-generation-inference/pull/619
* README edit -- running the service with no GPU or CUDA support by pminervini in https://github.com/huggingface/text-generation-inference/pull/773
* Fix `tokenizers==0.13.4`. by Narsil in https://github.com/huggingface/text-generation-inference/pull/838
* Update README.md by adarshxs in https://github.com/huggingface/text-generation-inference/pull/848
* Fixing watermark. by Narsil in https://github.com/huggingface/text-generation-inference/pull/851
* Misc minor improvements for InferenceClient docs by osanseviero in https://github.com/huggingface/text-generation-inference/pull/852
* "Fix" for rw-1b. by Narsil in https://github.com/huggingface/text-generation-inference/pull/860
* Upgrading versions of python client. by Narsil in https://github.com/huggingface/text-generation-inference/pull/862
* Adding Idefics multi modal model. by Narsil in https://github.com/huggingface/text-generation-inference/pull/842
* Add streaming guide by osanseviero in https://github.com/huggingface/text-generation-inference/pull/858 (sketch after this section)
* Adding small benchmark script. by Narsil in https://github.com/huggingface/text-generation-inference/pull/881

New Contributors
* sawanjr made their first contribution in https://github.com/huggingface/text-generation-inference/pull/831
* dongs0104 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/619
* pminervini made their first contribution in https://github.com/huggingface/text-generation-inference/pull/773
* adarshxs made their first contribution in https://github.com/huggingface/text-generation-inference/pull/848

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v1.0.1...v1.0.2
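
The streaming guide (#858) covers this in depth; as a minimal hedged sketch, the Python client exposes `generate_stream` for token-by-token output. The endpoint URL is an assumption.

```python
# Minimal sketch of token streaming with the text_generation client,
# following the streaming guide added in #858.
from text_generation import Client

client = Client("http://127.0.0.1:8080")  # hypothetical local endpoint

text = ""
for event in client.generate_stream("What is token streaming?", max_new_tokens=32):
    if not event.token.special:  # skip special tokens such as EOS
        text += event.token.text
print(text)
```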

1.0.1

Notable changes:

- More GPTQ support
- Rope scaling (linear + dynamic); a sketch follows below
- Bitsandbytes 4-bit (both modes)
- Added more documentation
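
To make the rope-scaling entry concrete, here is a rough sketch of the two modes in the spirit of the upstream implementations. Function and parameter names are illustrative, not TGI internals.

```python
# Rough sketch of the two rope-scaling modes in this release (linear +
# dynamic NTK). Names and defaults are illustrative, not TGI internals.
def rope_inv_freq(
    dim: int,
    base: float = 10000.0,
    factor: float = 2.0,
    mode: str = "linear",
    seq_len: int = 0,
    max_pos: int = 2048,
) -> list[float]:
    """Inverse rotary frequencies under a scaling mode."""
    if mode == "dynamic" and seq_len > max_pos:
        # Dynamic NTK: grow the rotary base once the sequence exceeds the
        # trained context length, stretching low frequencies smoothly.
        base = base * ((factor * seq_len / max_pos) - (factor - 1)) ** (
            dim / (dim - 2)
        )
    inv_freq = [1.0 / (base ** (2 * i / dim)) for i in range(0, dim, 2)]
    if mode == "linear":
        # Linear scaling: equivalent to dividing positions by `factor`.
        inv_freq = [f / factor for f in inv_freq]
    return inv_freq


# Example: a 4096-token prompt on a model trained for 2048 positions.
print(rope_inv_freq(128, mode="dynamic", seq_len=4096)[:4])
```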

What's Changed
* Local gptq support. by Narsil in https://github.com/huggingface/text-generation-inference/pull/738
* Fix typing in `Model.generate_token` by jaywonchung in https://github.com/huggingface/text-generation-inference/pull/733
* Adding Rope scaling. by Narsil in https://github.com/huggingface/text-generation-inference/pull/741
* chore: fix typo in mpt_modeling.py by eltociear in https://github.com/huggingface/text-generation-inference/pull/737
* fix(server): Failing quantize config after local read. by Narsil in https://github.com/huggingface/text-generation-inference/pull/743
* Typo fix. by Narsil in https://github.com/huggingface/text-generation-inference/pull/746
* fix typo for dynamic rotary by flozi00 in https://github.com/huggingface/text-generation-inference/pull/745
* add FastLinear import by zspo in https://github.com/huggingface/text-generation-inference/pull/750
* Automatically map deduplicated safetensors weights to their original values (#501) by Narsil in https://github.com/huggingface/text-generation-inference/pull/761
* feat(server): Add native support for PEFT Lora models by Narsil in https://github.com/huggingface/text-generation-inference/pull/762
* This should prevent the PyTorch overriding. by Narsil in https://github.com/huggingface/text-generation-inference/pull/767
* fix build tokenizer in quantize and remove duplicate import by zspo in https://github.com/huggingface/text-generation-inference/pull/768
* Merge BNB 4bit. by Narsil in https://github.com/huggingface/text-generation-inference/pull/770
* Fix dynamic rope. by Narsil in https://github.com/huggingface/text-generation-inference/pull/783
* Fixing non 4bits quantization. by Narsil in https://github.com/huggingface/text-generation-inference/pull/785
* Update `__init__.py` by Narsil in https://github.com/huggingface/text-generation-inference/pull/794
* Llama change. by Narsil in https://github.com/huggingface/text-generation-inference/pull/793
* Setup for doc-builder and docs for TGI by merveenoyan in https://github.com/huggingface/text-generation-inference/pull/740
* Use destructuring in router arguments to avoid '.0' by ivarflakstad in https://github.com/huggingface/text-generation-inference/pull/798
* Fix gated docs by osanseviero in https://github.com/huggingface/text-generation-inference/pull/805
* Minor docs style fixes by osanseviero in https://github.com/huggingface/text-generation-inference/pull/806
* Added CLI docs and rename docker launch by merveenoyan in https://github.com/huggingface/text-generation-inference/pull/799
* [docs] Build docs only when doc files change by mishig25 in https://github.com/huggingface/text-generation-inference/pull/812
* Added ChatUI Screenshot to Docs by merveenoyan in https://github.com/huggingface/text-generation-inference/pull/823
* Upgrade transformers (fix protobuf==3.20 issue) by Narsil in https://github.com/huggingface/text-generation-inference/pull/795
* Added streaming for InferenceClient by merveenoyan in https://github.com/huggingface/text-generation-inference/pull/821 (sketch after this section)
* Version 1.0.1 by Narsil in https://github.com/huggingface/text-generation-inference/pull/836

New Contributors
* jaywonchung made their first contribution in https://github.com/huggingface/text-generation-inference/pull/733
* eltociear made their first contribution in https://github.com/huggingface/text-generation-inference/pull/737
* flozi00 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/745
* zspo made their first contribution in https://github.com/huggingface/text-generation-inference/pull/750
* ivarflakstad made their first contribution in https://github.com/huggingface/text-generation-inference/pull/798
* osanseviero made their first contribution in https://github.com/huggingface/text-generation-inference/pull/805
* mishig25 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/812

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v1.0.0...v1.0.1
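
Since streaming also landed in `huggingface_hub`'s `InferenceClient` (#821), here is a minimal sketch, assuming a TGI server running locally; the endpoint URL is an assumption.

```python
# Minimal sketch of streaming with huggingface_hub's InferenceClient (#821).
from huggingface_hub import InferenceClient

client = InferenceClient(model="http://127.0.0.1:8080")  # hypothetical endpoint

for chunk in client.text_generation(
    "Explain rotary position embeddings in one sentence.",
    max_new_tokens=48,
    stream=True,  # yields text chunks as the server generates them
):
    print(chunk, end="", flush=True)
print()
```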

1.0.0

License change

We are releasing TGI v1.0 under a new license: HFOIL 1.0.
All prior versions of TGI remain licensed under Apache 2.0, the last Apache 2.0 version being [version 0.9.4](https://github.com/huggingface/text-generation-inference/releases/tag/v0.9.4).

HFOIL stands for Hugging Face Optimized Inference License, and it has been specifically designed for our optimized inference solutions. While the source code remains accessible, HFOIL is not a true open source license because we added a restriction: to sell a hosted or managed service built on top of TGI, we now require a separate agreement.
You can consult the new license [here](https://github.com/huggingface/text-generation-inference/blob/bde25e62b33b05113519e5dbf75abda06a03328e/LICENSE).

What does this mean for you?

This change in source code licensing **has no impact on the overwhelming majority of our user community** who use TGI for free. Additionally, both our Inference Endpoint customers and those of our commercial partners will also remain unaffected.

However, it will restrict non-partnered cloud service providers from offering TGI v1.0+ as a service without requesting a license.

To elaborate further:

- If you are an existing user of TGI prior to v1.0, your current version is still **Apache 2.0** and **you can use it commercially without restrictions**.

- If you are using TGI for personal use or research purposes, **the HFOIL 1.0 restrictions do not apply to you.**

- If you are using TGI for commercial purposes as part of an internal company project (that will not be sold to third parties as a hosted or managed service), **the HFOIL 1.0 restrictions do not apply to you.**

- If you integrate TGI into a hosted or managed service that you sell to customers, consider requesting a license to upgrade to v1.0 and later versions; you can email us at api-enterprise@huggingface.co with information about your service.


For more information, see #726.

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.9.4...v1.0.0

0.9.4

Features

* **server**: auto `max_batch_total_tokens` for flash attention models https://github.com/huggingface/text-generation-inference/pull/630
* **router**: ngrok edge https://github.com/huggingface/text-generation-inference/pull/642
* **server**: Add trust_remote_code to quantize script by ChristophRaab https://github.com/huggingface/text-generation-inference/pull/647
* **server**: Add exllama GPTQ CUDA kernel support #553 https://github.com/huggingface/text-generation-inference/pull/666
* **server**: Directly load GPTBigCode to specified device by Atry in https://github.com/huggingface/text-generation-inference/pull/618
* **server**: add cuda memory fraction https://github.com/huggingface/text-generation-inference/pull/659
* **server**: Using `quantize_config.json` instead of `GPTQ_BITS` env variables https://github.com/huggingface/text-generation-inference/pull/671 (example below)
* **server**: support new falcon config https://github.com/huggingface/text-generation-inference/pull/712
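
As a sketch of the `quantize_config.json` change (#671): TGI now reads GPTQ parameters from the config file shipped alongside the weights instead of `GPTQ_BITS`-style environment variables. The field names below follow the common AutoGPTQ convention and are assumptions here, not a TGI specification.

```python
# Sketch: writing a quantize_config.json in the common AutoGPTQ convention,
# which TGI reads instead of GPTQ_BITS-style env variables (#671).
# Field names are an assumption, not a TGI specification.
import json

quantize_config = {
    "bits": 4,          # quantization bit-width
    "group_size": 128,  # per-group quantization granularity
    "desc_act": False,  # whether activation-order (act-order) was used
}

with open("quantize_config.json", "w") as f:
    json.dump(quantize_config, f, indent=2)
```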

Fix

* **server**: llama v2 GPTQ https://github.com/huggingface/text-generation-inference/pull/648
* **server**: Fixing non-parameters in quantize script (`bigcode/starcoder` was an example) https://github.com/huggingface/text-generation-inference/pull/661
* **server**: use mem_get_info to get kv cache size https://github.com/huggingface/text-generation-inference/pull/664
* **server**: fix exllama buffers https://github.com/huggingface/text-generation-inference/pull/689
* **server**: fix quantization python requirements https://github.com/huggingface/text-generation-inference/pull/708

New Contributors

* ChristophRaab made their first contribution in https://github.com/huggingface/text-generation-inference/pull/647
* fxmarty made their first contribution in https://github.com/huggingface/text-generation-inference/pull/648
* Atry made their first contribution in https://github.com/huggingface/text-generation-inference/pull/618

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.9.3...v0.9.4

0.9.3

Highlights

* **server**: add support for flash attention v2
* **server**: add support for llamav2

Features

* **launcher**: add debug logs
* **server**: rework the quantization to support all models

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.9.2...v0.9.3
