Text-generation

0.9.2

Features

* **server**: harden the choice of which weights to save on disk
* **server**: better errors for warmup and tensor parallelism (TP)
* **server**: support reading GPTQ_BITS and GPTQ_GROUPSIZE from environment variables
* **server**: implement sharding for a non-divisible `vocab_size` (see the sketch after this list)
* **launcher**: add arg validation and drop subprocess
* **router**: explicit warning if revision is not set
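
The `vocab_size` sharding item above is about splitting the embedding / LM-head rows across tensor-parallel ranks even when `vocab_size` is not a multiple of the world size. A minimal sketch of the idea, with illustrative names only (not TGI's actual implementation):

```python
def shard_bounds(vocab_size: int, world_size: int, rank: int) -> tuple[int, int]:
    """Give every tensor-parallel rank an equal-sized slice by rounding
    the block size up; the last rank's slice is simply shorter, and any
    padding rows never correspond to real token ids."""
    block = (vocab_size + world_size - 1) // world_size  # ceil division
    start = rank * block
    stop = min(start + block, vocab_size)
    return start, stop

# Example: vocab_size=50257 over 4 shards -> blocks of 12565;
# the last rank owns the shorter slice [37695, 50257).
print(shard_bounds(50257, 4, 3))  # (37695, 50257)
```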

Fix

* **server**: fix RefinedWeb (RW) code (it is remote code, so architecture checks cannot determine which weights to keep)
* **server**: fix T5 weight names
* **server**: add logger import to `t5_modeling.py` by akowalsk
* **server**: bug fixes for GPTQ_BITS environment variable passthrough by ssmi153
* **server**: GPTQ env vars: catch the correct type of error by ssmi153 (see the sketch after this list)
* **server**: blacklist local files
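
The two GPTQ env-var fixes above hint at a subtle pitfall: `int(None)` raises `TypeError` while `int("abc")` raises `ValueError`, so catching only one of them misses the other. A hedged sketch of robust parsing (illustrative, not the actual TGI code):

```python
import os

def read_int_env(name: str):
    """Return an env var as int, or None if unset or unparsable.

    Two distinct failure modes: os.environ.get returns None when the
    variable is unset (int(None) -> TypeError), while a non-numeric
    value makes int() raise ValueError.
    """
    raw = os.environ.get(name)
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        return None

gptq_bits = read_int_env("GPTQ_BITS")
gptq_groupsize = read_int_env("GPTQ_GROUPSIZE")
```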

New Contributors
* akowalsk made their first contribution in https://github.com/huggingface/text-generation-inference/pull/585
* ssmi153 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/590
* gary149 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/611

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.9.1...v0.9.2

0.9.1

Highlights

* **server**: non-flash MPT support
* **server**: decrease memory fragmentation

Features

* **server**: use latest flash attention
* **router**: add a hostname argument to the router
* **docs**: add help text for the options in `text-generation-benchmark`

Fix

* **makefile**: Update server/Makefile to include Makefile-vllm
* **server**: Handle loading from local files for MPT
* **server**: avoid errors for very small `top_p` values (see the sketch after this list)
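
The `top_p` fix concerns degenerate nucleus sampling: with a very small `top_p`, naive filtering can mask every token and produce NaNs. A sketch of the usual guard (illustrative, not TGI's exact warper):

```python
import torch

def top_p_warp(logits: torch.Tensor, top_p: float) -> torch.Tensor:
    """Nucleus filtering that always keeps the most probable token,
    so a tiny top_p (e.g. 1e-9) cannot mask the entire vocabulary."""
    sorted_logits, sorted_idx = logits.sort(dim=-1, descending=True)
    probs = sorted_logits.softmax(dim=-1)
    # Probability mass strictly before each token in sorted order.
    mass_before = probs.cumsum(dim=-1) - probs
    drop = mass_before > top_p
    drop[..., 0] = False  # the guard: the top-1 token always survives
    # Map the sorted-order mask back to the original token positions.
    mask = drop.scatter(-1, sorted_idx, drop)
    return logits.masked_fill(mask, float("-inf"))

filtered = top_p_warp(torch.randn(1, 32000), top_p=1e-9)
assert (filtered > float("-inf")).sum() == 1  # only the argmax remains
```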

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.9.0...v0.9.1

0.9.0

Highlights

* **server**: add paged attention to flash models (see the sketch after this list)
* **server**: inference support for GPTQ (tested on Llama and Falcon), plus a quantization script
* **server**: only compute prefill logprobs when asked
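
Paged attention replaces one contiguous KV-cache allocation per sequence with fixed-size blocks handed out on demand, which is what cuts fragmentation. A toy sketch of the bookkeeping (block size and names are illustrative, not TGI's internals):

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 16  # tokens per KV-cache block; purely illustrative

@dataclass
class BlockTable:
    """Per-sequence table of the physical blocks it owns. Blocks are
    claimed lazily as the sequence grows, so memory is not reserved
    up front for the maximum possible length."""
    blocks: list = field(default_factory=list)

    def slot_for(self, position: int, free_blocks: list) -> int:
        block_idx, offset = divmod(position, BLOCK_SIZE)
        while len(self.blocks) <= block_idx:
            self.blocks.append(free_blocks.pop())  # claim a free block
        return self.blocks[block_idx] * BLOCK_SIZE + offset

free = list(range(1024))       # pool of physical block ids
seq = BlockTable()
print(seq.slot_for(0, free))   # first token claims one block
print(seq.slot_for(17, free))  # token 17 claims a second block
```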

Features

* **launcher**: parse oom signals
* **server**: batch tokenization for flash causal LM (see the sketch after this list)
* **server**: rework loading
* **server**: optimize dist ops
* **router**: add ngrok integration
* **server**: improve flash attention import errors
* **server**: Refactor conversion logic
* **router**: add header option to disable buffering for the generate_stream response by rkimball
* **router**: add arg validation
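
"Batch tokenization" here means encoding all queued prompts in one tokenizer call rather than once per request; fast tokenizers parallelize that in Rust. A minimal sketch (the model name is just an example):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # any fast tokenizer
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

def batch_tokenize(prompts):
    # One call for the whole batch instead of a Python loop per request.
    return tokenizer(prompts, padding=True, return_tensors="pt")

batch = batch_tokenize(["Hello world", "Paged attention is"])
print(batch["input_ids"].shape)  # [2, longest_prompt_len]
```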

Fix

* **docs**: CUDA_VISIBLE_DEVICES comment by antferdom
* **docs**: Fix typo and use POSIX comparison in the makefile by piratos
* **server**: fix warpers on CPU
* **server**: fix T5 when weight names are mixed up
* **router**: add timeout on flume sends
* **server**: do not init the process group if it is already initialized (see the sketch after this list)
* **server**: add an option to force a dtype other than `f16`
* **launcher**: fix issue where launcher does not properly report shard failures
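
The process-group fix guards against calling `torch.distributed.init_process_group` twice, which raises a RuntimeError. The idiom is a simple `is_initialized` check (a sketch, not the literal TGI code):

```python
import torch.distributed as dist

def ensure_process_group(backend: str = "nccl") -> None:
    """Initialize the default process group exactly once; calling
    init_process_group a second time raises a RuntimeError."""
    if not dist.is_initialized():
        dist.init_process_group(backend=backend)
```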

New Contributors
* antferdom made their first contribution in https://github.com/huggingface/text-generation-inference/pull/441
* piratos made their first contribution in https://github.com/huggingface/text-generation-inference/pull/443
* Yard1 made their first contribution in https://github.com/huggingface/text-generation-inference/pull/388
* rkimball made their first contribution in https://github.com/huggingface/text-generation-inference/pull/498

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.8.2...v0.9.0

0.8.2

Features

- **server**: remove the trust_remote_code requirement for Falcon models
- **server**: load santacoder/starcoder models with safetensors (see the sketch after this list)
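
Loading weights from safetensors means memory-mapped tensor loading with no pickle execution, instead of unpickling a PyTorch checkpoint. A hedged sketch (the file path is hypothetical):

```python
from safetensors.torch import load_file

# Memory-mapped load: no pickle execution, fast to open.
state_dict = load_file("model.safetensors", device="cpu")
print(sorted(state_dict)[:3])  # a few tensor names
```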

Fix

- **server**: fix has_position_ids

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.8.1...v0.8.2

0.8.1

Features

- **server**: add retry on download (see the sketch below)
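
Retry-on-download is plain exponential backoff around the fetch call. A generic sketch (the `fetch` callable is a stand-in, not TGI's downloader):

```python
import time

def download_with_retry(fetch, url, attempts=5):
    """Retry a flaky download with exponential backoff (1s, 2s, 4s, ...)."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(2 ** attempt)
```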

Fix

- **server**: fix bnb quantization for CausalLM models

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.8.0...v0.8.1

0.8.0

Features

- **router**: support vectorized warpers in flash causal LM (co-authored by jlamypoirier; see the sketch after this list)
- **proto**: decrease IPC proto size
- **benchmarker**: add summary tables
- **server**: support RefinedWeb models
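
Vectorized warpers apply per-request sampling parameters (temperature and the like) as one broadcasted tensor op over the whole batch instead of a Python loop. A minimal sketch with illustrative names:

```python
import torch

def warp_temperature(logits: torch.Tensor, temperatures: torch.Tensor) -> torch.Tensor:
    """logits: [batch, vocab]; temperatures: [batch].
    One broadcasted division replaces a per-request Python loop."""
    return logits / temperatures.unsqueeze(-1)

logits = torch.randn(3, 32000)
temps = torch.tensor([0.7, 1.0, 1.3])
warped = warp_temperature(logits, temps)  # shape [3, 32000]
```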

Fix

- **server**: fix issue when loading AutoModelForSeq2SeqLM models (contributed by CL-Shang)

New Contributors
* CL-Shang made their first contribution in https://github.com/huggingface/text-generation-inference/pull/370
* jlamypoirier made their first contribution in https://github.com/huggingface/text-generation-inference/pull/317

**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v0.7.0...v0.8.0
