## Highlights
* OpenAI-compatible chat completions API (#1427)
* ExLlamaV2 with tensor parallelism (#1490)
* GPTQ support for AMD GPUs (#1489)
* Phi model support (#1442)
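The headline feature is the OpenAI-compatible `/v1/chat/completions` route, which lets existing OpenAI-style clients talk to a TGI server. A minimal sketch of building such a request follows; the host, port, and `max_tokens` value are illustrative assumptions, not part of this release's text:

```python
import json

# Build an OpenAI-style chat completion request for TGI's
# /v1/chat/completions route. TGI serves a single model, so the
# "model" field is effectively a placeholder.
payload = {
    "model": "tgi",
    "messages": [
        {"role": "user", "content": "What is text-generation-inference?"}
    ],
    "stream": False,
    "max_tokens": 64,  # illustrative limit
}

body = json.dumps(payload)
print(body)

# To actually send it (requires a running TGI server and the
# `requests` package; the URL depends on how the server was launched):
# import requests
# resp = requests.post(
#     "http://localhost:8080/v1/chat/completions",
#     headers={"Content-Type": "application/json"},
#     data=body,
# )
# print(resp.json()["choices"][0]["message"]["content"])
```

Because the request shape follows the OpenAI schema, official OpenAI client libraries can generally be pointed at the TGI base URL instead of hand-rolling HTTP calls.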
## What's Changed
* fix: fix local loading for .bin models by OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1419
* Fix missing make target platform for local install: 'install-flash-attention-v2' by deepily in https://github.com/huggingface/text-generation-inference/pull/1414
* fix: follow base model for tokenizer in router by OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1424
* Fix local load for Medusa by PYNing in https://github.com/huggingface/text-generation-inference/pull/1420
* Return prompt vs generated tokens. by Narsil in https://github.com/huggingface/text-generation-inference/pull/1436
* feat: supports openai chat completions API by drbh in https://github.com/huggingface/text-generation-inference/pull/1427
* feat: support raise_exception, bos and eos tokens by drbh in https://github.com/huggingface/text-generation-inference/pull/1450
* chore: bump rust version and annotate/fix all clippy warnings by drbh in https://github.com/huggingface/text-generation-inference/pull/1455
* feat: conditionally toggle chat on invocations route by drbh in https://github.com/huggingface/text-generation-inference/pull/1454
* Disable `decoder_input_details` on OpenAI-compatible chat streaming, pass temp and top-k from API by EndlessReform in https://github.com/huggingface/text-generation-inference/pull/1470
* Fixing non divisible embeddings. by Narsil in https://github.com/huggingface/text-generation-inference/pull/1476
* Add messages api compatibility docs by drbh in https://github.com/huggingface/text-generation-inference/pull/1478
* Add a new `/tokenize` route to get the tokenized input by Narsil in https://github.com/huggingface/text-generation-inference/pull/1471
* feat: adds phi model by drbh in https://github.com/huggingface/text-generation-inference/pull/1442
* fix: read stderr in download by OlivierDehaene in https://github.com/huggingface/text-generation-inference/pull/1486
* fix: show warning with tokenizer config parsing error by drbh in https://github.com/huggingface/text-generation-inference/pull/1488
* fix: launcher doc typos by Narsil in https://github.com/huggingface/text-generation-inference/pull/1473
* Reinstate exl2 with tp by Narsil in https://github.com/huggingface/text-generation-inference/pull/1490
* Add sealion mpt support by Narsil in https://github.com/huggingface/text-generation-inference/pull/1477
* Trying to fix that flaky test. by Narsil in https://github.com/huggingface/text-generation-inference/pull/1491
* fix: launcher doc typos by thelinuxkid in https://github.com/huggingface/text-generation-inference/pull/1462
* Update the docs to include newer models. by Narsil in https://github.com/huggingface/text-generation-inference/pull/1492
* GPTQ support on ROCm by fxmarty in https://github.com/huggingface/text-generation-inference/pull/1489
* feat: add tokenizer-config-path to launcher args by drbh in https://github.com/huggingface/text-generation-inference/pull/1495
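Among the changes above, the new `/tokenize` route (#1471) returns the tokenized form of an input without running generation. A sketch of the request, assuming a local deployment (the URL is illustrative, and the exact response fields may vary):

```python
import json

# Request body for TGI's /tokenize route, which tokenizes the input
# server-side instead of generating text.
tokenize_request = {"inputs": "Hello, world!"}

# Assumed local deployment; adjust host/port to your launch settings.
url = "http://localhost:8080/tokenize"
print(json.dumps(tokenize_request))

# With a running server (and the `requests` package):
# import requests
# tokens = requests.post(url, json=tokenize_request).json()
# print(tokens)  # per-token entries for the input string
```

This is handy for checking prompt length against the model's context window before submitting a generation request.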
## New Contributors
* deepily made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1414
* PYNing made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1420
* drbh made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1427
* EndlessReform made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1470
* thelinuxkid made their first contribution in https://github.com/huggingface/text-generation-inference/pull/1462
**Full Changelog**: https://github.com/huggingface/text-generation-inference/compare/v1.3.4...v1.4.0