<!-- Release notes generated using configuration in .github/release.yml at main -->
## Highlight
Support vision-language model (VLM) inference pipeline and serving.
Currently, the following models are supported: [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat), the LLaVA series ([v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)) and [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B).
- VLM Inference Pipeline
```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```
Please refer to the detailed guide [here](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html).
- VLM serving with an OpenAI-compatible server
```shell
lmdeploy serve api_server liuhaotian/llava-v1.6-vicuna-7b --server-port 8000
```
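Once the server is up, any OpenAI-compatible client can query it. A minimal sketch of the request, assuming the server above is running on `localhost:8000` and accepts the standard `/v1/chat/completions` route with OpenAI-style vision messages (text plus `image_url` content parts):

```python
import json
import urllib.request

# Build an OpenAI-style chat completion request for the VLM server.
# The user message mixes text with an image URL, following the OpenAI
# vision message format.
payload = {
    "model": "liuhaotian/llava-v1.6-vicuna-7b",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "describe this image"},
            {"type": "image_url", "image_url": {
                "url": "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"}},
        ],
    }],
}

# POST the JSON body to the server's chat completions endpoint.
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with a running server
```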
- VLM serving with Gradio
```shell
lmdeploy serve gradio liuhaotian/llava-v1.6-vicuna-7b --server-port 6006
```
## What's Changed
### 🚀 Features
* Add inference pipeline for VL models by irexyc in https://github.com/InternLM/lmdeploy/pull/1214
* Support serving VLMs by AllentDan in https://github.com/InternLM/lmdeploy/pull/1285
* Serve VLM by gradio by irexyc in https://github.com/InternLM/lmdeploy/pull/1293
* Add pipeline.chat api for easy use by irexyc in https://github.com/InternLM/lmdeploy/pull/1292
### 💥 Improvements
* Hide qos functions from swagger UI if not applied by AllentDan in https://github.com/InternLM/lmdeploy/pull/1238
* Color log formatter by grimoire in https://github.com/InternLM/lmdeploy/pull/1247
* optimize filling kv cache kernel in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1251
* Refactor chat template and support accurate name matching. by AllentDan in https://github.com/InternLM/lmdeploy/pull/1216
* Support passing json file to chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/1200
* upgrade peft and check adapters by grimoire in https://github.com/InternLM/lmdeploy/pull/1284
* better cache allocation in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1272
* Fall back to base template if there is no chat_template in tokenizer_config.json by AllentDan in https://github.com/InternLM/lmdeploy/pull/1294
### 🐞 Bug fixes
* lazy load convert_pv jit function by grimoire in https://github.com/InternLM/lmdeploy/pull/1253
* [BUG] fix the case when num_used_blocks < 0 by jjjjohnson in https://github.com/InternLM/lmdeploy/pull/1277
* Check bf16 model in torch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1270
* fix bf16 check by grimoire in https://github.com/InternLM/lmdeploy/pull/1281
* [Fix] fix triton server chatbot init error by AllentDan in https://github.com/InternLM/lmdeploy/pull/1278
* Fix concatenate issue in profile serving by ispobock in https://github.com/InternLM/lmdeploy/pull/1282
* fix torch tp lora adapter by grimoire in https://github.com/InternLM/lmdeploy/pull/1300
* Fix crash when api_server loads a turbomind model by irexyc in https://github.com/InternLM/lmdeploy/pull/1304
### 📚 Documentations
* fix config for readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1245
* update badges in README by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1243
* Update serving guide including api_server and gradio by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1248
* rename restful_api.md to api_server.md by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1287
* Update readthedocs index by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1288
### 🌐 Other
* Parallelize testcase and refactor test workflow by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1254
* Accelerate sample request in benchmark script by ispobock in https://github.com/InternLM/lmdeploy/pull/1264
* Update eval ci cfg by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1259
* Test case bugfix and add restful interface testcases. by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1271
* bump version to v0.2.6 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1299
## New Contributors
* jjjjohnson made their first contribution in https://github.com/InternLM/lmdeploy/pull/1277
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.5...v0.2.6