LMDeploy

Latest version: v0.7.2.post1


0.2.6

Highlight

Support vision-language model (VLM) inference pipeline and serving.
Currently, it supports the following models: [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat), the LLaVA series ([v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)) and [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B).

- VLM inference pipeline

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```

Please refer to the detailed guide [here](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html).
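
The pipeline can also handle several requests at once. Below is a hedged sketch of batch inference, assuming the pipeline accepts a list of (prompt, image) tuples and returns one `Response` per item (the prompts here are illustrative):

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

# each element is a (prompt, image) pair; both prompts are illustrative
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
prompts = [('describe this image', image), ('what animal is shown?', image)]

responses = pipe(prompts)
for r in responses:
    print(r.text)
```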

- VLM serving with an OpenAI-compatible server

```shell
lmdeploy serve api_server liuhaotian/llava-v1.6-vicuna-7b --server-port 8000
```
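
Once the server is running, it can be queried like any OpenAI-compatible endpoint. A minimal sketch with the `openai` Python client, assuming the server listens on localhost:8000 and accepts OpenAI vision-style `image_url` message content:

```python
from openai import OpenAI

# the api_key is a placeholder; it only matters if the server was started with API keys
client = OpenAI(api_key='none', base_url='http://0.0.0.0:8000/v1')

response = client.chat.completions.create(
    model='liuhaotian/llava-v1.6-vicuna-7b',
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {
                'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}},
        ],
    }],
)
print(response.choices[0].message.content)
```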


- VLM serving with Gradio

```shell
lmdeploy serve gradio liuhaotian/llava-v1.6-vicuna-7b --server-port 6006
```


What's Changed
🚀 Features
* Add inference pipeline for VL models by irexyc in https://github.com/InternLM/lmdeploy/pull/1214
* Support serving VLMs by AllentDan in https://github.com/InternLM/lmdeploy/pull/1285
* Serve VLM by gradio by irexyc in https://github.com/InternLM/lmdeploy/pull/1293
* Add pipeline.chat api for easy use by irexyc in https://github.com/InternLM/lmdeploy/pull/1292
💥 Improvements
* Hide qos functions from swagger UI if not applied by AllentDan in https://github.com/InternLM/lmdeploy/pull/1238
* Color log formatter by grimoire in https://github.com/InternLM/lmdeploy/pull/1247
* optimize filling kv cache kernel in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1251
* Refactor chat template and support accurate name matching. by AllentDan in https://github.com/InternLM/lmdeploy/pull/1216
* Support passing json file to chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/1200
* upgrade peft and check adapters by grimoire in https://github.com/InternLM/lmdeploy/pull/1284
* better cache allocation in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1272
* Fall back to base template if there is no chat_template in tokenizer_config.json by AllentDan in https://github.com/InternLM/lmdeploy/pull/1294
๐Ÿž Bug fixes
* lazy load convert_pv jit function by grimoire in https://github.com/InternLM/lmdeploy/pull/1253
* [BUG] fix the case when num_used_blocks < 0 by jjjjohnson in https://github.com/InternLM/lmdeploy/pull/1277
* Check bf16 model in torch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1270
* fix bf16 check by grimoire in https://github.com/InternLM/lmdeploy/pull/1281
* [Fix] fix triton server chatbot init error by AllentDan in https://github.com/InternLM/lmdeploy/pull/1278
* Fix concatenate issue in profile serving by ispobock in https://github.com/InternLM/lmdeploy/pull/1282
* fix torch tp lora adapter by grimoire in https://github.com/InternLM/lmdeploy/pull/1300
* Fix crash when api_server loads a turbomind model by irexyc in https://github.com/InternLM/lmdeploy/pull/1304
📚 Documentations
* fix config for readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1245
* update badges in README by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1243
* Update serving guide including api_server and gradio by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1248
* rename restful_api.md to api_server.md by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1287
* Update readthedocs index by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1288
๐ŸŒ Other
* Parallelize testcase and refactor test workflow by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1254
* Accelerate sample request in benchmark script by ispobock in https://github.com/InternLM/lmdeploy/pull/1264
* Update eval ci cfg by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1259
* Test case bugfix and add restful interface testcases. by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1271
* bump version to v0.2.6 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1299

New Contributors
* jjjjohnson made their first contribution in https://github.com/InternLM/lmdeploy/pull/1277

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.5...v0.2.6

0.2.5


What's Changed
🚀 Features
* Support mistral and sliding window attention by grimoire in https://github.com/InternLM/lmdeploy/pull/1075
* torch engine support chatglm3 by grimoire in https://github.com/InternLM/lmdeploy/pull/1159
* Support qwen1.5 in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1160
* Support mixtral for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1133
* Support torch deepseek moe by grimoire in https://github.com/InternLM/lmdeploy/pull/1163
* Support gemma model in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1184
* Auto backend for pipeline and serve when backend is not set to pytorch explicitly by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1211
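
As a rough illustration of the auto-backend change and the newly supported PyTorch-engine models, the sketch below selects the PyTorch engine explicitly (the model name and settings are placeholders); leaving `backend_config` unset lets LMDeploy pick a backend automatically:

```python
from lmdeploy import pipeline, PytorchEngineConfig

# explicit backend selection; omit backend_config to rely on auto selection
pipe = pipeline('Qwen/Qwen1.5-7B-Chat', backend_config=PytorchEngineConfig(tp=1))
print(pipe('hello'))
```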
💥 Improvements
* Fix argument error by ispobock in https://github.com/InternLM/lmdeploy/pull/1193
* Use LifoQueue for turbomind async_stream_infer by AllentDan in https://github.com/InternLM/lmdeploy/pull/1179
* Update interactive output len strategy and response by AllentDan in https://github.com/InternLM/lmdeploy/pull/1164
* Support `min_new_tokens` generation config in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1096
* Batched sampling by grimoire in https://github.com/InternLM/lmdeploy/pull/1197
* refactor the logic of getting `model_name` by AllentDan in https://github.com/InternLM/lmdeploy/pull/1188
* Add parameter `max_prefill_token_num` by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1203
* optimize baichuan in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1223
* check model required transformers version by grimoire in https://github.com/InternLM/lmdeploy/pull/1220
* torch optimize chatglm3 by grimoire in https://github.com/InternLM/lmdeploy/pull/1215
* Async torch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1206
* remove unused kernel in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1237
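
A hedged sketch of how the new `min_new_tokens` and `max_prefill_token_num` parameters from the list above might be combined, assuming they are exposed on `GenerationConfig` and `PytorchEngineConfig` respectively (the model name is a placeholder):

```python
from lmdeploy import GenerationConfig, PytorchEngineConfig, pipeline

pipe = pipeline('internlm/internlm2-chat-7b',
                backend_config=PytorchEngineConfig(max_prefill_token_num=4096))

# force at least 16 generated tokens, cap the output at 256
gen_config = GenerationConfig(min_new_tokens=16, max_new_tokens=256)
print(pipe('hi, please intro yourself', gen_config=gen_config))
```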
๐Ÿž Bug fixes
* Fix session length for profile generation by ispobock in https://github.com/InternLM/lmdeploy/pull/1181
* fix torch engine infer by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1185
* fix module map by grimoire in https://github.com/InternLM/lmdeploy/pull/1205
* [Fix] Correct session length warning by AllentDan in https://github.com/InternLM/lmdeploy/pull/1207
* Fix all devices occupation when applying tp to torch engine by updating device map by grimoire in https://github.com/InternLM/lmdeploy/pull/1172
* Fix falcon chatglm2 template by grimoire in https://github.com/InternLM/lmdeploy/pull/1168
* [Fix] Avoid AsyncEngine running the same session id by AllentDan in https://github.com/InternLM/lmdeploy/pull/1219
* Fix `None` session_len by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1230
* fix multinomial sampling by grimoire in https://github.com/InternLM/lmdeploy/pull/1228
* fix returning logits in prefill phase of pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1209
* optimize pytorch engine inference with falcon model by grimoire in https://github.com/InternLM/lmdeploy/pull/1234
* fix bf16 multinomial sampling by grimoire in https://github.com/InternLM/lmdeploy/pull/1239
* reduce torchengine prefill mem usage by grimoire in https://github.com/InternLM/lmdeploy/pull/1240
📚 Documentations
* auto generate pipeline api for readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1186
* Added tutorial document for deploying lmdeploy on Jetson series boards. by BestAnHongjun in https://github.com/InternLM/lmdeploy/pull/1192
* update doc index by zhyncs in https://github.com/InternLM/lmdeploy/pull/1241
๐ŸŒ Other
* Add PR test workflow and check-in more testcases by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1208
* fix pytest version by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1236
* bump version to v0.2.5 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1235

New Contributors
* ispobock made their first contribution in https://github.com/InternLM/lmdeploy/pull/1181
* BestAnHongjun made their first contribution in https://github.com/InternLM/lmdeploy/pull/1192

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.4...v0.2.5

0.2.4


What's Changed
💥 Improvements
* use stricter rules to get weight file by irexyc in https://github.com/InternLM/lmdeploy/pull/1070
* check pytorch engine environment by grimoire in https://github.com/InternLM/lmdeploy/pull/1107
* Update Dockerfile order to launch the http service by `docker run` directly by AllentDan in https://github.com/InternLM/lmdeploy/pull/1162
* Support torch cache_max_entry_count by grimoire in https://github.com/InternLM/lmdeploy/pull/1166
* Remove the manual model conversion during benchmark by lvhan028 in https://github.com/InternLM/lmdeploy/pull/953
* update llama triton example by zhyncs in https://github.com/InternLM/lmdeploy/pull/1153
๐Ÿž Bug fixes
* fix embedding copy size by irexyc in https://github.com/InternLM/lmdeploy/pull/1036
* fix pytorch engine with peft==0.8.2 by grimoire in https://github.com/InternLM/lmdeploy/pull/1122
* support triton2.2 by grimoire in https://github.com/InternLM/lmdeploy/pull/1137
* Add `top_k` in ChatCompletionRequest by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1174
* minor fix benchmark generation guide and script by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1175
📚 Documentations
* docs add debug turbomind guide by zhyncs in https://github.com/InternLM/lmdeploy/pull/1121
๐ŸŒ Other
* Add eval ci by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1060
* Ete testcase add more models by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1077
* Fix win ci by irexyc in https://github.com/InternLM/lmdeploy/pull/1132
* bump version to v0.2.4 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1171


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.3...v0.2.4

0.2.3


What's Changed
🚀 Features
* Support loading model from modelscope by irexyc in https://github.com/InternLM/lmdeploy/pull/1069
💥 Improvements
* Remove caching tokenizer.json by grimoire in https://github.com/InternLM/lmdeploy/pull/1074
* Refactor `get_logger` to remove the dependency of MMLogger from mmengine by yinfan98 in https://github.com/InternLM/lmdeploy/pull/1064
* Use TM_LOG_LEVEL environment variable first by zhyncs in https://github.com/InternLM/lmdeploy/pull/1071
* Speed up the initialization of w8a8 model for torch engine by yinfan98 in https://github.com/InternLM/lmdeploy/pull/1088
* Make logging.logger's behavior consistent with MMLogger by irexyc in https://github.com/InternLM/lmdeploy/pull/1092
* Remove owned_session for torch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1097
* Unify engine initialization in pipeline by irexyc in https://github.com/InternLM/lmdeploy/pull/1085
* Add skip_special_tokens in GenerationConfig by grimoire in https://github.com/InternLM/lmdeploy/pull/1091
* Use default stop words for turbomind backend in pipeline by irexyc in https://github.com/InternLM/lmdeploy/pull/1119
* Add input_token_len to Response and update Response document by AllentDan in https://github.com/InternLM/lmdeploy/pull/1115
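
A small sketch combining the new `skip_special_tokens` option in `GenerationConfig` with the `input_token_len` field added to `Response` (the model name is a placeholder):

```python
from lmdeploy import GenerationConfig, pipeline

pipe = pipeline('internlm/internlm2-chat-7b')

# keep special tokens in the decoded text
gen_config = GenerationConfig(skip_special_tokens=False)
resp = pipe('hi, please intro yourself', gen_config=gen_config)
print(resp.text, resp.input_token_len)
```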
๐Ÿž Bug fixes
* Fix fast tokenizer swallows prefix space when there are too many white spaces by AllentDan in https://github.com/InternLM/lmdeploy/pull/992
* Fix turbomind CUDA runtime error invalid argument by zhyncs in https://github.com/InternLM/lmdeploy/pull/1100
* Add safety check for incremental decode by AllentDan in https://github.com/InternLM/lmdeploy/pull/1094
* Fix device type of get_ppl for turbomind by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1093
* Fix pipeline init turbomind from workspace by irexyc in https://github.com/InternLM/lmdeploy/pull/1126
* Add dependency version check and fix `ignore_eos` logic by grimoire in https://github.com/InternLM/lmdeploy/pull/1099
* Change configuration_internlm.py to configuration_internlm2.py by HIT-cwh in https://github.com/InternLM/lmdeploy/pull/1129

📚 Documentations
* Update contribution guide by zhyncs in https://github.com/InternLM/lmdeploy/pull/1120
๐ŸŒ Other
* Bump version to v0.2.3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1123

New Contributors
* yinfan98 made their first contribution in https://github.com/InternLM/lmdeploy/pull/1064

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.2...v0.2.3

0.2.2


Highlight
English version
* The allocation strategy for the k/v cache has changed. The parameter `cache_max_entry_count` now denotes the proportion of **free** GPU memory rather than **total** memory, and its default value is 0.8. This helps prevent OOM issues (see the sketch at the end of this section).
* The pipeline API supports streaming inference. You may give it a try!
```python
from lmdeploy import pipeline
pipe = pipeline('internlm/internlm2-chat-7b')
for item in pipe.stream_infer('hi, please intro yourself'):
    print(item)
```

* Add API key and SSL support to `api_server`
Chinese version
* The TurboMind engine's GPU memory allocation strategy has changed. The k/v cache memory ratio parameter cache_max_entry_count now defaults to 0.8. It denotes the proportion of **free** GPU memory, no longer the proportion of **total** GPU memory.
* The pipeline supports a streaming output interface. You can try the following code:
```python
from lmdeploy import pipeline
pipe = pipeline('internlm/internlm2-chat-7b')
for item in pipe.stream_infer('hi, please intro yourself'):
    print(item)
```

* api_server adds an api_key to its interface
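
A minimal sketch of tuning the new k/v cache ratio through `TurbomindEngineConfig`, assuming the pipeline API shown above:

```python
from lmdeploy import TurbomindEngineConfig, pipeline

# cache_max_entry_count is the fraction of *free* GPU memory reserved for
# the k/v cache (default 0.8); lower it if the model runs out of memory
engine_config = TurbomindEngineConfig(cache_max_entry_count=0.5)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=engine_config)
print(pipe('hi, please intro yourself'))
```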


What's Changed
🚀 Features
* add alignment tools by grimoire in https://github.com/InternLM/lmdeploy/pull/1004
* support min_length for turbomind backend by irexyc in https://github.com/InternLM/lmdeploy/pull/961
* Add stream mode function to pipeline by AllentDan in https://github.com/InternLM/lmdeploy/pull/974
* [Feature] Add api key and ssl to http server by AllentDan in https://github.com/InternLM/lmdeploy/pull/1048

💥 Improvements
* hide stop-words in output text by grimoire in https://github.com/InternLM/lmdeploy/pull/991
* optimize sleep by grimoire in https://github.com/InternLM/lmdeploy/pull/1034
* set example values to /v1/chat/completions in swagger UI by AllentDan in https://github.com/InternLM/lmdeploy/pull/984
* Update adapters cli argument by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1039
* Fix turbomind end session bug. Add huggingface demo document by AllentDan in https://github.com/InternLM/lmdeploy/pull/1017
* Support linking the custom built mpi by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1025
* sync mem size for tp by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1053
* Remove model name when loading hf model by irexyc in https://github.com/InternLM/lmdeploy/pull/1022
* support internlm2-1_8b by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1073
* Update chat template for internlm2 base model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1079
๐Ÿž Bug fixes
* fix TorchEngine stuck when benchmarking with `tp>1` by grimoire in https://github.com/InternLM/lmdeploy/pull/942
* fix module mapping error of baichuan model by grimoire in https://github.com/InternLM/lmdeploy/pull/977
* fix import error for triton server by RunningLeon in https://github.com/InternLM/lmdeploy/pull/985
* fix qwen-vl example by irexyc in https://github.com/InternLM/lmdeploy/pull/996
* fix missing init file in modules by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1013
* fix tp mem usage by grimoire in https://github.com/InternLM/lmdeploy/pull/987
* update indexes_containing_token function by AllentDan in https://github.com/InternLM/lmdeploy/pull/1050
* fix flash kernel on sm 70 by grimoire in https://github.com/InternLM/lmdeploy/pull/1027
* Fix baichuan2 lora by grimoire in https://github.com/InternLM/lmdeploy/pull/1042
* Fix modelconfig in pytorch engine, support YI. by grimoire in https://github.com/InternLM/lmdeploy/pull/1052
* Fix repetition penalty for long context by irexyc in https://github.com/InternLM/lmdeploy/pull/1037
* [Fix] Support QLinear in rowwise_parallelize_linear_fn and colwise_parallelize_linear_fn by HIT-cwh in https://github.com/InternLM/lmdeploy/pull/1072
📚 Documentations
* add docs for evaluation with opencompass by RunningLeon in https://github.com/InternLM/lmdeploy/pull/995
* update docs for kvint8 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1026
* [doc] Introduce project OpenAOE by JiaYingLii in https://github.com/InternLM/lmdeploy/pull/1049
* update pipeline guide and FAQ about OOM by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1051
* docs update cache_max_entry_count for turbomind config by zhyncs in https://github.com/InternLM/lmdeploy/pull/1067
๐ŸŒ Other
* update ut ci to new server node by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1024
* Ete testcase update by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1023
* fix OOM in BlockManager by zhyncs in https://github.com/InternLM/lmdeploy/pull/973
* fix use engine_config.tp when tp is None by zhyncs in https://github.com/InternLM/lmdeploy/pull/1057
* Fix serve api by moving logger inside process for turbomind by AllentDan in https://github.com/InternLM/lmdeploy/pull/1061
* bump version to v0.2.2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1076

New Contributors
* zhyncs made their first contribution in https://github.com/InternLM/lmdeploy/pull/973
* JiaYingLii made their first contribution in https://github.com/InternLM/lmdeploy/pull/1049

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.1...v0.2.2

0.2.1


What's Changed
💥 Improvements
* [Fix] internlm2 chat format by Harold-lkk in https://github.com/InternLM/lmdeploy/pull/1002
🐞 Bug fixes
* fix baichuan2 conversion by AllentDan in https://github.com/InternLM/lmdeploy/pull/972
* [Fix] internlm messages2prompt by Harold-lkk in https://github.com/InternLM/lmdeploy/pull/1003
📚 Documentations
* add guide about installation on cuda 12+ platform by lvhan028 in https://github.com/InternLM/lmdeploy/pull/988
🌐 Other
* bump version to v0.2.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1005


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.0...v0.2.1
