lmdeploy

Latest version: v0.7.1


0.7.1

<!-- Release notes generated using configuration in .github/release.yml at main -->

What's Changed
🚀 Features
* support release pipeline by irexyc in https://github.com/InternLM/lmdeploy/pull/3069
* [feature] add dlinfer w8a8 support. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2988
* [maca] support deepseekv2 for maca backend. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2918
* [Feature] support deepseek-vl2 for pytorch engine by CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3149
💥 Improvements
* use weights iterator while loading by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2886
* Add deepseek-r1 chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/3072
* Update tokenizer by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3061
* Set max concurrent requests by AllentDan in https://github.com/InternLM/lmdeploy/pull/2961
* remove logitswarper by grimoire in https://github.com/InternLM/lmdeploy/pull/3109
* Update benchmark script and user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3110
* support eos_token list in turbomind by irexyc in https://github.com/InternLM/lmdeploy/pull/3044
* Use aiohttp inside proxy server and add --disable-cache-status argument by AllentDan in https://github.com/InternLM/lmdeploy/pull/3020
* Update runtime package dependencies by zgjja in https://github.com/InternLM/lmdeploy/pull/3142
* Make turbomind support embedding inputs on GPU by chengyuma in https://github.com/InternLM/lmdeploy/pull/3177
🐞 Bug fixes
* [dlinfer] fix ascend qwen2_vl graph_mode by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3045
* fix error in interactive api by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3074
* fix sliding window mgr by grimoire in https://github.com/InternLM/lmdeploy/pull/3068
* More arguments in api_client, update docstrings by AllentDan in https://github.com/InternLM/lmdeploy/pull/3077
* Add system role to deepseek chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/3031
* Fix xcomposer2d5 by irexyc in https://github.com/InternLM/lmdeploy/pull/3087
* fix user guide about cogvlm deployment by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3088
* fix positional argument by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3086
* Fix UT of deepseek chat template by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3125
* Fix internvl2.5 error after eviction by grimoire in https://github.com/InternLM/lmdeploy/pull/3122
* Fix cogvlm and phi3vision by RunningLeon in https://github.com/InternLM/lmdeploy/pull/3137
* [fix] fix vl gradio, use pipeline api and remove interactive chat by irexyc in https://github.com/InternLM/lmdeploy/pull/3136
* fix the issue that stop_tokens may be fewer than those defined in model.py by irexyc in https://github.com/InternLM/lmdeploy/pull/3148
* fix typing by lz1998 in https://github.com/InternLM/lmdeploy/pull/3153
* fix min length penalty by irexyc in https://github.com/InternLM/lmdeploy/pull/3150
* fix default temperature value by irexyc in https://github.com/InternLM/lmdeploy/pull/3166
* Use pad_token_id as image_token_id for vl models by RunningLeon in https://github.com/InternLM/lmdeploy/pull/3158
* Fix tool call prompt for InternLM and Qwen by AllentDan in https://github.com/InternLM/lmdeploy/pull/3156
* Update qwen2.py by GxjGit in https://github.com/InternLM/lmdeploy/pull/3174
* fix temperature=0 by grimoire in https://github.com/InternLM/lmdeploy/pull/3176
* fix blocked fp8 moe by grimoire in https://github.com/InternLM/lmdeploy/pull/3181
* fix deepseekv2 has no attribute use_mla error by CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3188
* fix unstoppable chat by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3189
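Several of the sampling fixes above (default temperature, temperature=0) touch the same formula: logits are divided by the temperature before softmax, and temperature → 0 degenerates to greedy argmax, which needs an explicit special case to avoid dividing by zero. A minimal illustrative sketch (generic, not lmdeploy's sampler):

```python
import math

def pick_token(logits, temperature):
    # temperature == 0 must be special-cased: dividing by zero is undefined,
    # and the intended behavior is greedy (argmax) decoding.
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Deterministic here for illustration: return the most probable token.
    return max(range(len(probs)), key=lambda i: probs[i])

print(pick_token([0.1, 2.0, 0.5], 0))  # → 1 (greedy path)
```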
🌐 Other
* [ci] add internlm3 into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3038
* add internlm3 to supported models by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3041
* update pre-commit config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2683
* [maca] add cudagraph support on maca backend. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2834
* bump version to v0.7.0.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3076
* bump version to v0.7.0.post2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3094
* [Fix] fix the URL judgment problem on Windows by Lychee-acaca in https://github.com/InternLM/lmdeploy/pull/3103
* bump version to v0.7.0.post3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3115
* [ci] fix some fail in daily testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3134
* Bump version to v0.7.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3178

New Contributors
* Lychee-acaca made their first contribution in https://github.com/InternLM/lmdeploy/pull/3103
* lz1998 made their first contribution in https://github.com/InternLM/lmdeploy/pull/3153
* GxjGit made their first contribution in https://github.com/InternLM/lmdeploy/pull/3174
* chengyuma made their first contribution in https://github.com/InternLM/lmdeploy/pull/3177
* CUHKSZzxy made their first contribution in https://github.com/InternLM/lmdeploy/pull/3149

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.0...v0.7.1

0.7.0.post3


What's Changed
💥 Improvements
* Set max concurrent requests by AllentDan in https://github.com/InternLM/lmdeploy/pull/2961
* remove logitswarper by grimoire in https://github.com/InternLM/lmdeploy/pull/3109
🐞 Bug fixes
* fix user guide about cogvlm deployment by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3088
* fix positional argument by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3086
🌐 Other
* [Fix] fix the URL judgment problem on Windows by Lychee-acaca in https://github.com/InternLM/lmdeploy/pull/3103
* bump version to v0.7.0.post3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3115

New Contributors
* Lychee-acaca made their first contribution in https://github.com/InternLM/lmdeploy/pull/3103

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.0.post2...v0.7.0.post3

0.7.0.post2


What's Changed
💥 Improvements
* Add deepseek-r1 chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/3072
* Update tokenizer by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3061
🐞 Bug fixes
* Add system role to deepseek chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/3031
* Fix xcomposer2d5 by irexyc in https://github.com/InternLM/lmdeploy/pull/3087
🌐 Other
* bump version to v0.7.0.post2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3094


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.0.post1...v0.7.0.post2

0.7.0.post1


What's Changed
💥 Improvements
* use weights iterator while loading by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2886
🐞 Bug fixes
* [dlinfer] fix ascend qwen2_vl graph_mode by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3045
* fix error in interactive api by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3074
* fix sliding window mgr by grimoire in https://github.com/InternLM/lmdeploy/pull/3068
* More arguments in api_client, update docstrings by AllentDan in https://github.com/InternLM/lmdeploy/pull/3077
🌐 Other
* [ci] add internlm3 into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3038
* add internlm3 to supported models by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3041
* update pre-commit config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2683
* [maca] add cudagraph support on maca backend. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2834
* bump version to v0.7.0.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3076


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.0...v0.7.0.post1

0.7.0


What's Changed
🚀 Features
* Support moe w8a8 in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2894
* Support DeepseekV3 fp8 by grimoire in https://github.com/InternLM/lmdeploy/pull/2967
* support new backend cambricon by JackWeiw in https://github.com/InternLM/lmdeploy/pull/3002
* support moe fp8 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/3007
* add internlm3-dense (turbomind) & chat template by irexyc in https://github.com/InternLM/lmdeploy/pull/3024
* support internlm3 on pt by RunningLeon in https://github.com/InternLM/lmdeploy/pull/3026
* Support internlm3 quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/3027
💥 Improvements
* Optimize awq kernel in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2965
* Support fp8 w8a8 for pt backend by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2959
* Optimize lora kernel by grimoire in https://github.com/InternLM/lmdeploy/pull/2975
* Remove threadsafe by grimoire in https://github.com/InternLM/lmdeploy/pull/2907
* Refactor async engine & turbomind IO by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2968
* [dlinfer] refine rope by JackWeiw in https://github.com/InternLM/lmdeploy/pull/2984
* Expose spaces_between_special_tokens by AllentDan in https://github.com/InternLM/lmdeploy/pull/2991
* [dlinfer] change llm op interface of paged_prefill_attention by JackWeiw in https://github.com/InternLM/lmdeploy/pull/2977
* Update request logger by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2981
* remove decoding by grimoire in https://github.com/InternLM/lmdeploy/pull/3016
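The fp8 and w8a8 work above builds on the same basic idea: store weights (and activations) as 8-bit values plus a scale factor. An illustrative symmetric int8 round-trip sketch (generic textbook quantization, not lmdeploy's kernels):

```python
def quantize_sym(values, bits=8):
    # Symmetric per-tensor quantization: the scale maps the largest
    # magnitude in the tensor onto the top of the signed integer range.
    qmax = 2 ** (bits - 1) - 1          # 127 for int8
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize_sym(q, scale):
    # Recover approximate floats; error is bounded by scale / 2 per element.
    return [x * scale for x in q]

w = [0.5, -1.27, 0.03]
q, s = quantize_sym(w)
w_hat = dequantize_sym(q, s)
print(q)  # → [50, -127, 3]
```

For w8a8, the same scheme is applied to activations on the fly, so the matmul can run in low-precision integer or fp8 arithmetic with a per-tensor (or per-channel) rescale at the end.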
🐞 Bug fixes
* Fix build crash in nvcr.io/nvidia/pytorch:24.06-py3 image by zgjja in https://github.com/InternLM/lmdeploy/pull/2964
* add tool role in BaseChatTemplate as tool response in messages by AllentDan in https://github.com/InternLM/lmdeploy/pull/2979
* Fix ascend dockerfile by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2989
* fix internvl2 qk norm by grimoire in https://github.com/InternLM/lmdeploy/pull/2987
* fix xcomposer2 when transformers is upgraded greater than 4.46 by irexyc in https://github.com/InternLM/lmdeploy/pull/3001
* Fix get_ppl & get_logits by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3008
* Fix typo in w4a16 guide by Yan-Xiangjun in https://github.com/InternLM/lmdeploy/pull/3018
* fix blocked fp8 moe kernel by grimoire in https://github.com/InternLM/lmdeploy/pull/3009
* Fix async engine by lzhangzz in https://github.com/InternLM/lmdeploy/pull/3029
* [hotfix] Fix get_ppl by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3023
* Fix MoE gating for DeepSeek V2 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/3030
* Fix empty response for pipeline by lzhangzz in https://github.com/InternLM/lmdeploy/pull/3034
* Fix potential hang during TP model initialization by lzhangzz in https://github.com/InternLM/lmdeploy/pull/3033
🌐 Other
* [ci] add w8a8 and internvl2.5 models into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2949
* bump version to v0.7.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3010

New Contributors
* zgjja made their first contribution in https://github.com/InternLM/lmdeploy/pull/2964
* Yan-Xiangjun made their first contribution in https://github.com/InternLM/lmdeploy/pull/3018

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/0.6.5...v0.7.0

0.6.5


What's Changed
🚀 Features
* [dlinfer] feat: add DlinferFlashAttention to support qwen vl. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2952
💥 Improvements
* refactor PyTorchEngine check env by grimoire in https://github.com/InternLM/lmdeploy/pull/2870
* refine multi-backend setup.py by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2880
* Refactor VLM modules by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2810
* [dlinfer] only compile the language model in vl models by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2893
* Optimize tp broadcast by grimoire in https://github.com/InternLM/lmdeploy/pull/2889
* unfreeze torch version in dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2906
* support tp > n_kv_heads for pt engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2872
* replicate kv for some models when tp is divisible by kv_head_num by irexyc in https://github.com/InternLM/lmdeploy/pull/2874
* Fallback to pytorch engine when the model is quantized by smooth quant by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2953
* Torchrun launching multiple api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/2402
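Two entries above deal with tensor parallelism when the TP degree exceeds the number of KV heads: KV heads are replicated so that every rank still owns one. A toy sketch of that rank-to-head mapping (illustrative only; the function name and the even-divisibility assumption are mine, not lmdeploy's code):

```python
def assign_kv_heads(num_kv_heads, tp):
    # When tp > num_kv_heads and divides evenly, each KV head is replicated
    # tp // num_kv_heads times so every rank holds exactly one copy.
    assert tp % num_kv_heads == 0, "tp must be a multiple of num_kv_heads"
    replicas = tp // num_kv_heads
    # Rank r serves KV head r // replicas.
    return [rank // replicas for rank in range(tp)]

print(assign_kv_heads(num_kv_heads=2, tp=8))  # → [0, 0, 0, 0, 1, 1, 1, 1]
```

Replication trades memory (duplicate KV caches) for the ability to shard attention query heads across more GPUs than there are KV heads.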
🐞 Bug fixes
* [Feature] Support for loading lora adapter weights in safetensors format by Galaxy-Husky in https://github.com/InternLM/lmdeploy/pull/2860
* fix cpu cache by grimoire in https://github.com/InternLM/lmdeploy/pull/2881
* Fix args type in docstring by Galaxy-Husky in https://github.com/InternLM/lmdeploy/pull/2888
* Fix llama3.1 chat template by fzyzcjy in https://github.com/InternLM/lmdeploy/pull/2862
* Fix typo by ghntd in https://github.com/InternLM/lmdeploy/pull/2916
* fix: Incorrect stats size during inference of throughput benchmark when concurrency > num_prompts by pancak3 in https://github.com/InternLM/lmdeploy/pull/2928
* fix lora name and rearrange wqkv for internlm2 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2912
* [dlinfer] fix moe op for dlinfer. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2917
* [side effect] fix vlm quant failed by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2914
* fix torch_dtype by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2933
* support unaligned qkv heads by grimoire in https://github.com/InternLM/lmdeploy/pull/2930
* fix mllama inference without image by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2947
* Support torch_dtype modification and update FAQs for AWQ quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/2898
* Fix exception handler for proxy server by AllentDan in https://github.com/InternLM/lmdeploy/pull/2901
* Fix torch_dtype in lite by AllentDan in https://github.com/InternLM/lmdeploy/pull/2956
* [side-effect] bring back quantization of qwen2-vl, glm4v and etc. by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2954
* add a thread pool executor to control the vl engine traffic by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2970
* [side-effect] fix gradio demo error by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2976
🌐 Other
* [dlinfer] fix engine checker by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2891
* Bump version to v0.6.5 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2955

New Contributors
* Galaxy-Husky made their first contribution in https://github.com/InternLM/lmdeploy/pull/2860
* fzyzcjy made their first contribution in https://github.com/InternLM/lmdeploy/pull/2862
* ghntd made their first contribution in https://github.com/InternLM/lmdeploy/pull/2916
* pancak3 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2928

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.6.4...0.6.5
