lmdeploy

Latest version: v0.7.0.post3

0.7.0.post3

What's Changed
💥 Improvements
* Set max concurrent requests by AllentDan in https://github.com/InternLM/lmdeploy/pull/2961 (see the sketch after this list)
* remove logitswarper by grimoire in https://github.com/InternLM/lmdeploy/pull/3109
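
A minimal sketch of how the new concurrency cap might be applied when launching the OpenAI-compatible server. The flag name `--max-concurrent-requests` and the model id are assumptions inferred from the PR title, not confirmed API; check `lmdeploy serve api_server --help` on your installed version.

```python
# Hedged sketch: capping server-side concurrency (cf. PR #2961).
# ASSUMPTION: the option is exposed as --max-concurrent-requests.
import subprocess

subprocess.run([
    "lmdeploy", "serve", "api_server",
    "internlm/internlm3-8b-instruct",    # model id is illustrative
    "--server-port", "23333",
    "--max-concurrent-requests", "128",  # requests beyond the cap are throttled
])
```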
🐞 Bug fixes
* fix user guide about cogvlm deployment by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3088
* fix positional argument by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3086
🌐 Other
* [Fix] fix the URL judgment problem in Windows by Lychee-acaca in https://github.com/InternLM/lmdeploy/pull/3103
* bump version to v0.7.0.post3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3115

New Contributors
* Lychee-acaca made their first contribution in https://github.com/InternLM/lmdeploy/pull/3103

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.0.post2...v0.7.0.post3

0.7.0.post2

What's Changed
💥 Improvements
* Add deepseek-r1 chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/3072 (see the sketch after this list)
* Update tokenizer by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3061
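
A minimal sketch of selecting the new template explicitly via `ChatTemplateConfig`, which lmdeploy exports at the top level. The template name `deepseek-r1` and the model id are assumptions inferred from the PR title.

```python
# Hedged sketch: pinning a chat template by name (cf. PR #3072).
# ASSUMPTION: the template is registered as "deepseek-r1".
from lmdeploy import pipeline, ChatTemplateConfig

pipe = pipeline(
    "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",  # model id is illustrative
    chat_template_config=ChatTemplateConfig(model_name="deepseek-r1"),
)
print(pipe(["Why is the sky blue?"])[0].text)
```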
🐞 Bug fixes
* Add system role to deepseek chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/3031
* Fix xcomposer2d5 by irexyc in https://github.com/InternLM/lmdeploy/pull/3087
🌐 Other
* bump version to v0.7.0.post2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3094


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.0.post1...v0.7.0.post2

0.7.0.post1

What's Changed
💥 Improvements
* use weights iterator while loading by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2886
🐞 Bug fixes
* [dlinfer] fix ascend qwen2_vl graph_mode by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3045
* fix error in interactive api by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3074
* fix sliding window mgr by grimoire in https://github.com/InternLM/lmdeploy/pull/3068
* More arguments in api_client, update docstrings by AllentDan in https://github.com/InternLM/lmdeploy/pull/3077 (sketched below)
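
Since PR #3077 touches the HTTP client, here is a minimal sketch of driving a running server through `APIClient`. The server address is illustrative, and the snippet assumes an api_server is already listening there.

```python
# Hedged sketch of the api_client usage path (cf. PR #3077).
from lmdeploy.serve.openai.api_client import APIClient

client = APIClient("http://0.0.0.0:23333")  # assumes a running api_server
model_name = client.available_models[0]     # first served model
for chunk in client.chat_completions_v1(
        model=model_name,
        messages=[{"role": "user", "content": "Hello"}]):
    print(chunk)
```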
🌐 Other
* [ci] add internlm3 into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3038
* add internlm3 to supported models by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3041
* update pre-commit config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2683
* [maca] add cudagraph support on maca backend. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2834
* bump version to v0.7.0.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3076


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.0...v0.7.0.post1

0.7.0

What's Changed
🚀 Features
* Support moe w8a8 in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2894
* Support DeepseekV3 fp8 by grimoire in https://github.com/InternLM/lmdeploy/pull/2967
* support new backend cambricon by JackWeiw in https://github.com/InternLM/lmdeploy/pull/3002
* Support MoE fp8 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/3007
* add internlm3-dense(turbomind) & chat template by irexyc in https://github.com/InternLM/lmdeploy/pull/3024
* support internlm3 on pt by RunningLeon in https://github.com/InternLM/lmdeploy/pull/3026
* Support internlm3 quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/3027 (see the sketch after this list)
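
To make the quantization entry concrete, a sketch of the documented `lmdeploy lite auto_awq` entry point; the model id and output directory are illustrative.

```python
# Hedged sketch: AWQ-quantizing a model (cf. PR #3027).
import subprocess

subprocess.run([
    "lmdeploy", "lite", "auto_awq",
    "internlm/internlm3-8b-instruct",              # model id is illustrative
    "--work-dir", "./internlm3-8b-instruct-4bit",  # where quantized weights land
])
```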
💥 Improvements
* Optimize awq kernel in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2965
* Support fp8 w8a8 for pt backend by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2959 (see the sketch after this list)
* Optimize lora kernel by grimoire in https://github.com/InternLM/lmdeploy/pull/2975
* Remove threadsafe by grimoire in https://github.com/InternLM/lmdeploy/pull/2907
* Refactor async engine & turbomind IO by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2968
* [dlinfer]rope refine by JackWeiw in https://github.com/InternLM/lmdeploy/pull/2984
* Expose spaces_between_special_tokens by AllentDan in https://github.com/InternLM/lmdeploy/pull/2991
* [dlinfer]change llm op interface of paged_prefill_attention. by JackWeiw in https://github.com/InternLM/lmdeploy/pull/2977
* Update request logger by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2981
* remove decoding by grimoire in https://github.com/InternLM/lmdeploy/pull/3016
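
A sketch of serving a W8A8 checkpoint on the PyTorch engine, as referenced above. The checkpoint path is an assumption; it should point at weights produced by `lmdeploy lite smooth_quant`.

```python
# Hedged sketch: running a smooth-quant (W8A8) model on the PyTorch engine
# (cf. PR #2959). The local path is illustrative.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    "./internlm3-8b-instruct-w8a8",            # quantized checkpoint (assumption)
    backend_config=PytorchEngineConfig(tp=1),  # single-GPU PyTorch engine
)
print(pipe(["Summarize lmdeploy in one sentence."])[0].text)
```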
🐞 Bug fixes
* Fix build crash in nvcr.io/nvidia/pytorch:24.06-py3 image by zgjja in https://github.com/InternLM/lmdeploy/pull/2964
* add tool role in BaseChatTemplate as tool response in messages by AllentDan in https://github.com/InternLM/lmdeploy/pull/2979
* Fix ascend dockerfile by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2989
* fix internvl2 qk norm by grimoire in https://github.com/InternLM/lmdeploy/pull/2987
* fix xcomposer2 when transformers is upgraded greater than 4.46 by irexyc in https://github.com/InternLM/lmdeploy/pull/3001
* Fix get_ppl & get_logits by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3008
* Fix typo in w4a16 guide by Yan-Xiangjun in https://github.com/InternLM/lmdeploy/pull/3018
* fix blocked fp8 moe kernel by grimoire in https://github.com/InternLM/lmdeploy/pull/3009
* Fix async engine by lzhangzz in https://github.com/InternLM/lmdeploy/pull/3029
* [hotfix] Fix get_ppl by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3023
* Fix MoE gating for DeepSeek V2 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/3030
* Fix empty response for pipeline by lzhangzz in https://github.com/InternLM/lmdeploy/pull/3034
* Fix potential hang during TP model initialization by lzhangzz in https://github.com/InternLM/lmdeploy/pull/3033
🌐 Other
* [ci] add w8a8 and internvl2.5 models into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2949
* bump version to v0.7.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3010

New Contributors
* zgjja made their first contribution in https://github.com/InternLM/lmdeploy/pull/2964
* Yan-Xiangjun made their first contribution in https://github.com/InternLM/lmdeploy/pull/3018

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/0.6.5...v0.7.0

0.6.5

What's Changed
🚀 Features
* [dlinfer] feat: add DlinferFlashAttention to support qwen vl. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2952
💥 Improvements
* refactor PyTorchEngine check env by grimoire in https://github.com/InternLM/lmdeploy/pull/2870
* refine multi-backend setup.py by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2880
* Refactor VLM modules by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2810
* [dlinfer] only compile the language model in vl models by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2893
* Optimize tp broadcast by grimoire in https://github.com/InternLM/lmdeploy/pull/2889
* unfreeze torch version in dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2906
* support tp > n_kv_heads for pt engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2872
* replicate kv for some models when tp is divisible by kv_head_num by irexyc in https://github.com/InternLM/lmdeploy/pull/2874
* Fall back to pytorch engine when the model is quantized by smooth quant by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2953
* Torchrun launching multiple api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/2402 (see the sketch after this list)
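
For the multi-server entry above, a sketch of fronting several api_server replicas with lmdeploy's proxy server. The port and routing-strategy values are illustrative; verify them against `lmdeploy serve proxy --help`.

```python
# Hedged sketch: load-balancing across api_server replicas (cf. PR #2402).
import subprocess

subprocess.run([
    "lmdeploy", "serve", "proxy",
    "--server-port", "8000",                       # proxy listen port (illustrative)
    "--routing-strategy", "min_expected_latency",  # assumption: check --help
])
```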
🐞 Bug fixes
* [Feature] Support for loading lora adapter weights in safetensors format by Galaxy-Husky in https://github.com/InternLM/lmdeploy/pull/2860
* fix cpu cache by grimoire in https://github.com/InternLM/lmdeploy/pull/2881
* Fix args type in docstring by Galaxy-Husky in https://github.com/InternLM/lmdeploy/pull/2888
* Fix llama3.1 chat template by fzyzcjy in https://github.com/InternLM/lmdeploy/pull/2862
* Fix typo by ghntd in https://github.com/InternLM/lmdeploy/pull/2916
* fix: Incorrect stats size during inference of throughput benchmark when concurrency > num_prompts by pancak3 in https://github.com/InternLM/lmdeploy/pull/2928
* fix lora name and rearrange wqkv for internlm2 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2912
* [dlinfer] fix moe op for dlinfer. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2917
* [side effect] fix vlm quant failed by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2914
* fix torch_dtype by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2933
* support unaligned qkv heads by grimoire in https://github.com/InternLM/lmdeploy/pull/2930
* fix mllama inference without image by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2947
* Support torch_dtype modification and update FAQs for AWQ quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/2898
* Fix exception handler for proxy server by AllentDan in https://github.com/InternLM/lmdeploy/pull/2901
* Fix torch_dtype in lite by AllentDan in https://github.com/InternLM/lmdeploy/pull/2956
* [side-effect] bring back quantization of qwen2-vl, glm4v and etc. by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2954
* add a thread pool executor to control the vl engine traffic by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2970
* [side-effect] fix gradio demo error by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2976
🌐 Other
* [dlinfer] fix engine checker by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2891
* Bump version to v0.6.5 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2955

New Contributors
* Galaxy-Husky made their first contribution in https://github.com/InternLM/lmdeploy/pull/2860
* fzyzcjy made their first contribution in https://github.com/InternLM/lmdeploy/pull/2862
* ghntd made their first contribution in https://github.com/InternLM/lmdeploy/pull/2916
* pancak3 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2928

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.6.4...0.6.5

0.6.4

What's Changed
🚀 Features
* feature: support qwen2.5 function_call by akai-shuuichi in https://github.com/InternLM/lmdeploy/pull/2737
* [Feature] support minicpm-v_2_6 for pytorch engine. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2767 (see the sketch after this list)
* Support qwen2-vl AWQ quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/2787
* Add DeepSeek-V2 support by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2763
* [ascend]feat: support kv int8 by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2736
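
To ground the new VLM support above, a minimal multimodal call using lmdeploy's documented `load_image` helper; the model id and image URL are illustrative.

```python
# Hedged sketch: one-shot VLM inference (cf. PR #2767).
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline("openbmb/MiniCPM-V-2_6")           # model id is illustrative
image = load_image("https://example.com/cat.png")  # placeholder URL
print(pipe(("Describe this image.", image)).text)
```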
💥 Improvements
* Optimize update_step_ctx on Ascend by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2804
* Add Ascend installation adapter by zhabuye in https://github.com/InternLM/lmdeploy/pull/2817
* Refactor turbomind (2/N) by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2818
* add openssh-server installation in dockerfile by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2830
* Add version restrictions in runtime_ascend.txt to ensure functionality by zhabuye in https://github.com/InternLM/lmdeploy/pull/2836
* better kv allocate by grimoire in https://github.com/InternLM/lmdeploy/pull/2814
* Update internvl chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/2832
* profile throughput without new threads by grimoire in https://github.com/InternLM/lmdeploy/pull/2826
* [dlinfer] change dlinfer kv_cache layout and adjust paged_prefill_attention api. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2847
* [maca] add env to support different mm layout on maca. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2835
* Supports W8A8 quantization for more models by AllentDan in https://github.com/InternLM/lmdeploy/pull/2850
🐞 Bug fixes
* disable prefix-caching for vl model by grimoire in https://github.com/InternLM/lmdeploy/pull/2825
* Fix gemma2 accuracy through the correct softcapping logic by AllentDan in https://github.com/InternLM/lmdeploy/pull/2842
* fix accessing before initialization by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2845
* fix the logic to verify whether AutoAWQ has been successfully installed by grimoire in https://github.com/InternLM/lmdeploy/pull/2844
* check whether backend_config is None or not before accessing its attr by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2848
* [ascend] convert kv cache to nd format in ascend graph mode by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2853
📚 Documentations
* Update supported models & Ascend doc by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2765
* update supported models by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2849
🌐 Other
* [CI] Split vl testcases into turbomind and pytorch backend by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2751
* [dlinfer] Fix qwenvl rope error for dlinfer backend by JackWeiw in https://github.com/InternLM/lmdeploy/pull/2795
* [CI] add more testcase for mllm models by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2791
* Update dlinfer-ascend version in runtime_ascend.txt by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2865
* bump version to v0.6.4 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2864

New Contributors
* akai-shuuichi made their first contribution in https://github.com/InternLM/lmdeploy/pull/2737
* JackWeiw made their first contribution in https://github.com/InternLM/lmdeploy/pull/2795
* zhabuye made their first contribution in https://github.com/InternLM/lmdeploy/pull/2817

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.6.3...v0.6.4
