LMDeploy

Latest version: v0.7.1


0.6.0a0

Highlight
- Optimize W4A16 quantized model inference by implementing GEMM in TurboMind Engine
- Add GPTQ-INT4 inference
- Support CUDA architectures from SM70 and above, i.e., V100 and newer GPUs
- Optimize the prefilling inference stage of PyTorchEngine
- Distinguish between the name of the deployed model and the name of its chat template

Before:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
--model-name customized_chat_template.json
```

After:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
--model-name "the served model name" \
--chat-template customized_chat_template.json
```

What's Changed
🚀 Features
* support vlm custom image process parameters in openai input format by irexyc in https://github.com/InternLM/lmdeploy/pull/2245
* New GEMM kernels for weight-only quantization by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2090
* Fix hidden size and support mistral nemo by AllentDan in https://github.com/InternLM/lmdeploy/pull/2215
* Support custom logits processors by AllentDan in https://github.com/InternLM/lmdeploy/pull/2329
* support openbmb/MiniCPM-V-2_6 by irexyc in https://github.com/InternLM/lmdeploy/pull/2351
* Support phi3.5 for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2361
💥 Improvements
* Remove deprecated arguments from API and clarify model_name and chat_template_name by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1931
* Fix duplicated session_id when pipeline is used by multithreads by irexyc in https://github.com/InternLM/lmdeploy/pull/2134
* remove eviction param by grimoire in https://github.com/InternLM/lmdeploy/pull/2285
* Remove QoS serving by AllentDan in https://github.com/InternLM/lmdeploy/pull/2294
* Support send tool_calls back to internlm2 by AllentDan in https://github.com/InternLM/lmdeploy/pull/2147
* Add stream options to control usage by AllentDan in https://github.com/InternLM/lmdeploy/pull/2313
* add device type for pytorch engine in cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2321
* Update error status_code to raise error in openai client by AllentDan in https://github.com/InternLM/lmdeploy/pull/2333
* Change to use device instead of device-type in cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2337
* Add GEMM test utils by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2342
* Add environment variable to control SILU fusion by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2343
* Use single thread per model instance by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2339
* add cache to speed up docker building by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2344
* add max_prefill_token_num argument in CLI by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2345
* torch engine optimize prefill for long context by grimoire in https://github.com/InternLM/lmdeploy/pull/1962
* Refactor turbomind (1/N) by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2352
* feat(server): enable `seed` parameter for openai compatible server. by DearPlanet in https://github.com/InternLM/lmdeploy/pull/2353
🐞 Bug fixes
* enable run vlm with pytorch engine in gradio by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2256
* fix side-effect: failed to update tm model config with tm engine config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2275
* Fix internvl2 template and update docs by irexyc in https://github.com/InternLM/lmdeploy/pull/2292
* fix the issue of missing dependencies in the Dockerfile and pip by ColorfulDick in https://github.com/InternLM/lmdeploy/pull/2240
* Fix the way to get "quantization_config" from model's configuration by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2325
* fix(ascend): fix import error of pt engine in cli by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2328
* Default rope_scaling_factor of TurbomindEngineConfig to None by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2358
* Fix the logic of updating engine_config to TurbomindModelConfig for both tm model and hf model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2362
📚 Documentations
* Reorganize the user guide and update the get_started section by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2038
* cancel support baichuan2 7b awq in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2246
* Add user guide about slora serving by AllentDan in https://github.com/InternLM/lmdeploy/pull/2084
🌐 Other
* test prtest image update by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2192
* Update python support version by wuhongsheng in https://github.com/InternLM/lmdeploy/pull/2290
* fix Windows compile error by zhyncs in https://github.com/InternLM/lmdeploy/pull/2303
* fix: follow up 2303 by zhyncs in https://github.com/InternLM/lmdeploy/pull/2307
* [ci] benchmark react by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2183
* bump version to v0.6.0a0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2371

New Contributors
* wuhongsheng made their first contribution in https://github.com/InternLM/lmdeploy/pull/2290
* ColorfulDick made their first contribution in https://github.com/InternLM/lmdeploy/pull/2240
* DearPlanet made their first contribution in https://github.com/InternLM/lmdeploy/pull/2353

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.3...v0.6.0a0

0.5.3


What's Changed
🚀 Features
* PyTorch Engine AWQ support by grimoire in https://github.com/InternLM/lmdeploy/pull/1913
* Phi3 awq by grimoire in https://github.com/InternLM/lmdeploy/pull/1984
* Fix chunked prefill by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2201
* support VLMs with Qwen as the language model by irexyc in https://github.com/InternLM/lmdeploy/pull/2207
💥 Improvements
* Support specifying a prefix of assistant response by AllentDan in https://github.com/InternLM/lmdeploy/pull/2172
* Strict check for `name_map` in `InternLM2Chat7B` by SamuraiBUPT in https://github.com/InternLM/lmdeploy/pull/2156
* Check errors for attention kernels by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2206
* update base image to support cuda12.4 in dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2182
* Stop synchronizing for `length_criterion` by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2202
* adapt MiniCPM-Llama3-V-2_5 new code by irexyc in https://github.com/InternLM/lmdeploy/pull/2139
* Remove duplicate code by cmpute in https://github.com/InternLM/lmdeploy/pull/2133
🐞 Bug fixes
* [Hotfix] missing parentheses when calculating the coef of llama3 rope by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2157
* support logit softcap by grimoire in https://github.com/InternLM/lmdeploy/pull/2158
* Fix gmem to smem WAW conflict in awq gemm kernel by foreverrookie in https://github.com/InternLM/lmdeploy/pull/2111
* Fix gradio serve using a wrong chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/2131
* fix runtime error when using dynamic scale rotary embed for InternLM2… by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2212
* Add peer-access-enabled allocator by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2218
* Fix typos in profile_generation.py by jiajie-yang in https://github.com/InternLM/lmdeploy/pull/2233
📚 Documentations
* docs: fix Qwen typo by ArtificialZeng in https://github.com/InternLM/lmdeploy/pull/2136
* wrong expression by ArtificialZeng in https://github.com/InternLM/lmdeploy/pull/2165
* clarify the model type LLM or MLLM in supported model matrix by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2209
* docs: add Japanese README by eltociear in https://github.com/InternLM/lmdeploy/pull/2237
🌐 Other
* bump version to 0.5.2.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2159
* update news about cooperation with modelscope/swift by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2200
* bump version to v0.5.3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2242

New Contributors
* ArtificialZeng made their first contribution in https://github.com/InternLM/lmdeploy/pull/2136
* foreverrookie made their first contribution in https://github.com/InternLM/lmdeploy/pull/2111
* SamuraiBUPT made their first contribution in https://github.com/InternLM/lmdeploy/pull/2156
* CyCle1024 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2212
* jiajie-yang made their first contribution in https://github.com/InternLM/lmdeploy/pull/2233
* cmpute made their first contribution in https://github.com/InternLM/lmdeploy/pull/2133

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.2...v0.5.3

0.5.2.post1


What's Changed
🐞 Bug fixes
* [Hotfix] missing parentheses when calculating the coef of llama3 rope, which caused the needle-in-a-haystack experiment to fail by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2157
🌐 Other
* bump version to 0.5.2.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2159


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.2...v0.5.2.post1

0.5.2


Highlight

- LMDeploy supports Llama 3.1 and its **Tool Calling**. An example of calling "Wolfram Alpha" to perform complex mathematical calculations can be found [here](https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving/api_server_tools.md); a minimal client-side sketch follows below.
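
As a minimal sketch, assuming the server was started with `lmdeploy serve api_server` on its default port 23333 and that the served model name is `llama3.1` (the model name, port, and tool definition here are illustrative assumptions; the tool schema follows the standard OpenAI chat-completions format, and the linked guide has the authoritative example):

```shell
# Illustrative tool-calling request; the model name, port, and the
# "calculate" tool are hypothetical placeholders, not values from
# these release notes.
curl http://0.0.0.0:23333/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Compute 23 * 19."}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "calculate",
        "description": "Evaluate a mathematical expression",
        "parameters": {
          "type": "object",
          "properties": {"expression": {"type": "string"}},
          "required": ["expression"]
        }
      }
    }]
  }'
```

The response's `tool_calls` field then carries the function name and arguments the model chose; the client executes the call and feeds the result back as a `tool` message.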

What's Changed
🚀 Features
* Support glm4 awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/1993
* Support llama3.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2122
* Support Llama3.1 tool calling by AllentDan in https://github.com/InternLM/lmdeploy/pull/2123
💥 Improvements
* Remove the triton inference server backend "turbomind_backend" by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1986
* Remove kv cache offline quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/2097
* Remove `session_len` and deprecated short names of the chat templates by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2105
* clarify that "n>1" in GenerationConfig is not supported yet by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2108
🐞 Bug fixes
* fix stop words for glm4 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2044
* Disable peer access code by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2082
* set log level ERROR in benchmark scripts by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2086
* raise thread exception by irexyc in https://github.com/InternLM/lmdeploy/pull/2071
* Fix index error when profiling token generation with `-ct 1` by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1898
🌐 Other
* misc: replace slow Jimver/cuda-toolkit by zhyncs in https://github.com/InternLM/lmdeploy/pull/2065
* misc: update bug issue template by zhyncs in https://github.com/InternLM/lmdeploy/pull/2083
* update daily testcase new by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2035
* bump version to v0.5.2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2143


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.1...v0.5.2

0.5.1


What's Changed
🚀 Features
* Support phi3-vision by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1845
* Support internvl2 chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/1911
* support gemma2 in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1924
* Add tools to api_server for InternLM2 model by AllentDan in https://github.com/InternLM/lmdeploy/pull/1763
* support internvl2-1b by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1983
* feat: support llama2 and internlm2 on 910B by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2011
* Support glm 4v by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1947
* support internlm-xcomposer2d5-7b by irexyc in https://github.com/InternLM/lmdeploy/pull/1932
* add chat template for codegeex4 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2013
💥 Improvements
* misc: rm unnecessary files by zhyncs in https://github.com/InternLM/lmdeploy/pull/1875
* drop stop words by grimoire in https://github.com/InternLM/lmdeploy/pull/1823
* Add usage in stream response by fbzhong in https://github.com/InternLM/lmdeploy/pull/1876
* Optimize sampling on pytorch engine. by grimoire in https://github.com/InternLM/lmdeploy/pull/1853
* Remove deprecated chat cli and vl examples by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1899
* vision model use tp number of gpu by irexyc in https://github.com/InternLM/lmdeploy/pull/1854
* misc: add default api_server_url for api_client by zhyncs in https://github.com/InternLM/lmdeploy/pull/1922
* misc: add transformers version check for TurboMind Tokenizer by zhyncs in https://github.com/InternLM/lmdeploy/pull/1917
* fix: append _stats when size > 0 by zhyncs in https://github.com/InternLM/lmdeploy/pull/1809
* refactor: update awq linear and rm legacy by zhyncs in https://github.com/InternLM/lmdeploy/pull/1940
* feat: add gpu topo for check_env by zhyncs in https://github.com/InternLM/lmdeploy/pull/1944
* fix transformers version check for InternVL2 by zhyncs in https://github.com/InternLM/lmdeploy/pull/1952
* Upgrade gradio by AllentDan in https://github.com/InternLM/lmdeploy/pull/1930
* refactor sampling layer setup by irexyc in https://github.com/InternLM/lmdeploy/pull/1912
* Add exception handler to image encoder by irexyc in https://github.com/InternLM/lmdeploy/pull/2010
* Avoid the same session id for openai endpoint by AllentDan in https://github.com/InternLM/lmdeploy/pull/1995
🐞 Bug fixes
* Fix error link reference by zihaomu in https://github.com/InternLM/lmdeploy/pull/1881
* Fix internlm-xcomposer2-vl awq search scale by AllentDan in https://github.com/InternLM/lmdeploy/pull/1890
* fix SamplingDecodeTest and SamplingDecodeTest2 unittest failure by zhyncs in https://github.com/InternLM/lmdeploy/pull/1874
* Fix smem size for fused split-kv reduction by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1909
* fix llama3 chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/1956
* fix: set PYTHONIOENCODING to UTF-8 before start tritonserver by zhyncs in https://github.com/InternLM/lmdeploy/pull/1971
* Fix internvl2-40b model export by irexyc in https://github.com/InternLM/lmdeploy/pull/1979
* fix logprobs by irexyc in https://github.com/InternLM/lmdeploy/pull/1968
* fix unexpected argument error when deploying "cogvlm-chat-hf" by AllentDan in https://github.com/InternLM/lmdeploy/pull/1982
* fix mixtral and mistral cache_position by zhyncs in https://github.com/InternLM/lmdeploy/pull/1941
* Fix the session_len assignment logic by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2007
* Fix logprobs openai api by irexyc in https://github.com/InternLM/lmdeploy/pull/1985
* Fix internvl2-40b awq inference by AllentDan in https://github.com/InternLM/lmdeploy/pull/2023
* Fix side effect of 1995 by AllentDan in https://github.com/InternLM/lmdeploy/pull/2033
📚 Documentations
* docs: update faq for turbomind so not found by zhyncs in https://github.com/InternLM/lmdeploy/pull/1877
* [Doc]: Change to sphinx-book-theme in readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1880
* docs: update compatibility section in README by zhyncs in https://github.com/InternLM/lmdeploy/pull/1946
* docs: update kv quant doc by zhyncs in https://github.com/InternLM/lmdeploy/pull/1977
* docs: sync the core features in README to index.rst by zhyncs in https://github.com/InternLM/lmdeploy/pull/1988
* Fix table rendering for readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1998
* docs: fix Ada compatibility by zhyncs in https://github.com/InternLM/lmdeploy/pull/2016
* update xcomposer2d5 docs by irexyc in https://github.com/InternLM/lmdeploy/pull/2037
🌐 Other
* [ci] add internlm2.5 models into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1928
* bump version to v0.5.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2022

New Contributors
* zihaomu made their first contribution in https://github.com/InternLM/lmdeploy/pull/1881
* fbzhong made their first contribution in https://github.com/InternLM/lmdeploy/pull/1876

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.0...v0.5.1

0.5.0


What's Changed
🚀 Features
* support MiniCPM-Llama3-V 2.5 by irexyc in https://github.com/InternLM/lmdeploy/pull/1708
* [Feature]: Support llava for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1641
* Device dispatcher by grimoire in https://github.com/InternLM/lmdeploy/pull/1775
* Add GLM-4-9B-Chat by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1724
* Torch deepseek v2 by grimoire in https://github.com/InternLM/lmdeploy/pull/1621
* Support internvl-chat for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1797
* Add interfaces to the pipeline to obtain logits and ppl by irexyc in https://github.com/InternLM/lmdeploy/pull/1652
* [Feature]: Support cogvlm-chat by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1502
💥 Improvements
* support mistral and llava_mistral in turbomind by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1579
* Add health endpoint by AllentDan in https://github.com/InternLM/lmdeploy/pull/1679
* upgrade the version of the dependency package peft by grimoire in https://github.com/InternLM/lmdeploy/pull/1687
* Follow the conventional model_name by AllentDan in https://github.com/InternLM/lmdeploy/pull/1677
* API Image URL fetch timeout by vody-am in https://github.com/InternLM/lmdeploy/pull/1684
* Support internlm-xcomposer2-4khd-7b awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/1666
* update dockerfile and docs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1715
* lazy import VLAsyncEngine to avoid bringing in VLMs dependencies when deploying LLMs by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1714
* feat: align with OpenAI temperature range by zhyncs in https://github.com/InternLM/lmdeploy/pull/1733
* feat: align with OpenAI temperature range in api server by zhyncs in https://github.com/InternLM/lmdeploy/pull/1734
* Refactor converter about get_input_model_registered_name and get_output_model_registered_name_and_config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1702
* Refine max_new_tokens logic to improve user experience by AllentDan in https://github.com/InternLM/lmdeploy/pull/1705
* Refactor loading weights by grimoire in https://github.com/InternLM/lmdeploy/pull/1603
* refactor config by grimoire in https://github.com/InternLM/lmdeploy/pull/1751
* Add anomaly handler by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1780
* Encode raw image file to base64 by irexyc in https://github.com/InternLM/lmdeploy/pull/1773
* skip inference for oversized inputs by grimoire in https://github.com/InternLM/lmdeploy/pull/1769
* fix: prevent numpy breakage by zhyncs in https://github.com/InternLM/lmdeploy/pull/1791
* More accurate time logging for ImageEncoder and fix concurrent image processing corruption by irexyc in https://github.com/InternLM/lmdeploy/pull/1765
* Optimize kernel launch for triton2.2.0 and triton2.3.0 by grimoire in https://github.com/InternLM/lmdeploy/pull/1499
* feat: auto set awq model_format from hf by zhyncs in https://github.com/InternLM/lmdeploy/pull/1799
* check driver mismatch by grimoire in https://github.com/InternLM/lmdeploy/pull/1811
* PyTorchEngine adapts to the latest internlm2 modeling. by grimoire in https://github.com/InternLM/lmdeploy/pull/1798
* AsyncEngine create cancel task in exception. by grimoire in https://github.com/InternLM/lmdeploy/pull/1807
* compat internlm2 for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1825
* Add model revision & download_dir to cli by irexyc in https://github.com/InternLM/lmdeploy/pull/1814
* fix image encoder request queue by irexyc in https://github.com/InternLM/lmdeploy/pull/1837
* Harden stream callback by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1838
* Support Qwen2-1.5b awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/1793
* remove chat template config in turbomind engine by irexyc in https://github.com/InternLM/lmdeploy/pull/1161
* misc: align PyTorch Engine temperature with TurboMind by zhyncs in https://github.com/InternLM/lmdeploy/pull/1850
* docs: update cache-max-entry-count help message by zhyncs in https://github.com/InternLM/lmdeploy/pull/1892
🐞 Bug fixes
* fix typos by irexyc in https://github.com/InternLM/lmdeploy/pull/1690
* [Bugfix] fix internvl-1.5-chat vision model preprocess and freeze weights by DefTruth in https://github.com/InternLM/lmdeploy/pull/1741
* lock setuptools version in dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1770
* Fix openai package can not use proxy stream mode by AllentDan in https://github.com/InternLM/lmdeploy/pull/1692
* Fix finish_reason by AllentDan in https://github.com/InternLM/lmdeploy/pull/1768
* fix uncached stop words by grimoire in https://github.com/InternLM/lmdeploy/pull/1754
* [side-effect] Fix param `--cache-max-entry-count` is not taking effect (1758) by QwertyJack in https://github.com/InternLM/lmdeploy/pull/1778
* support qwen2 1.5b by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1782
* fix falcon attention by grimoire in https://github.com/InternLM/lmdeploy/pull/1761
* Refine AsyncEngine exception handler by AllentDan in https://github.com/InternLM/lmdeploy/pull/1789
* [side-effect] fix weight_type caused by PR 1702 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1795
* fix best_match_model by irexyc in https://github.com/InternLM/lmdeploy/pull/1812
* Fix Request completed log by irexyc in https://github.com/InternLM/lmdeploy/pull/1821
* fix qwen-vl-chat hung by irexyc in https://github.com/InternLM/lmdeploy/pull/1824
* Detokenize with prompt token ids by AllentDan in https://github.com/InternLM/lmdeploy/pull/1753
* Update engine.py to fix small typos by WANGSSSSSSS in https://github.com/InternLM/lmdeploy/pull/1829
* [side-effect] bring back "--cap" argument in chat cli by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1859
* Fix vl session-len by AllentDan in https://github.com/InternLM/lmdeploy/pull/1860
* fix gradio vl "stop_words" by irexyc in https://github.com/InternLM/lmdeploy/pull/1873
* fix qwen2 cache_position for PyTorch Engine when transformers>4.41.2 by zhyncs in https://github.com/InternLM/lmdeploy/pull/1886
* fix model name matching for internvl by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1867
📚 Documentations
* docs: add BentoLMDeploy in README by zhyncs in https://github.com/InternLM/lmdeploy/pull/1736
* [Doc]: Update docs for internlm2.5 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1887
🌐 Other
* add longtext generation benchmark by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1694
* add qwen2 model into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1772
* fix pr test for newest internlm2 model by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1806
* react test evaluation config by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1861
* bump version to v0.5.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1852

New Contributors
* DefTruth made their first contribution in https://github.com/InternLM/lmdeploy/pull/1741
* QwertyJack made their first contribution in https://github.com/InternLM/lmdeploy/pull/1778
* WANGSSSSSSS made their first contribution in https://github.com/InternLM/lmdeploy/pull/1829

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.4.2...v0.5.0
