lmdeploy

Latest version: v0.7.0.post3


0.5.3

What's Changed
🚀 Features
* PyTorch Engine AWQ support by grimoire in https://github.com/InternLM/lmdeploy/pull/1913 (a usage sketch follows this list)
* Phi3 awq by grimoire in https://github.com/InternLM/lmdeploy/pull/1984
* Fix chunked prefill by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2201
* support VLMs with Qwen as the language model by irexyc in https://github.com/InternLM/lmdeploy/pull/2207
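
The PyTorch engine AWQ support above is reached through the regular `pipeline` API. A minimal sketch, assuming a 4-bit AWQ checkpoint is available (the model path is illustrative):

```python
from lmdeploy import pipeline, PytorchEngineConfig

# Illustrative path to an AWQ-quantized checkpoint, e.g. one produced
# by `lmdeploy lite auto_awq`; substitute a real model.
pipe = pipeline('internlm/internlm2-chat-7b-4bit',
                backend_config=PytorchEngineConfig(tp=1))
print(pipe(['Hello, who are you?']))
```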
💥 Improvements
* Support specifying a prefix of assistant response by AllentDan in https://github.com/InternLM/lmdeploy/pull/2172 (a hedged sketch follows this list)
* Strict check for `name_map` in `InternLM2Chat7B` by SamuraiBUPT in https://github.com/InternLM/lmdeploy/pull/2156
* Check errors for attention kernels by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2206
* update base image to support cuda12.4 in dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2182
* Stop synchronizing for `length_criterion` by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2202
* adapt MiniCPM-Llama3-V-2_5 new code by irexyc in https://github.com/InternLM/lmdeploy/pull/2139
* Remove duplicate code by cmpute in https://github.com/InternLM/lmdeploy/pull/2133
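
A plausible way to use the assistant-response prefix from the list above is to end `messages` with a partial assistant turn; this is a hedged sketch of that pattern against the OpenAI-compatible server, not a verified description of the PR's exact interface:

```python
from openai import OpenAI

# Assumes `lmdeploy serve api_server <model>` is listening on the default port 23333.
client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='none')
resp = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[
        {'role': 'user', 'content': 'Write a haiku about GPUs.'},
        # The trailing assistant message is assumed to act as the response prefix.
        {'role': 'assistant', 'content': 'Here is my haiku:\n'},
    ],
)
print(resp.choices[0].message.content)
```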
🐞 Bug fixes
* [Hotfix] missing parentheses when calculating the coef of llama3 rope by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2157
* support logit softcap by grimoire in https://github.com/InternLM/lmdeploy/pull/2158
* Fix gmem to smem WAW conflict in awq gemm kernel by foreverrookie in https://github.com/InternLM/lmdeploy/pull/2111
* Fix gradio serve using a wrong chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/2131
* fix runtime error when using dynamic scale rotary embed for InternLM2… by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2212
* Add peer-access-enabled allocator by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2218
* Fix typos in profile_generation.py by jiajie-yang in https://github.com/InternLM/lmdeploy/pull/2233
📚 Documentations
* docs: fix Qwen typo by ArtificialZeng in https://github.com/InternLM/lmdeploy/pull/2136
* docs: fix a wrong expression by ArtificialZeng in https://github.com/InternLM/lmdeploy/pull/2165
* clarify the model type (LLM or MLLM) in the supported model matrix by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2209
* docs: add Japanese README by eltociear in https://github.com/InternLM/lmdeploy/pull/2237
🌐 Other
* bump version to 0.5.2.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2159
* update news about cooperation with modelscope/swift by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2200
* bump version to v0.5.3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2242

New Contributors
* ArtificialZeng made their first contribution in https://github.com/InternLM/lmdeploy/pull/2136
* foreverrookie made their first contribution in https://github.com/InternLM/lmdeploy/pull/2111
* SamuraiBUPT made their first contribution in https://github.com/InternLM/lmdeploy/pull/2156
* CyCle1024 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2212
* jiajie-yang made their first contribution in https://github.com/InternLM/lmdeploy/pull/2233
* cmpute made their first contribution in https://github.com/InternLM/lmdeploy/pull/2133

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.2...v0.5.3

0.5.2.post1

What's Changed
🐞 Bug fixes
* [Hotfix] missing parentheses when calculating the coef of llama3 rope, which caused the needle-in-a-haystack experiment to fail, by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2157
🌐 Other
* bump version to 0.5.2.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2159


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.2...v0.5.2.post1

0.5.2

Highlight

- LMDeploy supports Llama 3.1 and its **Tool Calling**. An example of calling "Wolfram Alpha" to perform complex mathematical calculations can be found [here](https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving/api_server_tools.md)
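
For orientation, here is a minimal sketch of tool calling through the OpenAI-compatible client, assuming an api_server is running locally on the default port 23333; the tool schema is illustrative, and the linked document above is the authoritative reference.

```python
from openai import OpenAI

client = OpenAI(base_url='http://0.0.0.0:23333/v1', api_key='none')

# Illustrative function schema; see api_server_tools.md for the supported format.
tools = [{
    'type': 'function',
    'function': {
        'name': 'wolfram_alpha',
        'description': 'Evaluate a mathematical expression',
        'parameters': {
            'type': 'object',
            'properties': {'query': {'type': 'string'}},
            'required': ['query'],
        },
    },
}]

resp = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{'role': 'user', 'content': 'Compute the integral of x^2 from 0 to 3.'}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```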

What's Changed
🚀 Features
* Support glm4 awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/1993
* Support llama3.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2122
* Support Llama3.1 tool calling by AllentDan in https://github.com/InternLM/lmdeploy/pull/2123
💥 Improvements
* Remove the triton inference server backend "turbomind_backend" by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1986
* Remove kv cache offline quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/2097
* Remove `session_len` and deprecated short names of the chat templates by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2105
* clarify "n>1" in GenerationConfig hasn't been supported yet by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2108
🐞 Bug fixes
* fix stop words for glm4 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2044
* Disable peer access code by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2082
* set log level ERROR in benchmark scripts by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2086
* raise thread exception by irexyc in https://github.com/InternLM/lmdeploy/pull/2071
* Fix index error when profiling token generation with `-ct 1` by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1898
🌐 Other
* misc: replace slow Jimver/cuda-toolkit by zhyncs in https://github.com/InternLM/lmdeploy/pull/2065
* misc: update bug issue template by zhyncs in https://github.com/InternLM/lmdeploy/pull/2083
* update daily testcase new by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2035
* bump version to v0.5.2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2143


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.1...v0.5.2

0.5.1

What's Changed
🚀 Features
* Support phi3-vision by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1845
* Support internvl2 chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/1911 (a pipeline sketch follows this list)
* support gemma2 in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1924
* Add tools to api_server for InternLM2 model by AllentDan in https://github.com/InternLM/lmdeploy/pull/1763
* support internvl2-1b by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1983
* feat: support llama2 and internlm2 on 910B by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2011
* Support glm 4v by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1947
* support internlm-xcomposer2d5-7b by irexyc in https://github.com/InternLM/lmdeploy/pull/1932
* add chat template for codegeex4 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2013
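
Most of the VLM additions above (InternVL2, GLM-4V, XComposer 2.5) are consumed through the same vision pipeline API; a minimal sketch, with an illustrative model path:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL2-8B')  # illustrative; any supported VLM
img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
print(pipe(('describe this image', img)))
```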
💥 Improvements
* misc: rm unnecessary files by zhyncs in https://github.com/InternLM/lmdeploy/pull/1875
* drop stop words by grimoire in https://github.com/InternLM/lmdeploy/pull/1823
* Add usage in stream response by fbzhong in https://github.com/InternLM/lmdeploy/pull/1876
* Optimize sampling on pytorch engine. by grimoire in https://github.com/InternLM/lmdeploy/pull/1853
* Remove deprecated chat cli and vl examples by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1899
* vision model use tp number of gpu by irexyc in https://github.com/InternLM/lmdeploy/pull/1854
* misc: add default api_server_url for api_client by zhyncs in https://github.com/InternLM/lmdeploy/pull/1922
* misc: add transformers version check for TurboMind Tokenizer by zhyncs in https://github.com/InternLM/lmdeploy/pull/1917
* fix: append _stats when size > 0 by zhyncs in https://github.com/InternLM/lmdeploy/pull/1809
* refactor: update awq linear and rm legacy by zhyncs in https://github.com/InternLM/lmdeploy/pull/1940
* feat: add gpu topo for check_env by zhyncs in https://github.com/InternLM/lmdeploy/pull/1944
* fix transformers version check for InternVL2 by zhyncs in https://github.com/InternLM/lmdeploy/pull/1952
* Upgrade gradio by AllentDan in https://github.com/InternLM/lmdeploy/pull/1930
* refactor sampling layer setup by irexyc in https://github.com/InternLM/lmdeploy/pull/1912
* Add exception handler to image encoder by irexyc in https://github.com/InternLM/lmdeploy/pull/2010
* Avoid the same session id for openai endpoint by AllentDan in https://github.com/InternLM/lmdeploy/pull/1995
🐞 Bug fixes
* Fix error link reference by zihaomu in https://github.com/InternLM/lmdeploy/pull/1881
* Fix internlm-xcomposer2-vl awq search scale by AllentDan in https://github.com/InternLM/lmdeploy/pull/1890
* fix SamplingDecodeTest and SamplingDecodeTest2 unittest failure by zhyncs in https://github.com/InternLM/lmdeploy/pull/1874
* Fix smem size for fused split-kv reduction by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1909
* fix llama3 chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/1956
* fix: set PYTHONIOENCODING to UTF-8 before starting tritonserver by zhyncs in https://github.com/InternLM/lmdeploy/pull/1971
* Fix internvl2-40b model export by irexyc in https://github.com/InternLM/lmdeploy/pull/1979
* fix logprobs by irexyc in https://github.com/InternLM/lmdeploy/pull/1968
* fix unexpected argument error when deploying "cogvlm-chat-hf" by AllentDan in https://github.com/InternLM/lmdeploy/pull/1982
* fix mixtral and mistral cache_position by zhyncs in https://github.com/InternLM/lmdeploy/pull/1941
* Fix the session_len assignment logic by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2007
* Fix logprobs openai api by irexyc in https://github.com/InternLM/lmdeploy/pull/1985
* Fix internvl2-40b awq inference by AllentDan in https://github.com/InternLM/lmdeploy/pull/2023
* Fix side effect of 1995 by AllentDan in https://github.com/InternLM/lmdeploy/pull/2033
📚 Documentations
* docs: update faq for turbomind so not found by zhyncs in https://github.com/InternLM/lmdeploy/pull/1877
* [Doc]: Change to sphinx-book-theme in readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1880
* docs: update compatibility section in README by zhyncs in https://github.com/InternLM/lmdeploy/pull/1946
* docs: update kv quant doc by zhyncs in https://github.com/InternLM/lmdeploy/pull/1977
* docs: sync the core features in README to index.rst by zhyncs in https://github.com/InternLM/lmdeploy/pull/1988
* Fix table rendering for readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1998
* docs: fix Ada compatibility by zhyncs in https://github.com/InternLM/lmdeploy/pull/2016
* update xcomposer2d5 docs by irexyc in https://github.com/InternLM/lmdeploy/pull/2037
🌐 Other
* [ci] add internlm2.5 models into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1928
* bump version to v0.5.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2022

New Contributors
* zihaomu made their first contribution in https://github.com/InternLM/lmdeploy/pull/1881
* fbzhong made their first contribution in https://github.com/InternLM/lmdeploy/pull/1876

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.0...v0.5.1

0.5.0

What's Changed
🚀 Features
* support MiniCPM-Llama3-V 2.5 by irexyc in https://github.com/InternLM/lmdeploy/pull/1708
* [Feature]: Support llava for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1641
* Device dispatcher by grimoire in https://github.com/InternLM/lmdeploy/pull/1775
* Add GLM-4-9B-Chat by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1724
* Torch deepseek v2 by grimoire in https://github.com/InternLM/lmdeploy/pull/1621
* Support internvl-chat for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1797
* Add interfaces to the pipeline to obtain logits and ppl by irexyc in https://github.com/InternLM/lmdeploy/pull/1652 (a sketch follows this list)
* [Feature]: Support cogvlm-chat by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1502
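
For the logits/ppl interfaces above, a minimal sketch follows, assuming the method is named `get_ppl` (the exact signature may differ from the PR):

```python
from lmdeploy import pipeline

pipe = pipeline('internlm/internlm2-chat-7b')  # illustrative model

# get_ppl is assumed per the feature description: score each text and
# return one perplexity value per input.
ppl = pipe.get_ppl(['The capital of France is Paris.',
                    'The capital of France is Berlin.'])
print(ppl)
```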
💥 Improvements
* support mistral and llava_mistral in turbomind by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1579
* Add health endpoint by AllentDan in https://github.com/InternLM/lmdeploy/pull/1679
* upgrade the version of the dependency package peft by grimoire in https://github.com/InternLM/lmdeploy/pull/1687
* Follow the conventional model_name by AllentDan in https://github.com/InternLM/lmdeploy/pull/1677
* API Image URL fetch timeout by vody-am in https://github.com/InternLM/lmdeploy/pull/1684
* Support internlm-xcomposer2-4khd-7b awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/1666
* update dockerfile and docs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1715
* lazy import VLAsyncEngine to avoid bringing in VLMs dependencies when deploying LLMs by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1714
* feat: align with OpenAI temperature range by zhyncs in https://github.com/InternLM/lmdeploy/pull/1733
* feat: align with OpenAI temperature range in api server by zhyncs in https://github.com/InternLM/lmdeploy/pull/1734
* Refactor converter about get_input_model_registered_name and get_output_model_registered_name_and_config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1702
* Refine max_new_tokens logic to improve user experience by AllentDan in https://github.com/InternLM/lmdeploy/pull/1705
* Refactor loading weights by grimoire in https://github.com/InternLM/lmdeploy/pull/1603
* refactor config by grimoire in https://github.com/InternLM/lmdeploy/pull/1751
* Add anomaly handler by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1780
* Encode raw image file to base64 by irexyc in https://github.com/InternLM/lmdeploy/pull/1773
* skip inference for oversized inputs by grimoire in https://github.com/InternLM/lmdeploy/pull/1769
* fix: prevent numpy breakage by zhyncs in https://github.com/InternLM/lmdeploy/pull/1791
* More accurate time logging for ImageEncoder and fix concurrent image processing corruption by irexyc in https://github.com/InternLM/lmdeploy/pull/1765
* Optimize kernel launch for triton2.2.0 and triton2.3.0 by grimoire in https://github.com/InternLM/lmdeploy/pull/1499
* feat: auto set awq model_format from hf by zhyncs in https://github.com/InternLM/lmdeploy/pull/1799
* check driver mismatch by grimoire in https://github.com/InternLM/lmdeploy/pull/1811
* PyTorchEngine adapts to the latest internlm2 modeling. by grimoire in https://github.com/InternLM/lmdeploy/pull/1798
* AsyncEngine create cancel task in exception. by grimoire in https://github.com/InternLM/lmdeploy/pull/1807
* compat internlm2 for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1825
* Add model revision & download_dir to cli by irexyc in https://github.com/InternLM/lmdeploy/pull/1814
* fix image encoder request queue by irexyc in https://github.com/InternLM/lmdeploy/pull/1837
* Harden stream callback by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1838
* Support Qwen2-1.5b awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/1793
* remove chat template config in turbomind engine by irexyc in https://github.com/InternLM/lmdeploy/pull/1161
* misc: align PyTorch Engine temperature with TurboMind by zhyncs in https://github.com/InternLM/lmdeploy/pull/1850 (a sampling sketch follows this list)
* docs: update cache-max-entry-count help message by zhyncs in https://github.com/InternLM/lmdeploy/pull/1892
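
Several items above touch sampling behavior (the OpenAI-style temperature range and engine alignment). As a reference point, sampling options are passed through `GenerationConfig`, roughly as sketched below; parameter values are illustrative:

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2-chat-7b')  # illustrative model
# Temperature follows the OpenAI-style range; values near 0 approach greedy decoding.
gen_config = GenerationConfig(temperature=0.8, top_p=0.95, max_new_tokens=256)
print(pipe(['Tell me about the InternLM project.'], gen_config=gen_config))
```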
🐞 Bug fixes
* fix typos by irexyc in https://github.com/InternLM/lmdeploy/pull/1690
* [Bugfix] fix internvl-1.5-chat vision model preprocess and freeze weights by DefTruth in https://github.com/InternLM/lmdeploy/pull/1741
* lock setuptools version in dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1770
* Fix openai package can not use proxy stream mode by AllentDan in https://github.com/InternLM/lmdeploy/pull/1692
* Fix finish_reason by AllentDan in https://github.com/InternLM/lmdeploy/pull/1768
* fix uncached stop words by grimoire in https://github.com/InternLM/lmdeploy/pull/1754
* [side-effect] Fix param `--cache-max-entry-count` not taking effect (1758) by QwertyJack in https://github.com/InternLM/lmdeploy/pull/1778
* support qwen2 1.5b by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1782
* fix falcon attention by grimoire in https://github.com/InternLM/lmdeploy/pull/1761
* Refine AsyncEngine exception handler by AllentDan in https://github.com/InternLM/lmdeploy/pull/1789
* [side-effect] fix weight_type caused by PR 1702 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1795
* fix best_match_model by irexyc in https://github.com/InternLM/lmdeploy/pull/1812
* Fix Request completed log by irexyc in https://github.com/InternLM/lmdeploy/pull/1821
* fix qwen-vl-chat hung by irexyc in https://github.com/InternLM/lmdeploy/pull/1824
* Detokenize with prompt token ids by AllentDan in https://github.com/InternLM/lmdeploy/pull/1753
* Update engine.py to fix small typos by WANGSSSSSSS in https://github.com/InternLM/lmdeploy/pull/1829
* [side-effect] bring back "--cap" argument in chat cli by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1859
* Fix vl session-len by AllentDan in https://github.com/InternLM/lmdeploy/pull/1860
* fix gradio vl "stop_words" by irexyc in https://github.com/InternLM/lmdeploy/pull/1873
* fix qwen2 cache_position for PyTorch Engine when transformers>4.41.2 by zhyncs in https://github.com/InternLM/lmdeploy/pull/1886
* fix model name matching for internvl by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1867
📚 Documentations
* docs: add BentoLMDeploy in README by zhyncs in https://github.com/InternLM/lmdeploy/pull/1736
* [Doc]: Update docs for internlm2.5 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1887
🌐 Other
* add longtext generation benchmark by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1694
* add qwen2 model into testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1772
* fix pr test for newest internlm2 model by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1806
* react test evaluation config by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1861
* bump version to v0.5.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1852

New Contributors
* DefTruth made their first contribution in https://github.com/InternLM/lmdeploy/pull/1741
* QwertyJack made their first contribution in https://github.com/InternLM/lmdeploy/pull/1778
* WANGSSSSSSS made their first contribution in https://github.com/InternLM/lmdeploy/pull/1829

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.4.2...v0.5.0

0.4.2

Highlight

- Support 4-bit weight-only quantization and inference for VLMs, such as InternVL v1.5, LLaVA, and InternLM-XComposer2

**Quantization**

```shell
lmdeploy lite auto_awq OpenGVLab/InternVL-Chat-V1-5 --work-dir ./InternVL-Chat-V1-5-AWQ
```

**Inference with quantized model**

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('./InternVL-Chat-V1-5-AWQ', backend_config=TurbomindEngineConfig(tp=1, model_format='awq'))

img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
out = pipe(('describe this image', img))
print(out)
```


- Balance vision model when deploying VLMs with multiple GPUs

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5', backend_config=TurbomindEngineConfig(tp=2))

img = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
out = pipe(('describe this image', img))
print(out)
```


What's Changed
🚀 Features
* PyTorch Engine hash table based prefix caching by grimoire in https://github.com/InternLM/lmdeploy/pull/1429
* support phi3 by grimoire in https://github.com/InternLM/lmdeploy/pull/1497
* Turbomind prefix caching by ispobock in https://github.com/InternLM/lmdeploy/pull/1450 (a config sketch follows this list)
* Enable search scale for awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/1545
* [Feature] Support vl models quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/1553
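
The TurboMind prefix caching above should be reachable through the engine config; a minimal sketch, assuming the flag is named `enable_prefix_caching` (check the release docs for the exact name):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# enable_prefix_caching is the assumed flag name for the prefix-caching feature.
pipe = pipeline('internlm/internlm2-chat-7b',
                backend_config=TurbomindEngineConfig(enable_prefix_caching=True))

system = 'You are a meticulous assistant. ' * 50  # long shared prefix
# Requests sharing this prefix can reuse cached KV blocks instead of re-prefilling.
print(pipe([system + 'Question: what is AWQ?',
            system + 'Question: what is prefix caching?']))
```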
💥 Improvements
* make Qwen compatible with Slora when TP > 1 by jjjjohnson in https://github.com/InternLM/lmdeploy/pull/1518
* Optimize slora by grimoire in https://github.com/InternLM/lmdeploy/pull/1447
* Use a faster format for images in VLMs by isidentical in https://github.com/InternLM/lmdeploy/pull/1575
* add chat-template args to chat cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1566
* Get the max session len from config.json by AllentDan in https://github.com/InternLM/lmdeploy/pull/1550
* Optimize w8a8 kernel by grimoire in https://github.com/InternLM/lmdeploy/pull/1353
* support python 3.12 by irexyc in https://github.com/InternLM/lmdeploy/pull/1605
* Optimize moe by grimoire in https://github.com/InternLM/lmdeploy/pull/1520
* Balance vision model weights on multi gpus by irexyc in https://github.com/InternLM/lmdeploy/pull/1591
* Support user-specified IMAGE_TOKEN position for deepseek-vl model by irexyc in https://github.com/InternLM/lmdeploy/pull/1627
* Optimize GQA/MQA by grimoire in https://github.com/InternLM/lmdeploy/pull/1649
🐞 Bug fixes
* fix logger init by AllentDan in https://github.com/InternLM/lmdeploy/pull/1598
* Bugfix: wrongly assign gen_config with True by thelongestusernameofall in https://github.com/InternLM/lmdeploy/pull/1594
* Enable split-kv for attention by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1606
* Fix xcomposer2 vision model process by irexyc in https://github.com/InternLM/lmdeploy/pull/1640
* Fix NTK scaling by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1636
* Fix illegal memory access when seq_len < 64 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1616
* Fix llava vl template by irexyc in https://github.com/InternLM/lmdeploy/pull/1620
* [side-effect] fix deepseek-vl when tp is 1 by irexyc in https://github.com/InternLM/lmdeploy/pull/1648
* fix logprobs output by irexyc in https://github.com/InternLM/lmdeploy/pull/1561
* fix fused-moe in triton2.2.0 by grimoire in https://github.com/InternLM/lmdeploy/pull/1654
* Align tokenizers in pipeline and api_server benchmark scripts by AllentDan in https://github.com/InternLM/lmdeploy/pull/1650
* [side-effect] fix UnboundLocalError for internlm-xcomposer2-4khd-7b by irexyc in https://github.com/InternLM/lmdeploy/pull/1661
* remove paged attention prefill autotune by grimoire in https://github.com/InternLM/lmdeploy/pull/1658
* Fix transformers 4.41.0 prompt may differ after encode decode by AllentDan in https://github.com/InternLM/lmdeploy/pull/1617
📚 Documentations
* Fix typo in w8a8.md by chg0901 in https://github.com/InternLM/lmdeploy/pull/1568
* Update doc for prefix caching by ispobock in https://github.com/InternLM/lmdeploy/pull/1597
* Update VL document by AllentDan in https://github.com/InternLM/lmdeploy/pull/1657
🌐 Other
* remove first empty token check and add input validation testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1549
* add more model into benchmark and evaluate workflow by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1565
* add vl awq testcase and refactor pipeline testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1630
* bump version to v0.4.2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1644

New Contributors
* isidentical made their first contribution in https://github.com/InternLM/lmdeploy/pull/1575
* chg0901 made their first contribution in https://github.com/InternLM/lmdeploy/pull/1568
* thelongestusernameofall made their first contribution in https://github.com/InternLM/lmdeploy/pull/1594

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.4.1...v0.4.2
