lmdeploy

Latest version: v0.7.0.post3


0.0.4


Highlight
* Support 4-bit LLM quantization and inference. Check [this](https://github.com/InternLM/lmdeploy/blob/main/docs/en/w4a16.md) guide for detailed information.
![image](https://github.com/InternLM/lmdeploy/assets/4560679/b38fc352-471e-4c06-9e31-5e251a6216f6)
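
The linked guide describes the supported workflow; as background, the core idea of 4-bit weight quantization — storing weights as 4-bit integers plus a per-group scale and zero value, dequantized back to higher precision at inference time — can be sketched in plain Python (a toy illustration, not lmdeploy's actual kernels):

```python
# Toy sketch of 4-bit (W4A16-style) weight quantization: weights become
# unsigned 4-bit integers plus a per-group float scale and zero value,
# and are dequantized at inference time. Illustration only.

def quantize_4bit(weights, group_size=4):
    """Quantize a list of floats to 4-bit ints with per-group scale/zero."""
    packed = []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / 15 or 1.0   # 4 bits -> 16 levels (0..15)
        qs = [round((w - lo) / scale) for w in group]
        packed.append((qs, scale, lo))
    return packed

def dequantize_4bit(packed):
    out = []
    for qs, scale, zero in packed:
        out.extend(q * scale + zero for q in qs)
    return out

w = [0.12, -0.5, 0.33, 0.9, -0.1, 0.05, 0.7, -0.8]
restored = dequantize_4bit(quantize_4bit(w))
max_err = max(abs(a - b) for a, b in zip(w, restored))
```

The round-trip error is bounded by half a quantization step per group, which is why per-group (rather than per-tensor) scales matter for accuracy.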

What's Changed
🚀 Features
* Blazing fast W4A16 inference by lzhangzz in https://github.com/InternLM/lmdeploy/pull/202
* Support AWQ by pppppM in https://github.com/InternLM/lmdeploy/pull/108 and AllentDan in https://github.com/InternLM/lmdeploy/pull/228


💥 Improvements
* Add release note template by lvhan028 in https://github.com/InternLM/lmdeploy/pull/211
* feat(quantization): kv cache use asymmetric by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/218
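
The asymmetric scheme in PR 218 quantizes with a scale plus a zero-point rather than a symmetric scale around zero, which wastes fewer quantization levels on one-sided data. A toy int8 comparison (an illustration of the general idea, not the PR's kernels):

```python
# Symmetric vs asymmetric int8 quantization, as a toy round-trip.

def quant_symmetric(xs, bits=8):
    # Symmetric: one scale, zero-point fixed at 0; half the range is
    # wasted when the data is one-sided.
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    scale = max(abs(x) for x in xs) / qmax
    return [round(x / scale) * scale for x in xs]

def quant_asymmetric(xs, bits=8):
    # Asymmetric: scale plus a zero-point, so the full [min, max] range
    # maps onto all 2**bits levels.
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / (2 ** bits - 1)   # 255 levels for int8
    return [round((x - lo) / scale) * scale + lo for x in xs]

# One-sided data (all positive, like many activation tensors).
xs = [0.01 * i for i in range(1, 101)]
err_sym = max(abs(a - b) for a, b in zip(xs, quant_symmetric(xs)))
err_asym = max(abs(a - b) for a, b in zip(xs, quant_asymmetric(xs)))
```

On this skewed input the asymmetric round-trip error is roughly half the symmetric one, which is the motivation for the change.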

🐞 Bug fixes
* Fix TIS client got-no-space-result side effect introduced by PR 197 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/222

📚 Documentations
* Update W4A16 News by pppppM in https://github.com/InternLM/lmdeploy/pull/227
* Check-in user guide for w4a16 LLM deployment by lvhan028 in https://github.com/InternLM/lmdeploy/pull/224

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.3...v0.0.4

0.0.3

What's Changed

🚀 Features
* Support tensor parallelism without offline splitting model weights by grimoire in https://github.com/InternLM/lmdeploy/pull/158
* Add script to split HuggingFace model to the smallest sharded checkpoints by LZHgrla in https://github.com/InternLM/lmdeploy/pull/199
* Add non-stream inference api for chatbot by lvhan028 in https://github.com/InternLM/lmdeploy/pull/200
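
The approach in PR 158 — slicing weights at load time on each rank instead of splitting checkpoints offline — can be illustrated with a toy column-parallel matmul (a sketch of the general technique, not turbomind's actual code):

```python
# Toy sketch of column-wise tensor parallelism: each "rank" takes its own
# slice of W's columns at load time, computes a partial matmul, and the
# concatenated partials equal the full result. Illustration only.

def matmul(x, w):
    """x: list of rows, w: list of rows -> x @ w."""
    cols = list(zip(*w))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in x]

def column_shard(w, rank, world_size):
    """Take this rank's contiguous slice of W's columns (done at load time)."""
    n = len(w[0])
    per = n // world_size
    return [row[rank * per:(rank + 1) * per] for row in w]

x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 2.0]]

world_size = 2
partials = [matmul(x, column_shard(w, r, world_size)) for r in range(world_size)]
# "All-gather" along columns: concatenate each rank's partial output rows.
full = [sum((p[i] for p in partials), []) for i in range(len(x))]
```

Because each rank slices the full checkpoint itself, no pre-split weight files need to be produced or shipped.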
💥 Improvements
* Add issue/pr templates by lvhan028 in https://github.com/InternLM/lmdeploy/pull/184
* Remove unused code to reduce binary size by lzhangzz in https://github.com/InternLM/lmdeploy/pull/181
* Support serving with gradio without communicating to TIS by AllentDan in https://github.com/InternLM/lmdeploy/pull/162
* Improve postprocessing in TIS serving by applying Incremental de-tokenizing by lvhan028 in https://github.com/InternLM/lmdeploy/pull/197
* Support multi-session chat by wangruohui in https://github.com/InternLM/lmdeploy/pull/178
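
Incremental de-tokenizing, as applied in PR 197, decodes the growing token sequence each step but emits only the newly produced suffix, so the client never re-receives earlier text. A toy sketch with a made-up vocabulary (`VOCAB` is hypothetical, not a real tokenizer):

```python
# Toy sketch of incremental de-tokenizing: decode the full token list each
# step, but yield only the text past what was already sent. Illustration
# only, not the TIS serving code.

VOCAB = {0: "Hel", 1: "lo", 2: ", ", 3: "world", 4: "!"}

def detokenize(tokens):
    return "".join(VOCAB[t] for t in tokens)

def stream_decode(token_stream):
    """Yield only the newly decoded text after each token arrives."""
    tokens, sent = [], 0
    for tok in token_stream:
        tokens.append(tok)
        text = detokenize(tokens)
        yield text[sent:]          # incremental chunk
        sent = len(text)

chunks = list(stream_decode([0, 1, 2, 3, 4]))
```

Decoding from the full token list each step avoids the glitches that come from detokenizing each token in isolation, while the `sent` offset keeps the stream incremental.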

🐞 Bug fixes
* Fix build test error and move turbomind csrc test cases to `tests/csrc` by lvhan028 in https://github.com/InternLM/lmdeploy/pull/188
* Fix launching client error by moving lmdeploy/turbomind/utils.py to lmdeploy/utils.py by lvhan028 in https://github.com/InternLM/lmdeploy/pull/191

📚 Documentations
* Update README.md by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/187
* Translate turbomind.md by xin-li-67 in https://github.com/InternLM/lmdeploy/pull/173

New Contributors
* LZHgrla made their first contribution in https://github.com/InternLM/lmdeploy/pull/199

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.2...v0.0.3

0.0.2

What's Changed

🚀 Features

* Add lmdeploy python package build scripts and CI workflow by irexyc in <https://github.com/InternLM/lmdeploy/pull/163>, <https://github.com/InternLM/lmdeploy/pull/164>, <https://github.com/InternLM/lmdeploy/pull/170>
* Support LLama-2 with GQA by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/147> and grimoire in <https://github.com/InternLM/lmdeploy/pull/160>
* Add Llama-2 chat template by grimoire in <https://github.com/InternLM/lmdeploy/pull/140>
* Add decode-only forward pass by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/153>
* Support tensor parallelism in turbomind's python API by grimoire in <https://github.com/InternLM/lmdeploy/pull/82>
* Support w pack qkv by tpoisonooo in <https://github.com/InternLM/lmdeploy/pull/83>
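
Grouped-query attention (GQA), which the larger Llama-2 variants use, lets several query heads share one key/value head, shrinking the KV cache by the group factor. A toy sketch of the head mapping (an illustration, not turbomind's implementation):

```python
# Toy sketch of the GQA head mapping: contiguous groups of query heads
# share a single key/value head, so the KV cache stores num_kv_heads
# instead of num_q_heads entries per layer. Illustration only.

def kv_head_for(q_head, num_q_heads, num_kv_heads):
    """Map a query head index to the KV head its group shares."""
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size

# 8 query heads sharing 2 KV heads -> groups of 4, a 4x smaller KV cache.
mapping = [kv_head_for(h, 8, 2) for h in range(8)]
```

With `num_kv_heads == num_q_heads` this degenerates to standard multi-head attention, and with `num_kv_heads == 1` to multi-query attention.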

💥 Improvements

* Refactor the chat template of supported models using factory pattern by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/144> and streamsunshine in <https://github.com/InternLM/lmdeploy/pull/174>
* Add profile throughput benchmark by grimoire in <https://github.com/InternLM/lmdeploy/pull/146>
* Remove slicing response and add resume API by streamsunshine in <https://github.com/InternLM/lmdeploy/pull/154>
* Support DeepSpeed on autoTP and kernel injection by KevinNuNu and wangruohui in <https://github.com/InternLM/lmdeploy/pull/138>
* Add github action for publishing docker image by RunningLeon in <https://github.com/InternLM/lmdeploy/pull/148>

🐞 Bug fixes

* Fix getting package root path error in python3.9 by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/157>
* Fix carriage return causing overwriting at the same line by wangruohui in <https://github.com/InternLM/lmdeploy/pull/143>
* Fix the offset during streaming chat by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/142>
* Fix concatenate bug in benchmark serving script by rollroll90 in <https://github.com/InternLM/lmdeploy/pull/134>
* Fix attempted_relative_import by KevinNuNu in <https://github.com/InternLM/lmdeploy/pull/125>

📚 Documentations

* Translate `en/quantization.md` into Chinese by xin-li-67 in <https://github.com/InternLM/lmdeploy/pull/166>
* Check-in benchmark on real conversation data by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/156>
* Fix typos and missing dependent packages in README and requirements.txt by vansin in <https://github.com/InternLM/lmdeploy/pull/123>, APX103 in <https://github.com/InternLM/lmdeploy/pull/109>, AllentDan in <https://github.com/InternLM/lmdeploy/pull/119> and del-zhenwu in <https://github.com/InternLM/lmdeploy/pull/124>
* Add turbomind's architecture documentation by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/101>


New Contributors
* streamsunshine
* del-zhenwu
* APX103
* xin-li-67
* KevinNuNu
* rollroll90

