Internlm

Latest version: v0.2.0

Page 6 of 6

0.0.6

Highlights

* Support Qwen-7B with dynamic NTK scaling and logN scaling in turbomind
* Support tensor parallelism for W4A16
* Add OpenAI-like RESTful API
* Support Llama-2 70B 4-bit quantization
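As a rough illustration of the two scaling techniques named in the first highlight, here is a minimal sketch. It assumes one commonly described formulation: the RoPE base is enlarged by a factor `alpha` that grows with the sequence length, and attention queries are scaled by the logarithm of the position taken in base `train_len`. The function names and the exact formulas are assumptions for illustration; these notes do not show how turbomind implements either technique.

```python
import math


def ntk_alpha(seq_len: int, train_len: int) -> float:
    """Hypothetical dynamic NTK factor: 1 within the training length,
    otherwise 2**ceil(log2(seq_len / train_len) + 1) - 1."""
    if seq_len <= train_len:
        return 1.0
    context_value = math.log2(seq_len / train_len) + 1
    return max(2 ** math.ceil(context_value) - 1, 1)


def scaled_rope_base(base: float, head_dim: int, alpha: float) -> float:
    """Enlarge the rotary embedding base by alpha ** (d / (d - 2))."""
    return base * alpha ** (head_dim / (head_dim - 2))


def logn_scale(position: int, train_len: int) -> float:
    """Hypothetical logN factor: log base train_len of the position,
    applied only beyond the training length."""
    if position <= train_len:
        return 1.0
    return math.log(position) / math.log(train_len)


if __name__ == "__main__":
    alpha = ntk_alpha(seq_len=4096, train_len=2048)
    print(alpha, scaled_rope_base(10000.0, 128, alpha))
    print(logn_scale(4096, 2048))
```

Both factors equal 1 for sequences within the training length, so under these assumptions short prompts are unaffected.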

What's Changed

🚀 Features
* Profiling tool for huggingface and deepspeed models by wangruohui in https://github.com/InternLM/lmdeploy/pull/161
* Support windows platform by irexyc in https://github.com/InternLM/lmdeploy/pull/209
* Qwen-7B, dynamic NTK scaling and logN scaling support in turbomind by lzhangzz in https://github.com/InternLM/lmdeploy/pull/230
* Add Restful API by AllentDan in https://github.com/InternLM/lmdeploy/pull/223
* Support context decoding with DP in pytorch by wangruohui in https://github.com/InternLM/lmdeploy/pull/193

💥 Improvements
* Support TP for W4A16 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/262
* Pass chat template args including meta_prompt to model (https://github.com/InternLM/lmdeploy/commit/7785142d7c13a21bc01c2e7c0bc10b82964371f1) by AllentDan in https://github.com/InternLM/lmdeploy/pull/225
* Enable the Gradio server to call inference services through the RESTful API by AllentDan in https://github.com/InternLM/lmdeploy/pull/287

🐞 Bug fixes
* Adjust dependency of gradio server by AllentDan in https://github.com/InternLM/lmdeploy/pull/236
* Implement `movmatrix` using warp shuffling for CUDA < 11.8 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/267
* Add 'accelerate' to requirement list by lvhan028 in https://github.com/InternLM/lmdeploy/pull/261
* Fix building with CUDA 11.3 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/280
* Pad tok_embedding and output weights to make their shape divisible by TP by lvhan028 in https://github.com/InternLM/lmdeploy/pull/285
* Fix llama2 70b & qwen quantization error by pppppM in https://github.com/InternLM/lmdeploy/pull/273
* Import turbomind in gradio server only when it is needed by AllentDan in https://github.com/InternLM/lmdeploy/pull/303
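The padding fix above can be pictured with a small sketch: when a weight matrix is split row-wise across TP ranks, its row count must be a multiple of the TP degree. The helper names below are hypothetical, and zero-padding along the vocabulary dimension is an assumption; the notes do not say how lmdeploy pads the weights.

```python
def padded_size(size: int, tp: int) -> int:
    """Smallest multiple of tp that is greater than or equal to size."""
    return (size + tp - 1) // tp * tp


def pad_rows(weight: list, tp: int) -> list:
    """Append zero rows so the number of rows is divisible by tp.

    `weight` is a plain list of equal-length rows (vocab_size x hidden).
    """
    rows = len(weight)
    cols = len(weight[0]) if weight else 0
    extra = padded_size(rows, tp) - rows
    return weight + [[0.0] * cols for _ in range(extra)]


if __name__ == "__main__":
    w = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 rows, not divisible by 2
    padded = pad_rows(w, tp=2)
    print(len(padded))  # each of the 2 ranks can now take 2 rows
```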

📚 Documentations
* Remove specified version in user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/241
* docs(quantization): update description by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/253 and https://github.com/InternLM/lmdeploy/pull/272
* Check-in FAQ by lvhan028 in https://github.com/InternLM/lmdeploy/pull/256
* Add readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/208

🌐 Other
* Update workflow for building docker image by RunningLeon in https://github.com/InternLM/lmdeploy/pull/282
* Change to github-hosted runner for building docker image by RunningLeon in https://github.com/InternLM/lmdeploy/pull/291

Known issues
* Inference with the 4-bit Qwen-7B model fails. #307 is addressing this issue.

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.5...v0.0.6

0.0.5

What's Changed

🐞 Bug fixes
* Fix wrong RPATH that used an absolute path instead of a relative one by irexyc in https://github.com/InternLM/lmdeploy/pull/239

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.4...v0.0.5

0.0.4

Highlight
* Support 4-bit LLM quantization and inference. Check [this](https://github.com/InternLM/lmdeploy/blob/main/docs/en/w4a16.md) guide for detailed information.
![image](https://github.com/InternLM/lmdeploy/assets/4560679/b38fc352-471e-4c06-9e31-5e251a6216f6)

What's Changed
🚀 Features
* Blazing fast W4A16 inference by lzhangzz in https://github.com/InternLM/lmdeploy/pull/202
* Support AWQ by pppppM in https://github.com/InternLM/lmdeploy/pull/108 and AllentDan in https://github.com/InternLM/lmdeploy/pull/228

💥 Improvements
* Add release note template by lvhan028 in https://github.com/InternLM/lmdeploy/pull/211
* feat(quantization): kv cache use asymmetric by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/218

🐞 Bug fixes
* Fix TIS client got-no-space-result side effect brought by PR 197 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/222

📚 Documentations
* Update W4A16 News by pppppM in https://github.com/InternLM/lmdeploy/pull/227
* Check-in user guide for w4a16 LLM deployment by lvhan028 in https://github.com/InternLM/lmdeploy/pull/224

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.3...v0.0.4

0.0.3

What's Changed

🚀 Features
* Support tensor parallelism without offline splitting model weights by grimoire in https://github.com/InternLM/lmdeploy/pull/158
* Add script to split HuggingFace model to the smallest sharded checkpoints by LZHgrla in https://github.com/InternLM/lmdeploy/pull/199
* Add non-stream inference api for chatbot by lvhan028 in https://github.com/InternLM/lmdeploy/pull/200
💥 Improvements
* Add issue/pr templates by lvhan028 in https://github.com/InternLM/lmdeploy/pull/184
* Remove unused code to reduce binary size by lzhangzz in https://github.com/InternLM/lmdeploy/pull/181
* Support serving with gradio without communicating to TIS by AllentDan in https://github.com/InternLM/lmdeploy/pull/162
* Improve postprocessing in TIS serving by applying incremental de-tokenizing by lvhan028 in https://github.com/InternLM/lmdeploy/pull/197
* Support multi-session chat by wangruohui in https://github.com/InternLM/lmdeploy/pull/178
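To illustrate what incremental de-tokenizing means in a streaming setting, here is a minimal sketch: after each new token, the full sequence is decoded again and only the newly completed text is emitted, holding back output while a multi-byte character is still incomplete. The byte-level toy "tokenizer" and the function name are assumptions for illustration; the actual TIS postprocessing is not shown in these notes.

```python
from typing import Callable, Iterable, Iterator, List


def incremental_decode(
    decode: Callable[[List[int]], str], token_ids: Iterable[int]
) -> Iterator[str]:
    """Yield only the text added by each new token.

    Output is held back while the decoded text ends with U+FFFD, which
    here signals an incomplete multi-byte character.
    """
    seen: List[int] = []
    emitted = ""
    for token_id in token_ids:
        seen.append(token_id)
        text = decode(seen)
        if text.endswith("\ufffd"):
            continue  # wait for the remaining bytes of the character
        yield text[len(emitted):]
        emitted = text


def toy_decode(ids: List[int]) -> str:
    """Toy stand-in for a tokenizer: every token is a single UTF-8 byte."""
    return bytes(ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    tokens = list("hé".encode("utf-8"))  # 'é' spans two byte-tokens
    print(list(incremental_decode(toy_decode, tokens)))
```

Re-decoding the whole sequence at every step is simple but quadratic in length; a production version would more likely decode only a trailing window of tokens.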

🐞 Bug fixes
* Fix build test error and move turbomind csrc test cases to `tests/csrc` by lvhan028 in https://github.com/InternLM/lmdeploy/pull/188
* Fix launching client error by moving lmdeploy/turbomind/utils.py to lmdeploy/utils.py by lvhan028 in https://github.com/InternLM/lmdeploy/pull/191

📚 Documentations
* Update README.md by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/187
* Translate turbomind.md by xin-li-67 in https://github.com/InternLM/lmdeploy/pull/173

New Contributors
* LZHgrla made their first contribution in https://github.com/InternLM/lmdeploy/pull/199

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.2...v0.0.3

0.0.2

What's Changed

🚀 Features

* Add lmdeploy python package built scripts and CI workflow by irexyc in <https://github.com/InternLM/lmdeploy/pull/163>, <https://github.com/InternLM/lmdeploy/pull/164>, <https://github.com/InternLM/lmdeploy/pull/170>
* Support Llama-2 with GQA by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/147> and grimoire in <https://github.com/InternLM/lmdeploy/pull/160>
* Add Llama-2 chat template by grimoire in <https://github.com/InternLM/lmdeploy/pull/140>
* Add decode-only forward pass by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/153>
* Support tensor parallelism in turbomind's python API by grimoire in <https://github.com/InternLM/lmdeploy/pull/82>
* Support w pack qkv by tpoisonooo in <https://github.com/InternLM/lmdeploy/pull/83>

💥 Improvements

* Refactor the chat template of supported models using factory pattern by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/144> and streamsunshine in <https://github.com/InternLM/lmdeploy/pull/174>
* Add profile throughput benchmark by grimoire in <https://github.com/InternLM/lmdeploy/pull/146>
* Remove slicing response and add resume api by streamsunshine in <https://github.com/InternLM/lmdeploy/pull/154>
* Support DeepSpeed on autoTP and kernel injection by KevinNuNu and wangruohui in <https://github.com/InternLM/lmdeploy/pull/138>
* Add github action for publishing docker image by RunningLeon in <https://github.com/InternLM/lmdeploy/pull/148>

🐞 Bug fixes

* Fix getting package root path error in python3.9 by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/157>
* Fix carriage return causing output to overwrite the same line by wangruohui in <https://github.com/InternLM/lmdeploy/pull/143>
* Fix the offset during streaming chat by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/142>
* Fix concatenate bug in benchmark serving script by rollroll90 in <https://github.com/InternLM/lmdeploy/pull/134>
* Fix attempted_relative_import by KevinNuNu in <https://github.com/InternLM/lmdeploy/pull/125>

📚 Documentations

* Translate `en/quantization.md` into Chinese by xin-li-67 in <https://github.com/InternLM/lmdeploy/pull/166>
* Check-in benchmark on real conversation data by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/156>
* Fix typos and missing dependent packages in README and requirements.txt by vansin in <https://github.com/InternLM/lmdeploy/pull/123>, APX103 in <https://github.com/InternLM/lmdeploy/pull/109>, AllentDan in <https://github.com/InternLM/lmdeploy/pull/119> and del-zhenwu in <https://github.com/InternLM/lmdeploy/pull/124>
* Add turbomind's architecture documentation by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/101>

New Contributors
streamsunshine, del-zhenwu, APX103, xin-li-67, KevinNuNu, rollroll90
