What's Changed
🚀 Features
* Add lmdeploy Python package build scripts and CI workflow by irexyc in <https://github.com/InternLM/lmdeploy/pull/163>, <https://github.com/InternLM/lmdeploy/pull/164>, <https://github.com/InternLM/lmdeploy/pull/170>
* Support Llama-2 with GQA by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/147> and grimoire in <https://github.com/InternLM/lmdeploy/pull/160>
* Add Llama-2 chat template by grimoire in <https://github.com/InternLM/lmdeploy/pull/140>
* Add decode-only forward pass by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/153>
* Support tensor parallelism in turbomind's Python API by grimoire in <https://github.com/InternLM/lmdeploy/pull/82>
* Support `W_pack` (packed QKV) weights by tpoisonooo in <https://github.com/InternLM/lmdeploy/pull/83>
💥 Improvements
* Refactor the chat templates of supported models using the factory pattern by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/144> and streamsunshine in <https://github.com/InternLM/lmdeploy/pull/174>
* Add a throughput profiling benchmark by grimoire in <https://github.com/InternLM/lmdeploy/pull/146>
* Remove response slicing and add a resume API by streamsunshine in <https://github.com/InternLM/lmdeploy/pull/154>
* Support DeepSpeed with autoTP and kernel injection by KevinNuNu and wangruohui in <https://github.com/InternLM/lmdeploy/pull/138>
* Add a GitHub Action for publishing the Docker image by RunningLeon in <https://github.com/InternLM/lmdeploy/pull/148>
🐞 Bug fixes
* Fix an error when getting the package root path in Python 3.9 by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/157>
* Fix carriage return causing output to be overwritten on the same line by wangruohui in <https://github.com/InternLM/lmdeploy/pull/143>
* Fix the offset during streaming chat by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/142>
* Fix a concatenation bug in the benchmark serving script by rollroll90 in <https://github.com/InternLM/lmdeploy/pull/134>
* Fix attempted relative import error by KevinNuNu in <https://github.com/InternLM/lmdeploy/pull/125>
📚 Documentation
* Translate `en/quantization.md` into Chinese by xin-li-67 in <https://github.com/InternLM/lmdeploy/pull/166>
* Check in benchmark results on real conversation data by lvhan028 in <https://github.com/InternLM/lmdeploy/pull/156>
* Fix typos and missing dependent packages in README and requirements.txt by vansin in <https://github.com/InternLM/lmdeploy/pull/123>, APX103 in <https://github.com/InternLM/lmdeploy/pull/109>, AllentDan in <https://github.com/InternLM/lmdeploy/pull/119> and del-zhenwu in <https://github.com/InternLM/lmdeploy/pull/124>
* Add turbomind's architecture documentation by lzhangzz in <https://github.com/InternLM/lmdeploy/pull/101>
New Contributors
streamsunshine, del-zhenwu, APX103, xin-li-67, KevinNuNu, rollroll90