<!-- Release notes generated using configuration in .github/release.yml at main -->
## What's Changed
### 🚀 Features
* support release pipeline by irexyc in https://github.com/InternLM/lmdeploy/pull/3069
* [feature] add dlinfer w8a8 support. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2988
* [maca] support deepseekv2 for maca backend. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2918
* [Feature] support deepseek-vl2 for pytorch engine by CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3149
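To try the newly supported deepseek-vl2 on the PyTorch engine, a minimal sketch along the lines of the standard lmdeploy pipeline API (the model path, image URL, and prompt below are illustrative placeholders, not taken from these notes):

```python
# Minimal sketch: deepseek-vl2 on the PyTorch engine via the lmdeploy pipeline API.
# Model path, session length, prompt and image URL are illustrative assumptions.
from lmdeploy import pipeline, PytorchEngineConfig
from lmdeploy.vl import load_image

# Build a pipeline backed by the PyTorch engine
pipe = pipeline('deepseek-ai/deepseek-vl2',
                backend_config=PytorchEngineConfig(session_len=8192))

# Load an example image and run a single vision-language query
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response.text)
```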
### 💥 Improvements
* use weights iterator while loading by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2886
* Add deepseek-r1 chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/3072
* Update tokenizer by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3061
* Set max concurrent requests by AllentDan in https://github.com/InternLM/lmdeploy/pull/2961
* remove logitswarper by grimoire in https://github.com/InternLM/lmdeploy/pull/3109
* Update benchmark script and user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3110
* support eos_token list in turbomind by irexyc in https://github.com/InternLM/lmdeploy/pull/3044
* Use aiohttp inside proxy server && add --disable-cache-status argument by AllentDan in https://github.com/InternLM/lmdeploy/pull/3020
* Update runtime package dependencies by zgjja in https://github.com/InternLM/lmdeploy/pull/3142
* Make turbomind support embedding inputs on GPU by chengyuma in https://github.com/InternLM/lmdeploy/pull/3177
### 🐞 Bug fixes
* [dlinfer] fix ascend qwen2_vl graph_mode by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/3045
* fix error in interactive api by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3074
* fix sliding window mgr by grimoire in https://github.com/InternLM/lmdeploy/pull/3068
* More arguments in api_client, update docstrings by AllentDan in https://github.com/InternLM/lmdeploy/pull/3077
* Add system role to deepseek chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/3031
* Fix xcomposer2d5 by irexyc in https://github.com/InternLM/lmdeploy/pull/3087
* fix user guide about cogvlm deployment by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3088
* fix positional argument by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3086
* Fix UT of deepseek chat template by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3125
* Fix internvl2.5 error after eviction by grimoire in https://github.com/InternLM/lmdeploy/pull/3122
* Fix cogvlm and phi3vision by RunningLeon in https://github.com/InternLM/lmdeploy/pull/3137
* [fix] fix vl gradio, use pipeline api and remove interactive chat by irexyc in https://github.com/InternLM/lmdeploy/pull/3136
* fix the issue that stop_token may contain fewer tokens than defined in model.py by irexyc in https://github.com/InternLM/lmdeploy/pull/3148
* fix typing by lz1998 in https://github.com/InternLM/lmdeploy/pull/3153
* fix min length penalty by irexyc in https://github.com/InternLM/lmdeploy/pull/3150
* fix default temperature value by irexyc in https://github.com/InternLM/lmdeploy/pull/3166
* Use pad_token_id as image_token_id for vl models by RunningLeon in https://github.com/InternLM/lmdeploy/pull/3158
* Fix tool call prompt for InternLM and Qwen by AllentDan in https://github.com/InternLM/lmdeploy/pull/3156
* Update qwen2.py by GxjGit in https://github.com/InternLM/lmdeploy/pull/3174
* fix temperature=0 by grimoire in https://github.com/InternLM/lmdeploy/pull/3176
* fix blocked fp8 moe by grimoire in https://github.com/InternLM/lmdeploy/pull/3181
* fix the deepseekv2 'has no attribute use_mla' error by CUHKSZzxy in https://github.com/InternLM/lmdeploy/pull/3188
* fix unstoppable chat by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3189
### 🌐 Other
* [ci] add internlm3 to testcases by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3038
* add internlm3 to supported models by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3041
* update pre-commit config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2683
* [maca] add cudagraph support on maca backend. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2834
* bump version to v0.7.0.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3076
* bump version to v0.7.0.post2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3094
* [Fix] fix the URL detection issue on Windows by Lychee-acaca in https://github.com/InternLM/lmdeploy/pull/3103
* bump version to v0.7.0.post3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3115
* [ci] fix some failures in daily testcases by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/3134
* Bump version to v0.7.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/3178
## New Contributors
* Lychee-acaca made their first contribution in https://github.com/InternLM/lmdeploy/pull/3103
* lz1998 made their first contribution in https://github.com/InternLM/lmdeploy/pull/3153
* GxjGit made their first contribution in https://github.com/InternLM/lmdeploy/pull/3174
* chengyuma made their first contribution in https://github.com/InternLM/lmdeploy/pull/3177
* CUHKSZzxy made their first contribution in https://github.com/InternLM/lmdeploy/pull/3149
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.7.0...v0.7.1