<!-- Release notes generated using configuration in .github/release.yml at main -->
## What's Changed
### 💥 Improvements
* Unify prefill & decode passes by lzhangzz in https://github.com/InternLM/lmdeploy/pull/775
* Add CUDA 12.1 build check CI by irexyc in https://github.com/InternLM/lmdeploy/pull/782
* Auto-upload the CUDA 12.1 Python package to the release when a new tag is created by irexyc in https://github.com/InternLM/lmdeploy/pull/784
* Report the inference benchmark of models of different sizes by lvhan028 in https://github.com/InternLM/lmdeploy/pull/794
* Add chat template for Yi by AllentDan in https://github.com/InternLM/lmdeploy/pull/779
### 🐞 Bug fixes
* Fix early-exit condition in attention kernel by lzhangzz in https://github.com/InternLM/lmdeploy/pull/788
* Fix missing arguments when benchmarking static inference performance by lvhan028 in https://github.com/InternLM/lmdeploy/pull/787
* Fix extra colon in the InternLMChat7B template by C1rN09 in https://github.com/InternLM/lmdeploy/pull/796
* Fix local kv head num by lvhan028 in https://github.com/InternLM/lmdeploy/pull/806
### 📚 Documentation
* Update benchmark user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/763
### 🌐 Other
* Bump version to v0.1.0a2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/807
## New Contributors
* C1rN09 made their first contribution in https://github.com/InternLM/lmdeploy/pull/796
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.1.0a1...v0.1.0a2