<!-- Release notes generated using configuration in .github/release.yml at main -->
What's Changed
Features
* support yarn in turbomind backend by irexyc in https://github.com/InternLM/lmdeploy/pull/2519
* add linear op on dlinfer platform by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2627
* support turbomind head_dim 64 by irexyc in https://github.com/InternLM/lmdeploy/pull/2715
* [Feature]: support LlavaForConditionalGeneration with turbomind inference by deepindeed2022 in https://github.com/InternLM/lmdeploy/pull/2710
* Support Mono-InternVL with PyTorch backend by wzk1015 in https://github.com/InternLM/lmdeploy/pull/2727
* Support Qwen2-MoE models by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2723
* Support mixtral moe AWQ quantization. by AllentDan in https://github.com/InternLM/lmdeploy/pull/2725
* Support chemvlm by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2738
* Support molmo in turbomind by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2716
Improvements
* Call cuda empty_cache to prevent OOM when quantizing model by AllentDan in https://github.com/InternLM/lmdeploy/pull/2671
* feat: support dynamic/llama3 rotary embedding in ascend graph mode by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2670
* Add ensure_ascii = False for json.dumps by AllentDan in https://github.com/InternLM/lmdeploy/pull/2707
* Flatten cache and add flashattention by grimoire in https://github.com/InternLM/lmdeploy/pull/2676
* Support ep, column major moe kernel. by grimoire in https://github.com/InternLM/lmdeploy/pull/2690
* Remove one of the duplicate bos tokens by AllentDan in https://github.com/InternLM/lmdeploy/pull/2708
* Check server input by irexyc in https://github.com/InternLM/lmdeploy/pull/2719
* optimize dlinfer moe by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2741
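
The `ensure_ascii = False` entry above concerns how Python's standard `json.dumps` serializes non-ASCII text. A minimal sketch of the difference follows; the `payload` dict is a made-up example, not lmdeploy's actual response format.

```python
import json

# Hypothetical payload, used only to illustrate the flag.
payload = {"text": "你好"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(payload)

# With ensure_ascii=False the characters are emitted as-is,
# which keeps logs and streamed responses human-readable.
readable = json.dumps(payload, ensure_ascii=False)

print(escaped)
print(readable)
```

Both strings decode to the same object, so the flag changes only readability and byte size, not the data.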
Bug fixes
* Support min_tokens, min_p parameters for api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/2681
* fix index error when computing ppl on long-text prompt by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2697
* Better tp exit log. by grimoire in https://github.com/InternLM/lmdeploy/pull/2677
* Fix missing read of moe_ffn weights from converted tm model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2698
* Fix turbomind TP by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2706
* fix decoding kernel for deepseekv2 by grimoire in https://github.com/InternLM/lmdeploy/pull/2688
* fix tp exit code for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2718
* fix assert pad >= 0 failed when inter_size is not a multiple of group… by Vinkle-hzt in https://github.com/InternLM/lmdeploy/pull/2740
* fix issue that mono-internvl failed to fall back to pytorch engine by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2744
* Remove use_fast=True when loading tokenizer for lite auto_awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/2758
* Fix wrong head_dim for mistral-nemo by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2761
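
For the `min_tokens` and `min_p` entry above, the sketch below shows what these two sampling controls are commonly understood to do. It is an illustration under stated assumptions, not lmdeploy's implementation: the function names `min_p_filter` and `block_eos_until_min_tokens` are hypothetical, and it assumes `min_p` is scaled by the probability of the most likely token and that `min_tokens` suppresses the end-of-sequence token until enough tokens have been generated.

```python
import math


def min_p_filter(logits, min_p):
    """Mask tokens whose probability is below min_p * (highest probability).

    Hypothetical helper for illustration only.
    """
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    threshold = min_p * max(probs)
    return [x if p >= threshold else float("-inf")
            for x, p in zip(logits, probs)]


def block_eos_until_min_tokens(logits, eos_id, num_generated, min_tokens):
    """Forbid the end-of-sequence token until min_tokens have been produced.

    Hypothetical helper for illustration only.
    """
    if num_generated < min_tokens:
        logits = list(logits)  # copy so the caller's list is untouched
        logits[eos_id] = float("-inf")
    return logits


# Three tokens with probabilities of roughly 0.6, 0.3 and 0.1.
logits = [math.log(0.6), math.log(0.3), math.log(0.1)]
filtered = min_p_filter(logits, min_p=0.3)  # threshold is about 0.18
guarded = block_eos_until_min_tokens(logits, eos_id=2,
                                     num_generated=0, min_tokens=5)
```

Here the third token falls below the threshold and is masked by `min_p_filter`, while `block_eos_until_min_tokens` masks token 2 only because no tokens have been generated yet.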
Documentation
* Update ascend readme by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2756
* fix ascend get_started.md link by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2696
* Fix llama3.2 VL vision in "Supported Models" documents by blankanswer in https://github.com/InternLM/lmdeploy/pull/2703
Other
* [ci] support v100 dailytest by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2665
* [ci] add more testcase into evaluation and daily test by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2721
* feat: support multi cards in ascend graph mode by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2755
* bump version to v0.6.3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2754
New Contributors
* blankanswer made their first contribution in https://github.com/InternLM/lmdeploy/pull/2703
* tangzhiyi11 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2670
* wzk1015 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2727
* Vinkle-hzt made their first contribution in https://github.com/InternLM/lmdeploy/pull/2740
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.6.2...v0.6.3