<!-- Release notes generated using configuration in .github/release.yml at main -->
## What's Changed
### 🚀 Features
* Support internlm2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/963
* [Feature] Add params config for api server web_ui by amulil in https://github.com/InternLM/lmdeploy/pull/735
* [Feature] Merge `lmdeploy lite calibrate` and `lmdeploy lite auto_awq` by pppppM in https://github.com/InternLM/lmdeploy/pull/849
* Compute cross entropy loss given a list of input tokens by lvhan028 in https://github.com/InternLM/lmdeploy/pull/830
* Support QoS in api_server by sallyjunjun in https://github.com/InternLM/lmdeploy/pull/877
* Refactor torch inference engine by lvhan028 in https://github.com/InternLM/lmdeploy/pull/871
* add image chat demo by irexyc in https://github.com/InternLM/lmdeploy/pull/874
* check-in generation config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/902
* check-in ModelConfig by AllentDan in https://github.com/InternLM/lmdeploy/pull/907
* pytorch engine config by grimoire in https://github.com/InternLM/lmdeploy/pull/908
* Check-in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/909
* S-LoRA support by grimoire in https://github.com/InternLM/lmdeploy/pull/894
* add init in adapters by grimoire in https://github.com/InternLM/lmdeploy/pull/923
* Refactor LLM inference pipeline API by AllentDan in https://github.com/InternLM/lmdeploy/pull/916
* Refactor gradio and api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/918
* Add request distributor server by AllentDan in https://github.com/InternLM/lmdeploy/pull/903
* Upgrade lmdeploy cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/922
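The config check-ins and pipeline refactor above replace loose keyword arguments with explicit config dataclasses. A minimal sketch of the refactored API, assuming lmdeploy v0.2.0 is installed on a machine with a supported GPU; the model path is an illustrative example, not a recommendation:

```python
# Sketch of the v0.2.0 pipeline API (PRs #902/#907/#908/#909/#916),
# assuming lmdeploy v0.2.0 is installed.
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

# `tp` now lives on the engine config rather than pipeline() itself (PR #947).
backend_config = TurbomindEngineConfig(tp=1)
gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)

pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
responses = pipe(['Hi, please introduce yourself'], gen_config=gen_config)
print(responses[0].text)  # responses are dataclass objects (PR #952)
```

The same `backend_config` slot accepts a `PytorchEngineConfig` (PR #908) to select the refactored torch engine instead of turbomind.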
### 💥 Improvements
* add top_k value for /v1/completions and update the documents by AllentDan in https://github.com/InternLM/lmdeploy/pull/870
* export "num_tokens_per_iter", "max_prefill_iters", etc. when converting a model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/845
* Move `api_server` dependencies from serve.txt to runtime.txt by lvhan028 in https://github.com/InternLM/lmdeploy/pull/879
* Refactor benchmark bash script by lvhan028 in https://github.com/InternLM/lmdeploy/pull/884
* Add test case for function regression by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/844
* Update test triton CI by RunningLeon in https://github.com/InternLM/lmdeploy/pull/893
* Update dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/891
* Perform fuzzy matching on chat template according to model path by AllentDan in https://github.com/InternLM/lmdeploy/pull/839
* support accessing lmdeploy version by lmdeploy.version_info by lvhan028 in https://github.com/InternLM/lmdeploy/pull/910
* Remove `flash-attn` dependency of lmdeploy lite module by lvhan028 in https://github.com/InternLM/lmdeploy/pull/917
* Improve setup by removing pycuda dependency and adding cuda runtime and cublas to RPATH by irexyc in https://github.com/InternLM/lmdeploy/pull/912
* remove unused settings in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/921
* Cleanup fixed attributes in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/928
* fix get_gpu_mem by grimoire in https://github.com/InternLM/lmdeploy/pull/934
* remove instance_num argument by AllentDan in https://github.com/InternLM/lmdeploy/pull/931
* Fix matching results of several chat templates like llama2, solar, yi and so on by AllentDan in https://github.com/InternLM/lmdeploy/pull/925
* add pytorch random sampling by grimoire in https://github.com/InternLM/lmdeploy/pull/930
* suppress turbomind chat warning by irexyc in https://github.com/InternLM/lmdeploy/pull/937
* modify type hint of api to avoid import _turbomind by AllentDan in https://github.com/InternLM/lmdeploy/pull/936
* accelerate pytorch benchmark by grimoire in https://github.com/InternLM/lmdeploy/pull/946
* Remove `tp` from pipeline argument list by lvhan028 in https://github.com/InternLM/lmdeploy/pull/947
* set gradio default value the same as chat.py by AllentDan in https://github.com/InternLM/lmdeploy/pull/949
* print help for cli in case of failure by RunningLeon in https://github.com/InternLM/lmdeploy/pull/955
* return dataclass for pipeline by AllentDan in https://github.com/InternLM/lmdeploy/pull/952
* set random seed when it is None by AllentDan in https://github.com/InternLM/lmdeploy/pull/958
* avoid run get_logger when import lmdeploy by RunningLeon in https://github.com/InternLM/lmdeploy/pull/956
* support mlp s-lora by grimoire in https://github.com/InternLM/lmdeploy/pull/957
* skip resume logic for pytorch backend by AllentDan in https://github.com/InternLM/lmdeploy/pull/968
* Add ci for ut by RunningLeon in https://github.com/InternLM/lmdeploy/pull/966
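Several of the improvements above touch the public Python surface; for instance, the version accessor from PR #910 can be used to feature-gate on the new API. A hedged sketch, assuming lmdeploy >= 0.2.0 is installed and that `version_info` is the parsed tuple counterpart of the `__version__` string:

```python
# Feature-gate on the installed lmdeploy version (accessor added in PR #910).
import lmdeploy

print(lmdeploy.__version__)   # version string, e.g. '0.2.0'
if lmdeploy.version_info >= (0, 2, 0):
    # Safe to use the v0.2.0 pipeline/config API described in these notes.
    from lmdeploy import pipeline
```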
### 🐞 Bug fixes
* add tritonclient req by RunningLeon in https://github.com/InternLM/lmdeploy/pull/872
* Fix uninitialized parameter by lvhan028 in https://github.com/InternLM/lmdeploy/pull/875
* Fix overflow by irexyc in https://github.com/InternLM/lmdeploy/pull/897
* Fix data offset by AllentDan in https://github.com/InternLM/lmdeploy/pull/900
* Fix context decoding stuck issue when tp > 1 by irexyc in https://github.com/InternLM/lmdeploy/pull/904
* [Fix] set scaling_factor 1 forcefully when sequence length is less than max_pos_emb by lvhan028 in https://github.com/InternLM/lmdeploy/pull/911
* fix pytorch llama2 with new transformers by grimoire in https://github.com/InternLM/lmdeploy/pull/914
* fix local variable 'output_ids' referenced before assignment by irexyc in https://github.com/InternLM/lmdeploy/pull/919
* fix pipeline stop_words type error by AllentDan in https://github.com/InternLM/lmdeploy/pull/929
* pass stop words to openai api by AllentDan in https://github.com/InternLM/lmdeploy/pull/887
* fix profile generation multiprocessing error by AllentDan in https://github.com/InternLM/lmdeploy/pull/933
* Add missing `__init__.py` in modeling folder by lvhan028 in https://github.com/InternLM/lmdeploy/pull/951
* fix cli with special arg names by RunningLeon in https://github.com/InternLM/lmdeploy/pull/959
* fix logger in tokenizer by RunningLeon in https://github.com/InternLM/lmdeploy/pull/960
* stabilize api_server benchmark results with a non-zero await by AllentDan in https://github.com/InternLM/lmdeploy/pull/885
* fix pytorch backend not stopping properly by AllentDan in https://github.com/InternLM/lmdeploy/pull/962
* [Fix] Fix `calibrate` bug when `transformers>4.36` by pppppM in https://github.com/InternLM/lmdeploy/pull/967
### 📚 Documentation
* Improve user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/899
* Add user guide about pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/915
* Update supported models and add quick start section in README by lvhan028 in https://github.com/InternLM/lmdeploy/pull/926
* Fix scripts in benchmark doc by panli889 in https://github.com/InternLM/lmdeploy/pull/941
* Update get_started and w4a16 tutorials by lvhan028 in https://github.com/InternLM/lmdeploy/pull/945
* Add more docstring to api_server and proxy_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/965
### 🌐 Other
* bump version to v0.2.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/969
## New Contributors
* amulil made their first contribution in https://github.com/InternLM/lmdeploy/pull/735
* zhulinJulia24 made their first contribution in https://github.com/InternLM/lmdeploy/pull/844
* sallyjunjun made their first contribution in https://github.com/InternLM/lmdeploy/pull/877
* panli889 made their first contribution in https://github.com/InternLM/lmdeploy/pull/941
**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.1.0...v0.2.0