lmdeploy

Latest version: v0.6.3


0.2.3


What's Changed
🚀 Features
* Support loading model from modelscope by irexyc in https://github.com/InternLM/lmdeploy/pull/1069
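
For the ModelScope support above, a minimal sketch of the equivalent manual route: download the weights with modelscope's own `snapshot_download`, then point the pipeline at the local directory (the repo id below is illustrative; PR #1069 wires the resolution into lmdeploy itself):

```python
from modelscope import snapshot_download  # pip install modelscope
from lmdeploy import pipeline

# download from ModelScope, then load the local checkpoint with lmdeploy
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2-chat-7b')
pipe = pipeline(model_dir)
print(pipe('hi, please intro yourself'))
```
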
💥 Improvements
* Remove caching tokenizer.json by grimoire in https://github.com/InternLM/lmdeploy/pull/1074
* Refactor `get_logger` to remove the dependency of MMLogger from mmengine by yinfan98 in https://github.com/InternLM/lmdeploy/pull/1064
* Use TM_LOG_LEVEL environment variable first by zhyncs in https://github.com/InternLM/lmdeploy/pull/1071
* Speed up the initialization of w8a8 model for torch engine by yinfan98 in https://github.com/InternLM/lmdeploy/pull/1088
* Make logging.logger's behavior consistent with MMLogger by irexyc in https://github.com/InternLM/lmdeploy/pull/1092
* Remove owned_session for torch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/1097
* Unify engine initialization in pipeline by irexyc in https://github.com/InternLM/lmdeploy/pull/1085
* Add skip_special_tokens in GenerationConfig by grimoire in https://github.com/InternLM/lmdeploy/pull/1091 (see the sketch after this list)
* Use default stop words for turbomind backend in pipeline by irexyc in https://github.com/InternLM/lmdeploy/pull/1119
* Add input_token_len to Response and update Response document by AllentDan in https://github.com/InternLM/lmdeploy/pull/1115
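
For the `skip_special_tokens` addition above, a minimal sketch of passing the knob through `GenerationConfig` (model id and values are illustrative):

```python
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('internlm/internlm2-chat-7b')
# skip_special_tokens controls whether special tokens are stripped from the output
gen_config = GenerationConfig(max_new_tokens=256, skip_special_tokens=True)
print(pipe('hi, please intro yourself', gen_config=gen_config))
```
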
๐Ÿž Bug fixes
* Fix fast tokenizer swallows prefix space when there are too many white spaces by AllentDan in https://github.com/InternLM/lmdeploy/pull/992
* Fix turbomind CUDA runtime error invalid argument by zhyncs in https://github.com/InternLM/lmdeploy/pull/1100
* Add safety check for incremental decode by AllentDan in https://github.com/InternLM/lmdeploy/pull/1094
* Fix device type of get_ppl for turbomind by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1093
* Fix pipeline init turbomind from workspace by irexyc in https://github.com/InternLM/lmdeploy/pull/1126
* Add dependency version check and fix `ignore_eos` logic by grimoire in https://github.com/InternLM/lmdeploy/pull/1099
* Change configuration_internlm.py to configuration_internlm2.py by HIT-cwh in https://github.com/InternLM/lmdeploy/pull/1129

📚 Documentations
* Update contribution guide by zhyncs in https://github.com/InternLM/lmdeploy/pull/1120
๐ŸŒ Other
* Bump version to v0.2.3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1123

New Contributors
* yinfan98 made their first contribution in https://github.com/InternLM/lmdeploy/pull/1064

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.2...v0.2.3

0.2.2


Highlights
* The allocation strategy for the k/v cache has changed: `cache_max_entry_count` now means the proportion of **free** GPU memory rather than **total** GPU memory, and its default value is 0.8. This helps prevent OOM issues (see the sketch after this list).
* The pipeline API supports streaming inference. You may give it a try!

```python
from lmdeploy import pipeline

pipe = pipeline('internlm/internlm2-chat-7b')
for item in pipe.stream_infer('hi, please intro yourself'):
    print(item)
```

* Add api key and ssl to `api_server`
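
For the k/v cache change above, a minimal sketch of tuning the ratio through `TurbomindEngineConfig` (model id and the 0.5 value are illustrative):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# cache_max_entry_count is now a fraction of *free* GPU memory (default 0.8);
# lower it if you still run into OOM
backend_config = TurbomindEngineConfig(cache_max_entry_count=0.5)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
print(pipe('hi, please intro yourself'))
```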

What's Changed
🚀 Features
* add alignment tools by grimoire in https://github.com/InternLM/lmdeploy/pull/1004
* support min_length for turbomind backend by irexyc in https://github.com/InternLM/lmdeploy/pull/961
* Add stream mode function to pipeline by AllentDan in https://github.com/InternLM/lmdeploy/pull/974
* [Feature] Add api key and ssl to http server by AllentDan in https://github.com/InternLM/lmdeploy/pull/1048
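
For the api key feature above, a minimal sketch of calling the secured server through its OpenAI-style endpoint (host, port, model name, and key are illustrative, and assume the server was started with an API key configured):

```python
import requests

resp = requests.post(
    'http://localhost:23333/v1/chat/completions',  # 23333 is the usual default port
    headers={'Authorization': 'Bearer YOUR_API_KEY'},
    json={
        'model': 'internlm2-chat-7b',
        'messages': [{'role': 'user', 'content': 'hi, please intro yourself'}],
    },
)
print(resp.json())
```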

💥 Improvements
* hide stop-words in output text by grimoire in https://github.com/InternLM/lmdeploy/pull/991
* optimize sleep by grimoire in https://github.com/InternLM/lmdeploy/pull/1034
* set example values to /v1/chat/completions in swagger UI by AllentDan in https://github.com/InternLM/lmdeploy/pull/984
* Update adapters cli argument by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1039
* Fix turbomind end session bug. Add huggingface demo document by AllentDan in https://github.com/InternLM/lmdeploy/pull/1017
* Support linking the custom built mpi by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1025
* sync mem size for tp by lzhangzz in https://github.com/InternLM/lmdeploy/pull/1053
* Remove model name when loading hf model by irexyc in https://github.com/InternLM/lmdeploy/pull/1022
* support internlm2-1_8b by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1073
* Update chat template for internlm2 base model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1079
๐Ÿž Bug fixes
* fix TorchEngine stuck when benchmarking with `tp>1` by grimoire in https://github.com/InternLM/lmdeploy/pull/942
* fix module mapping error of baichuan model by grimoire in https://github.com/InternLM/lmdeploy/pull/977
* fix import error for triton server by RunningLeon in https://github.com/InternLM/lmdeploy/pull/985
* fix qwen-vl example by irexyc in https://github.com/InternLM/lmdeploy/pull/996
* fix missing init file in modules by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1013
* fix tp mem usage by grimoire in https://github.com/InternLM/lmdeploy/pull/987
* update indexes_containing_token function by AllentDan in https://github.com/InternLM/lmdeploy/pull/1050
* fix flash kernel on sm 70 by grimoire in https://github.com/InternLM/lmdeploy/pull/1027
* Fix baichuan2 lora by grimoire in https://github.com/InternLM/lmdeploy/pull/1042
* Fix modelconfig in pytorch engine, support YI. by grimoire in https://github.com/InternLM/lmdeploy/pull/1052
* Fix repetition penalty for long context by irexyc in https://github.com/InternLM/lmdeploy/pull/1037
* [Fix] Support QLinear in rowwise_parallelize_linear_fn and colwise_parallelize_linear_fn by HIT-cwh in https://github.com/InternLM/lmdeploy/pull/1072
📚 Documentations
* add docs for evaluation with opencompass by RunningLeon in https://github.com/InternLM/lmdeploy/pull/995
* update docs for kvint8 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1026
* [doc] Introduce project OpenAOE by JiaYingLii in https://github.com/InternLM/lmdeploy/pull/1049
* update pipeline guide and FAQ about OOM by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1051
* docs update cache_max_entry_count for turbomind config by zhyncs in https://github.com/InternLM/lmdeploy/pull/1067
๐ŸŒ Other
* update ut ci to new server node by RunningLeon in https://github.com/InternLM/lmdeploy/pull/1024
* Ete testcase update by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/1023
* fix OOM in BlockManager by zhyncs in https://github.com/InternLM/lmdeploy/pull/973
* fix use engine_config.tp when tp is None by zhyncs in https://github.com/InternLM/lmdeploy/pull/1057
* Fix serve api by moving logger inside process for turbomind by AllentDan in https://github.com/InternLM/lmdeploy/pull/1061
* bump version to v0.2.2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1076

New Contributors
* zhyncs made their first contribution in https://github.com/InternLM/lmdeploy/pull/973
* JiaYingLii made their first contribution in https://github.com/InternLM/lmdeploy/pull/1049

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.1...v0.2.2

0.2.1


What's Changed
💥 Improvements
* [Fix] interlm2 chat format by Harold-lkk in https://github.com/InternLM/lmdeploy/pull/1002
๐Ÿž Bug fixes
* fix baichuan2 conversion by AllentDan in https://github.com/InternLM/lmdeploy/pull/972
* [Fix] interlm messages2prompt by Harold-lkk in https://github.com/InternLM/lmdeploy/pull/1003
📚 Documentations
* add guide about installation on cuda 12+ platform by lvhan028 in https://github.com/InternLM/lmdeploy/pull/988
๐ŸŒ Other
* bump version to v0.2.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1005


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.2.0...v0.2.1

0.2.0


What's Changed
🚀 Features
* Support internlm2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/963
* [Feature] Add params config for api server web_ui by amulil in https://github.com/InternLM/lmdeploy/pull/735
* [Feature]Merge `lmdeploy lite calibrate` and `lmdeploy lite auto_awq` by pppppM in https://github.com/InternLM/lmdeploy/pull/849
* Compute cross entropy loss given a list of input tokens by lvhan028 in https://github.com/InternLM/lmdeploy/pull/830
* Support QoS in api_server by sallyjunjun in https://github.com/InternLM/lmdeploy/pull/877
* Refactor torch inference engine by lvhan028 in https://github.com/InternLM/lmdeploy/pull/871
* add image chat demo by irexyc in https://github.com/InternLM/lmdeploy/pull/874
* check-in generation config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/902
* check-in ModelConfig by AllentDan in https://github.com/InternLM/lmdeploy/pull/907
* pytorch engine config by grimoire in https://github.com/InternLM/lmdeploy/pull/908
* Check-in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/909
* S-LoRA support by grimoire in https://github.com/InternLM/lmdeploy/pull/894
* add init in adapters by grimoire in https://github.com/InternLM/lmdeploy/pull/923
* Refactor LLM inference pipeline API by AllentDan in https://github.com/InternLM/lmdeploy/pull/916 (see the sketch after this list)
* Refactor gradio and api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/918
* Add request distributor server by AllentDan in https://github.com/InternLM/lmdeploy/pull/903
* Upgrade lmdeploy cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/922
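
A minimal sketch of the refactored pipeline API together with the new engine-config objects from PRs #908/#909 (model id and values are illustrative):

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# backend_config selects and configures the inference engine
backend_config = TurbomindEngineConfig(tp=1)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
responses = pipe(['hi, please intro yourself'])
print(responses[0].text)  # pipeline returns response dataclasses (PR #952)
```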

💥 Improvements
* add top_k value for /v1/completions and update the documents by AllentDan in https://github.com/InternLM/lmdeploy/pull/870
* export "num_tokens_per_iter", "max_prefill_iters" and etc when converting a model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/845
* Move `api_server` dependencies from serve.txt to runtime.txt by lvhan028 in https://github.com/InternLM/lmdeploy/pull/879
* Refactor benchmark bash script by lvhan028 in https://github.com/InternLM/lmdeploy/pull/884
* Add test case for function regression by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/844
* Update test triton CI by RunningLeon in https://github.com/InternLM/lmdeploy/pull/893
* Update dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/891
* Perform fuzzy matching on chat template according to model path by AllentDan in https://github.com/InternLM/lmdeploy/pull/839
* support accessing lmdeploy version by lmdeploy.version_info by lvhan028 in https://github.com/InternLM/lmdeploy/pull/910 (see the sketch after this list)
* Remove `flash-attn` dependency of lmdeploy lite module by lvhan028 in https://github.com/InternLM/lmdeploy/pull/917
* Improve setup by removing pycuda dependency and adding cuda runtime and cublas to RPATH by irexyc in https://github.com/InternLM/lmdeploy/pull/912
* remove unused settings in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/921
* Cleanup fixed attributes in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/928
* fix get_gpu_mem by grimoire in https://github.com/InternLM/lmdeploy/pull/934
* remove instance_num argument by AllentDan in https://github.com/InternLM/lmdeploy/pull/931
* Fix matching results of several chat templates like llama2, solar, yi and so on by AllentDan in https://github.com/InternLM/lmdeploy/pull/925
* add pytorch random sampling by grimoire in https://github.com/InternLM/lmdeploy/pull/930
* suppress turbomind chat warning by irexyc in https://github.com/InternLM/lmdeploy/pull/937
* modify type hint of api to avoid import _turbomind by AllentDan in https://github.com/InternLM/lmdeploy/pull/936
* accelerate pytorch benchmark by grimoire in https://github.com/InternLM/lmdeploy/pull/946
* Remove `tp` from pipline argument list by lvhan028 in https://github.com/InternLM/lmdeploy/pull/947
* set gradio default value the same as chat.py by AllentDan in https://github.com/InternLM/lmdeploy/pull/949
* print help for cli in case of failure by RunningLeon in https://github.com/InternLM/lmdeploy/pull/955
* return dataclass for pipeline by AllentDan in https://github.com/InternLM/lmdeploy/pull/952
* set random seed when it is None by AllentDan in https://github.com/InternLM/lmdeploy/pull/958
* avoid run get_logger when import lmdeploy by RunningLeon in https://github.com/InternLM/lmdeploy/pull/956
* support mlp s-lora by grimoire in https://github.com/InternLM/lmdeploy/pull/957
* skip resume logic for pytorch backend by AllentDan in https://github.com/InternLM/lmdeploy/pull/968
* Add ci for ut by RunningLeon in https://github.com/InternLM/lmdeploy/pull/966
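
A minimal sketch of the version accessor from PR #910 (printed values are illustrative; assumes `version_info` is a tuple of ints alongside `__version__`):

```python
import lmdeploy

print(lmdeploy.__version__)   # e.g. '0.2.0'
print(lmdeploy.version_info)  # e.g. (0, 2, 0)
```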

๐Ÿž Bug fixes
* add tritonclient req by RunningLeon in https://github.com/InternLM/lmdeploy/pull/872
* Fix uninitialized parameter by lvhan028 in https://github.com/InternLM/lmdeploy/pull/875
* Fix overflow by irexyc in https://github.com/InternLM/lmdeploy/pull/897
* Fix data offset by AllentDan in https://github.com/InternLM/lmdeploy/pull/900
* Fix context decoding stuck issue when tp > 1 by irexyc in https://github.com/InternLM/lmdeploy/pull/904
* [Fix] set scaling_factor 1 forcefully when sequence length is less than max_pos_emb by lvhan028 in https://github.com/InternLM/lmdeploy/pull/911
* fix pytorch llama2 with new transformers by grimoire in https://github.com/InternLM/lmdeploy/pull/914
* fix local variable 'output_ids' referenced before assignment by irexyc in https://github.com/InternLM/lmdeploy/pull/919
* fix pipeline stop_words type error by AllentDan in https://github.com/InternLM/lmdeploy/pull/929
* pass stop words to openai api by AllentDan in https://github.com/InternLM/lmdeploy/pull/887
* fix profile generation multiprocessing error by AllentDan in https://github.com/InternLM/lmdeploy/pull/933
* Miss __init__.py in modeling folder by lvhan028 in https://github.com/InternLM/lmdeploy/pull/951
* fix cli with special arg names by RunningLeon in https://github.com/InternLM/lmdeploy/pull/959
* fix logger in tokenizer by RunningLeon in https://github.com/InternLM/lmdeploy/pull/960
📚 Documentations
* Improve user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/899
* Add user guide about pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/915
* Update supported models and add quick start section in README by lvhan028 in https://github.com/InternLM/lmdeploy/pull/926
* Fix scripts in benchmark doc by panli889 in https://github.com/InternLM/lmdeploy/pull/941
* Update get_started and w4a16 tutorials by lvhan028 in https://github.com/InternLM/lmdeploy/pull/945
* Add more docstring to api_server and proxy_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/965
* stable api_server benchmark result by a non-zero await by AllentDan in https://github.com/InternLM/lmdeploy/pull/885
* fix pytorch backend can not properly stop by AllentDan in https://github.com/InternLM/lmdeploy/pull/962
* [Fix] Fix `calibrate` bug when `transformers>4.36` by pppppM in https://github.com/InternLM/lmdeploy/pull/967

๐ŸŒ Other
* bump version to v0.2.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/969

New Contributors
* amulil made their first contribution in https://github.com/InternLM/lmdeploy/pull/735
* zhulinJulia24 made their first contribution in https://github.com/InternLM/lmdeploy/pull/844
* sallyjunjun made their first contribution in https://github.com/InternLM/lmdeploy/pull/877
* panli889 made their first contribution in https://github.com/InternLM/lmdeploy/pull/941

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.1.0...v0.2.0

0.1.0


What's Changed
🚀 Features
* Add extra_requires to reduce dependencies by RunningLeon in https://github.com/InternLM/lmdeploy/pull/580
* TurboMind 2 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/590
* Support loading hf model directly by irexyc in https://github.com/InternLM/lmdeploy/pull/685
* convert model with hf repo_id by irexyc in https://github.com/InternLM/lmdeploy/pull/774
* Support turbomind bf16 by grimoire in https://github.com/InternLM/lmdeploy/pull/803
* support image_embs input by irexyc in https://github.com/InternLM/lmdeploy/pull/799
* Add api.py by AllentDan in https://github.com/InternLM/lmdeploy/pull/805

💥 Improvements
* Fix Tokenizer encode by AllentDan in https://github.com/InternLM/lmdeploy/pull/645
* Optimize for throughput by lzhangzz in https://github.com/InternLM/lmdeploy/pull/701
* Replace mmengine with mmengine-lite by zhouzaida in https://github.com/InternLM/lmdeploy/pull/715
* Set the default value of `max_context_token_num` 1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/761
* add triton server test and workflow yml by RunningLeon in https://github.com/InternLM/lmdeploy/pull/760
* improvement(build): enable ninja and gold linker by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/767
* Report first-token-latency and token-latency percentiles by lvhan028 in https://github.com/InternLM/lmdeploy/pull/736
* Unify prefill & decode passes by lzhangzz in https://github.com/InternLM/lmdeploy/pull/775
* add cuda12.1 build check ci by irexyc in https://github.com/InternLM/lmdeploy/pull/782
* auto upload cuda12.1 python pkg to release when create new tag by irexyc in https://github.com/InternLM/lmdeploy/pull/784
* Report the inference benchmark of models with different size by lvhan028 in https://github.com/InternLM/lmdeploy/pull/794
* Simplify block manager by lzhangzz in https://github.com/InternLM/lmdeploy/pull/812
* Disable attention mask when it is not needed by lzhangzz in https://github.com/InternLM/lmdeploy/pull/813
* FIFO pipe strategy for api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/795
* simplify the header of the benchmark table by lvhan028 in https://github.com/InternLM/lmdeploy/pull/820
* add encode for opencompass by AllentDan in https://github.com/InternLM/lmdeploy/pull/828
* fix: awq should save bin files by hscspring in https://github.com/InternLM/lmdeploy/pull/793
* Support building docker image manually in CI by RunningLeon in https://github.com/InternLM/lmdeploy/pull/825

๐Ÿž Bug fixes
* Fix init of batch state by lzhangzz in https://github.com/InternLM/lmdeploy/pull/682
* fix turbomind stream canceling by grimoire in https://github.com/InternLM/lmdeploy/pull/686
* [Fix] Fix load_checkpoint_in_model bug by HIT-cwh in https://github.com/InternLM/lmdeploy/pull/690
* Fix wrong eos_id and bos_id obtained through grpc api by lvhan028 in https://github.com/InternLM/lmdeploy/pull/644
* Fix cache/output length calculation by lzhangzz in https://github.com/InternLM/lmdeploy/pull/738
* [Fix] Skip empty batch by lzhangzz in https://github.com/InternLM/lmdeploy/pull/747
* [Fix] build docker image failed since `packaging` is missing by lvhan028 in https://github.com/InternLM/lmdeploy/pull/753
* [Fix] Rollback the data type of `input_ids` to `TYPE_UINT32` in preprocessor's proto by lvhan028 in https://github.com/InternLM/lmdeploy/pull/758
* fix turbomind build on sm<80 by grimoire in https://github.com/InternLM/lmdeploy/pull/754
* Fix early-exit condition in attention kernel by lzhangzz in https://github.com/InternLM/lmdeploy/pull/788
* Fix missed arguments when benchmark static inference performance by lvhan028 in https://github.com/InternLM/lmdeploy/pull/787
* fix extra colon in InternLMChat7B template by C1rN09 in https://github.com/InternLM/lmdeploy/pull/796
* Fix local kv head num by lvhan028 in https://github.com/InternLM/lmdeploy/pull/806
* Fix out-of-bound access by lzhangzz in https://github.com/InternLM/lmdeploy/pull/809
* Set smem size for repetition penalty kernel by lzhangzz in https://github.com/InternLM/lmdeploy/pull/818
* Fix cache verification by lzhangzz in https://github.com/InternLM/lmdeploy/pull/821
* fix finish_reason by AllentDan in https://github.com/InternLM/lmdeploy/pull/816
* fix turbomind awq by grimoire in https://github.com/InternLM/lmdeploy/pull/847
* Fix stop requests by await before turbomind queue.get() by AllentDan in https://github.com/InternLM/lmdeploy/pull/850
* [Fix] Fix meta tensor error by pppppM in https://github.com/InternLM/lmdeploy/pull/848
* Fix cuda reinitialization in a multiprocessing setting by grimoire in https://github.com/InternLM/lmdeploy/pull/862
* launch gradio server directly with hf model by AllentDan in https://github.com/InternLM/lmdeploy/pull/856
* fix typo by grimoire in https://github.com/InternLM/lmdeploy/pull/769
* Add chat template for Yi by AllentDan in https://github.com/InternLM/lmdeploy/pull/779
* fix api_server stop_session and end_session by AllentDan in https://github.com/InternLM/lmdeploy/pull/835
* Return the iterator after erasing it from a map by irexyc in https://github.com/InternLM/lmdeploy/pull/864

📚 Documentations
* [Docs] Update Supported Matrix by pppppM in https://github.com/InternLM/lmdeploy/pull/679
* [Docs] Update KV8 Docs by pppppM in https://github.com/InternLM/lmdeploy/pull/681
* [Doc] Update restful api doc by AllentDan in https://github.com/InternLM/lmdeploy/pull/662
* Check-in user guide about turbomind config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/680
* Update benchmark user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/763
* [Docs] Fix typo in `restful_api ` user guide by maxchiron in https://github.com/InternLM/lmdeploy/pull/858
* [Docs] Fix typo in `restful_api ` user guide by maxchiron in https://github.com/InternLM/lmdeploy/pull/859

๐ŸŒ Other
* bump version to v0.1.0a0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/709
* bump version to 0.1.0a1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/776
* bump version to v0.1.0a2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/807
* bump version to v0.1.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/834

New Contributors
* zhouzaida made their first contribution in https://github.com/InternLM/lmdeploy/pull/715
* C1rN09 made their first contribution in https://github.com/InternLM/lmdeploy/pull/796
* maxchiron made their first contribution in https://github.com/InternLM/lmdeploy/pull/858

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.14...v0.1.0

0.1.0a2


What's Changed
💥 Improvements
* Unify prefill & decode passes by lzhangzz in https://github.com/InternLM/lmdeploy/pull/775
* add cuda12.1 build check ci by irexyc in https://github.com/InternLM/lmdeploy/pull/782
* auto upload cuda12.1 python pkg to release when create new tag by irexyc in https://github.com/InternLM/lmdeploy/pull/784
* Report the inference benchmark of models with different size by lvhan028 in https://github.com/InternLM/lmdeploy/pull/794
* Add chat template for Yi by AllentDan in https://github.com/InternLM/lmdeploy/pull/779
๐Ÿž Bug fixes
* Fix early-exit condition in attention kernel by lzhangzz in https://github.com/InternLM/lmdeploy/pull/788
* Fix missed arguments when benchmark static inference performance by lvhan028 in https://github.com/InternLM/lmdeploy/pull/787
* fix extra colon in InternLMChat7B template by C1rN09 in https://github.com/InternLM/lmdeploy/pull/796
* Fix local kv head num by lvhan028 in https://github.com/InternLM/lmdeploy/pull/806
📚 Documentations
* Update benchmark user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/763
๐ŸŒ Other
* bump version to v0.1.0a2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/807

New Contributors
* C1rN09 made their first contribution in https://github.com/InternLM/lmdeploy/pull/796

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.1.0a1...v0.1.0a2
