lmdeploy

0.2.0

What's Changed
🚀 Features
* Support internlm2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/963
* [Feature] Add params config for api server web_ui by amulil in https://github.com/InternLM/lmdeploy/pull/735
* [Feature]Merge `lmdeploy lite calibrate` and `lmdeploy lite auto_awq` by pppppM in https://github.com/InternLM/lmdeploy/pull/849
* Compute cross entropy loss given a list of input tokens by lvhan028 in https://github.com/InternLM/lmdeploy/pull/830
* Support QoS in api_server by sallyjunjun in https://github.com/InternLM/lmdeploy/pull/877
* Refactor torch inference engine by lvhan028 in https://github.com/InternLM/lmdeploy/pull/871
* add image chat demo by irexyc in https://github.com/InternLM/lmdeploy/pull/874
* check-in generation config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/902
* check-in ModelConfig by AllentDan in https://github.com/InternLM/lmdeploy/pull/907
* pytorch engine config by grimoire in https://github.com/InternLM/lmdeploy/pull/908
* Check-in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/909
* S-LoRA support by grimoire in https://github.com/InternLM/lmdeploy/pull/894
* add init in adapters by grimoire in https://github.com/InternLM/lmdeploy/pull/923
* Refactor LLM inference pipeline API by AllentDan in https://github.com/InternLM/lmdeploy/pull/916 (see the sketch after this list)
* Refactor gradio and api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/918
* Add request distributor server by AllentDan in https://github.com/InternLM/lmdeploy/pull/903
* Upgrade lmdeploy cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/922
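
Taken together, the entries above replace the old ad-hoc entry points with a single `pipeline` API driven by explicit engine and generation configs. A minimal sketch of how these pieces fit (the model id is only an example; any model supported by the backend should work):

```python
# Minimal sketch of the refactored pipeline API (PRs 902, 908, 909, 916, 952).
from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(tp=1)      # turbomind engine config (PR 909)
gen_config = GenerationConfig(top_k=40,           # generation config (PR 902)
                              top_p=0.8,
                              temperature=0.8,
                              max_new_tokens=512)

pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
responses = pipe(['Hi, please introduce yourself'], gen_config=gen_config)
print(responses[0].text)                          # responses are dataclasses (PR 952)
```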

💥 Improvements
* add top_k value for /v1/completions and update the documents by AllentDan in https://github.com/InternLM/lmdeploy/pull/870 (see the request sketch after this list)
* export "num_tokens_per_iter", "max_prefill_iters", etc. when converting a model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/845
* Move `api_server` dependencies from serve.txt to runtime.txt by lvhan028 in https://github.com/InternLM/lmdeploy/pull/879
* Refactor benchmark bash script by lvhan028 in https://github.com/InternLM/lmdeploy/pull/884
* Add test case for function regression by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/844
* Update test triton CI by RunningLeon in https://github.com/InternLM/lmdeploy/pull/893
* Update dockerfile by RunningLeon in https://github.com/InternLM/lmdeploy/pull/891
* Perform fuzzy matching on chat template according to model path by AllentDan in https://github.com/InternLM/lmdeploy/pull/839
* support accessing the lmdeploy version via `lmdeploy.version_info` by lvhan028 in https://github.com/InternLM/lmdeploy/pull/910 (version example after this list)
* Remove `flash-attn` dependency of lmdeploy lite module by lvhan028 in https://github.com/InternLM/lmdeploy/pull/917
* Improve setup by removing pycuda dependency and adding cuda runtime and cublas to RPATH by irexyc in https://github.com/InternLM/lmdeploy/pull/912
* remove unused settings in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/921
* Cleanup fixed attributes in turbomind engine config by irexyc in https://github.com/InternLM/lmdeploy/pull/928
* fix get_gpu_mem by grimoire in https://github.com/InternLM/lmdeploy/pull/934
* remove instance_num argument by AllentDan in https://github.com/InternLM/lmdeploy/pull/931
* Fix matching results of several chat templates like llama2, solar, yi and so on by AllentDan in https://github.com/InternLM/lmdeploy/pull/925
* add pytorch random sampling by grimoire in https://github.com/InternLM/lmdeploy/pull/930
* suppress turbomind chat warning by irexyc in https://github.com/InternLM/lmdeploy/pull/937
* modify type hint of api to avoid import _turbomind by AllentDan in https://github.com/InternLM/lmdeploy/pull/936
* accelerate pytorch benchmark by grimoire in https://github.com/InternLM/lmdeploy/pull/946
* Remove `tp` from pipeline argument list by lvhan028 in https://github.com/InternLM/lmdeploy/pull/947
* set gradio default value the same as chat.py by AllentDan in https://github.com/InternLM/lmdeploy/pull/949
* print help for cli in case of failure by RunningLeon in https://github.com/InternLM/lmdeploy/pull/955
* return dataclass for pipeline by AllentDan in https://github.com/InternLM/lmdeploy/pull/952
* set random seed when it is None by AllentDan in https://github.com/InternLM/lmdeploy/pull/958
* avoid run get_logger when import lmdeploy by RunningLeon in https://github.com/InternLM/lmdeploy/pull/956
* support mlp s-lora by grimoire in https://github.com/InternLM/lmdeploy/pull/957
* skip resume logic for pytorch backend by AllentDan in https://github.com/InternLM/lmdeploy/pull/968
* Add ci for ut by RunningLeon in https://github.com/InternLM/lmdeploy/pull/966
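
Two of these improvements are easiest to show directly. Version introspection (PR 910) is a one-liner:

```python
# Query the installed lmdeploy version programmatically (PR 910).
import lmdeploy

print(lmdeploy.__version__)   # e.g. '0.2.0'
print(lmdeploy.version_info)  # the same version as a parsed tuple
```

And the `top_k` parameter added to `/v1/completions` (PR 870) can be exercised against a running api_server. A sketch; the port is lmdeploy's documented default, while the served model name is an assumption:

```python
# Sketch: OpenAI-style completions request with the new top_k field (PR 870).
# Assumes `lmdeploy serve api_server ...` is already listening on localhost:23333.
import requests

payload = {
    'model': 'internlm2-chat-7b',  # hypothetical served model name
    'prompt': 'Shanghai is',
    'top_k': 40,                   # newly exposed sampling parameter
}
resp = requests.post('http://localhost:23333/v1/completions', json=payload)
print(resp.json())
```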

🐞 Bug fixes
* add tritonclient req by RunningLeon in https://github.com/InternLM/lmdeploy/pull/872
* Fix uninitialized parameter by lvhan028 in https://github.com/InternLM/lmdeploy/pull/875
* Fix overflow by irexyc in https://github.com/InternLM/lmdeploy/pull/897
* Fix data offset by AllentDan in https://github.com/InternLM/lmdeploy/pull/900
* Fix context decoding stuck issue when tp > 1 by irexyc in https://github.com/InternLM/lmdeploy/pull/904
* [Fix] set scaling_factor 1 forcefully when sequence length is less than max_pos_emb by lvhan028 in https://github.com/InternLM/lmdeploy/pull/911
* fix pytorch llama2 with new transformers by grimoire in https://github.com/InternLM/lmdeploy/pull/914
* fix local variable 'output_ids' referenced before assignment by irexyc in https://github.com/InternLM/lmdeploy/pull/919
* fix pipeline stop_words type error by AllentDan in https://github.com/InternLM/lmdeploy/pull/929
* pass stop words to openai api by AllentDan in https://github.com/InternLM/lmdeploy/pull/887
* fix profile generation multiprocessing error by AllentDan in https://github.com/InternLM/lmdeploy/pull/933
* Miss __init__.py in modeling folder by lvhan028 in https://github.com/InternLM/lmdeploy/pull/951
* fix cli with special arg names by RunningLeon in https://github.com/InternLM/lmdeploy/pull/959
* fix logger in tokenizer by RunningLeon in https://github.com/InternLM/lmdeploy/pull/960
* stable api_server benchmark result by a non-zero await by AllentDan in https://github.com/InternLM/lmdeploy/pull/885
* fix pytorch backend can not properly stop by AllentDan in https://github.com/InternLM/lmdeploy/pull/962
* [Fix] Fix `calibrate` bug when `transformers>4.36` by pppppM in https://github.com/InternLM/lmdeploy/pull/967

📚 Documentations
* Improve user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/899
* Add user guide about pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/915
* Update supported models and add quick start section in README by lvhan028 in https://github.com/InternLM/lmdeploy/pull/926
* Fix scripts in benchmark doc by panli889 in https://github.com/InternLM/lmdeploy/pull/941
* Update get_started and w4a16 tutorials by lvhan028 in https://github.com/InternLM/lmdeploy/pull/945
* Add more docstring to api_server and proxy_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/965

🌐 Other
* bump version to v0.2.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/969

New Contributors
* amulil made their first contribution in https://github.com/InternLM/lmdeploy/pull/735
* zhulinJulia24 made their first contribution in https://github.com/InternLM/lmdeploy/pull/844
* sallyjunjun made their first contribution in https://github.com/InternLM/lmdeploy/pull/877
* panli889 made their first contribution in https://github.com/InternLM/lmdeploy/pull/941

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.1.0...v0.2.0

0.1.0

What's Changed
🚀 Features
* Add extra_requires to reduce dependencies by RunningLeon in https://github.com/InternLM/lmdeploy/pull/580
* TurboMind 2 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/590
* Support loading hf model directly by irexyc in https://github.com/InternLM/lmdeploy/pull/685 (see the loading sketch after this list)
* convert model with hf repo_id by irexyc in https://github.com/InternLM/lmdeploy/pull/774
* Support turbomind bf16 by grimoire in https://github.com/InternLM/lmdeploy/pull/803
* support image_embs input by irexyc in https://github.com/InternLM/lmdeploy/pull/799
* Add api.py by AllentDan in https://github.com/InternLM/lmdeploy/pull/805
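
PRs 685 and 774 make offline conversion optional: a HF checkpoint or hub repo_id can be loaded straight into TurboMind. A sketch, assuming the `from_pretrained` helper these PRs introduced; the repo id is an example, and a local HF checkpoint path works the same way:

```python
# Sketch: loading a HF-format model directly into TurboMind (PRs 685/774).
# `from_pretrained` is assumed from the PR descriptions, not guaranteed here.
from lmdeploy import turbomind as tm

tm_model = tm.TurboMind.from_pretrained('internlm/internlm-chat-7b')
generator = tm_model.create_instance()
```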

💥 Improvements
* Fix Tokenizer encode by AllentDan in https://github.com/InternLM/lmdeploy/pull/645
* Optimize for throughput by lzhangzz in https://github.com/InternLM/lmdeploy/pull/701
* Replace mmengine with mmengine-lite by zhouzaida in https://github.com/InternLM/lmdeploy/pull/715
* Set the default value of `max_context_token_num` to 1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/761
* add triton server test and workflow yml by RunningLeon in https://github.com/InternLM/lmdeploy/pull/760
* improvement(build): enable ninja and gold linker by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/767
* Report first-token-latency and token-latency percentiles by lvhan028 in https://github.com/InternLM/lmdeploy/pull/736
* Unify prefill & decode passes by lzhangzz in https://github.com/InternLM/lmdeploy/pull/775
* add cuda12.1 build check ci by irexyc in https://github.com/InternLM/lmdeploy/pull/782
* auto upload cuda12.1 python pkg to release when create new tag by irexyc in https://github.com/InternLM/lmdeploy/pull/784
* Report the inference benchmark of models with different sizes by lvhan028 in https://github.com/InternLM/lmdeploy/pull/794
* Simplify block manager by lzhangzz in https://github.com/InternLM/lmdeploy/pull/812
* Disable attention mask when it is not needed by lzhangzz in https://github.com/InternLM/lmdeploy/pull/813
* FIFO pipe strategy for api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/795
* simplify the header of the benchmark table by lvhan028 in https://github.com/InternLM/lmdeploy/pull/820
* add encode for opencompass by AllentDan in https://github.com/InternLM/lmdeploy/pull/828
* fix: awq should save bin files by hscspring in https://github.com/InternLM/lmdeploy/pull/793
* Support building docker image manually in CI by RunningLeon in https://github.com/InternLM/lmdeploy/pull/825

🐞 Bug fixes
* Fix init of batch state by lzhangzz in https://github.com/InternLM/lmdeploy/pull/682
* fix turbomind stream canceling by grimoire in https://github.com/InternLM/lmdeploy/pull/686
* [Fix] Fix load_checkpoint_in_model bug by HIT-cwh in https://github.com/InternLM/lmdeploy/pull/690
* Fix wrong eos_id and bos_id obtained through grpc api by lvhan028 in https://github.com/InternLM/lmdeploy/pull/644
* Fix cache/output length calculation by lzhangzz in https://github.com/InternLM/lmdeploy/pull/738
* [Fix] Skip empty batch by lzhangzz in https://github.com/InternLM/lmdeploy/pull/747
* [Fix] build docker image failed since `packaging` is missing by lvhan028 in https://github.com/InternLM/lmdeploy/pull/753
* [Fix] Rollback the data type of `input_ids` to `TYPE_UINT32` in preprocessor's proto by lvhan028 in https://github.com/InternLM/lmdeploy/pull/758
* fix turbomind build on sm<80 by grimoire in https://github.com/InternLM/lmdeploy/pull/754
* Fix early-exit condition in attention kernel by lzhangzz in https://github.com/InternLM/lmdeploy/pull/788
* Fix missed arguments when benchmark static inference performance by lvhan028 in https://github.com/InternLM/lmdeploy/pull/787
* fix extra colon in InternLMChat7B template by C1rN09 in https://github.com/InternLM/lmdeploy/pull/796
* Fix local kv head num by lvhan028 in https://github.com/InternLM/lmdeploy/pull/806
* Fix out-of-bound access by lzhangzz in https://github.com/InternLM/lmdeploy/pull/809
* Set smem size for repetition penalty kernel by lzhangzz in https://github.com/InternLM/lmdeploy/pull/818
* Fix cache verification by lzhangzz in https://github.com/InternLM/lmdeploy/pull/821
* fix finish_reason by AllentDan in https://github.com/InternLM/lmdeploy/pull/816
* fix turbomind awq by grimoire in https://github.com/InternLM/lmdeploy/pull/847
* Fix stop requests by await before turbomind queue.get() by AllentDan in https://github.com/InternLM/lmdeploy/pull/850
* [Fix] Fix meta tensor error by pppppM in https://github.com/InternLM/lmdeploy/pull/848
* Fix cuda reinitialization in a multiprocessing setting by grimoire in https://github.com/InternLM/lmdeploy/pull/862
* launch gradio server directly with hf model by AllentDan in https://github.com/InternLM/lmdeploy/pull/856
* fix typo by grimoire in https://github.com/InternLM/lmdeploy/pull/769
* Add chat template for Yi by AllentDan in https://github.com/InternLM/lmdeploy/pull/779
* fix api_server stop_session and end_session by AllentDan in https://github.com/InternLM/lmdeploy/pull/835
* Return the iterator after erasing it from a map by irexyc in https://github.com/InternLM/lmdeploy/pull/864

📚 Documentations
* [Docs] Update Supported Matrix by pppppM in https://github.com/InternLM/lmdeploy/pull/679
* [Docs] Update KV8 Docs by pppppM in https://github.com/InternLM/lmdeploy/pull/681
* [Doc] Update restful api doc by AllentDan in https://github.com/InternLM/lmdeploy/pull/662
* Check-in user guide about turbomind config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/680
* Update benchmark user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/763
* [Docs] Fix typo in `restful_api` user guide by maxchiron in https://github.com/InternLM/lmdeploy/pull/858
* [Docs] Fix typo in `restful_api` user guide by maxchiron in https://github.com/InternLM/lmdeploy/pull/859

🌐 Other
* bump version to v0.1.0a0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/709
* bump version to 0.1.0a1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/776
* bump version to v0.1.0a2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/807
* bump version to v0.1.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/834

New Contributors
* zhouzaida made their first contribution in https://github.com/InternLM/lmdeploy/pull/715
* C1rN09 made their first contribution in https://github.com/InternLM/lmdeploy/pull/796
* maxchiron made their first contribution in https://github.com/InternLM/lmdeploy/pull/858

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.14...v0.1.0

0.1.0a2

What's Changed
💥 Improvements
* Unify prefill & decode passes by lzhangzz in https://github.com/InternLM/lmdeploy/pull/775
* add cuda12.1 build check ci by irexyc in https://github.com/InternLM/lmdeploy/pull/782
* auto upload cuda12.1 python pkg to release when create new tag by irexyc in https://github.com/InternLM/lmdeploy/pull/784
* Report the inference benchmark of models with different sizes by lvhan028 in https://github.com/InternLM/lmdeploy/pull/794
* Add chat template for Yi by AllentDan in https://github.com/InternLM/lmdeploy/pull/779
🐞 Bug fixes
* Fix early-exit condition in attention kernel by lzhangzz in https://github.com/InternLM/lmdeploy/pull/788
* Fix missed arguments when benchmark static inference performance by lvhan028 in https://github.com/InternLM/lmdeploy/pull/787
* fix extra colon in InternLMChat7B template by C1rN09 in https://github.com/InternLM/lmdeploy/pull/796
* Fix local kv head num by lvhan028 in https://github.com/InternLM/lmdeploy/pull/806
📚 Documentations
* Update benchmark user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/763
🌐 Other
* bump version to v0.1.0a2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/807

New Contributors
* C1rN09 made their first contribution in https://github.com/InternLM/lmdeploy/pull/796

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.1.0a1...v0.1.0a2

0.1.0a1

What's Changed
💥 Improvements
* Set the default value of `max_context_token_num` to 1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/761
* add triton server test and workflow yml by RunningLeon in https://github.com/InternLM/lmdeploy/pull/760
* improvement(build): enable ninja and gold linker by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/767
* Report first-token-latency and token-latency percentiles by lvhan028 in https://github.com/InternLM/lmdeploy/pull/736
* convert model with hf repo_id by irexyc in https://github.com/InternLM/lmdeploy/pull/774
🐞 Bug fixes
* [Fix] build docker image failed since `packaging` is missing by lvhan028 in https://github.com/InternLM/lmdeploy/pull/753
* [Fix] Rollback the data type of `input_ids` to `TYPE_UINT32` in preprocessor's proto by lvhan028 in https://github.com/InternLM/lmdeploy/pull/758
* fix turbomind build on sm<80 by grimoire in https://github.com/InternLM/lmdeploy/pull/754
* fix typo by grimoire in https://github.com/InternLM/lmdeploy/pull/769
🌐 Other
* bump version to 0.1.0a1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/776


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.1.0a0...v0.1.0a1

0.1.0a0

What's Changed
🚀 Features
* Add extra_requires to reduce dependencies by RunningLeon in https://github.com/InternLM/lmdeploy/pull/580
* TurboMind 2 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/590
* Support loading hf model directly by irexyc in https://github.com/InternLM/lmdeploy/pull/685
💥 Improvements
* Fix Tokenizer encode by AllentDan in https://github.com/InternLM/lmdeploy/pull/645
* Optimize for throughput by lzhangzz in https://github.com/InternLM/lmdeploy/pull/701
* Replace mmengine with mmengine-lite by zhouzaida in https://github.com/InternLM/lmdeploy/pull/715
🐞 Bug fixes
* Fix init of batch state by lzhangzz in https://github.com/InternLM/lmdeploy/pull/682
* fix turbomind stream canceling by grimoire in https://github.com/InternLM/lmdeploy/pull/686
* [Fix] Fix load_checkpoint_in_model bug by HIT-cwh in https://github.com/InternLM/lmdeploy/pull/690
* Fix wrong eos_id and bos_id obtained through grpc api by lvhan028 in https://github.com/InternLM/lmdeploy/pull/644
* Fix cache/output length calculation by lzhangzz in https://github.com/InternLM/lmdeploy/pull/738
* [Fix] Skip empty batch by lzhangzz in https://github.com/InternLM/lmdeploy/pull/747
📚 Documentations
* [Docs] Update Supported Matrix by pppppM in https://github.com/InternLM/lmdeploy/pull/679
* [Docs] Update KV8 Docs by pppppM in https://github.com/InternLM/lmdeploy/pull/681
* [Doc] Update restful api doc by AllentDan in https://github.com/InternLM/lmdeploy/pull/662
* Check-in user guide about turbomind config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/680
🌐 Other
* bump version to v0.1.0a0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/709

New Contributors
* zhouzaida made their first contribution in https://github.com/InternLM/lmdeploy/pull/715

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.14...v0.1.0a0

0.0.14

What's Changed

💥 Improvements
* Improve api_server and webui usage by AllentDan in https://github.com/InternLM/lmdeploy/pull/544
* fix: gradio gr.Button.update deprecated after 4.0.0 by hscspring in https://github.com/InternLM/lmdeploy/pull/637
* add cli to list the supported model names by RunningLeon in https://github.com/InternLM/lmdeploy/pull/639
* Refactor model conversion by irexyc in https://github.com/InternLM/lmdeploy/pull/296
* [Enhance] internlm message to prompt by Harold-lkk in https://github.com/InternLM/lmdeploy/pull/499
* update turbomind session_len with model.session_len by AllentDan in https://github.com/InternLM/lmdeploy/pull/634
* Manage session id using random int for gradio local mode by aisensiy in https://github.com/InternLM/lmdeploy/pull/553
* Add UltraCM and WizardLM chat templates by AllentDan in https://github.com/InternLM/lmdeploy/pull/599 (see the template sketch after this list)
* Add check env sub command by RunningLeon in https://github.com/InternLM/lmdeploy/pull/654
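
The new chat templates (PR 599) live in lmdeploy's model registry alongside the existing ones. A sketch; the registry names 'wizardlm' and 'ultracm' are assumptions based on the PR title:

```python
# Sketch: rendering a prompt with a registered chat template (PR 599).
from lmdeploy.model import MODELS

template = MODELS.get('wizardlm')()  # 'ultracm' is assumed to be registered the same way
print(template.get_prompt('Hi, please introduce yourself'))
```
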
🐞 Bug fixes
* [Fix] Qwen's quantization results are abnormal & Baichuan cannot be quantized by pppppM in https://github.com/InternLM/lmdeploy/pull/605
* FIX: fix stop_session func bug by yunzhongyan0 in https://github.com/InternLM/lmdeploy/pull/578
* fix benchmark serving computation mistake by AllentDan in https://github.com/InternLM/lmdeploy/pull/630
* fix Tokenizer load error when the path of the model being converted is not writable by irexyc in https://github.com/InternLM/lmdeploy/pull/669
* fix tokenizer_info when convert the model by irexyc in https://github.com/InternLM/lmdeploy/pull/661
🌐 Other
* bump version to v0.0.14 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/663

New Contributors
* hscspring made their first contribution in https://github.com/InternLM/lmdeploy/pull/637
* yunzhongyan0 made their first contribution in https://github.com/InternLM/lmdeploy/pull/578

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.13...v0.0.14
