lmdeploy

Latest version: v0.6.3


0.6.3


What's Changed
πŸš€ Features
* support yarn in turbomind backend by irexyc in https://github.com/InternLM/lmdeploy/pull/2519
* add linear op on dlinfer platform by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2627
* support turbomind head_dim 64 by irexyc in https://github.com/InternLM/lmdeploy/pull/2715
* [Feature]: support LlavaForConditionalGeneration with turbomind inference by deepindeed2022 in https://github.com/InternLM/lmdeploy/pull/2710
* Support Mono-InternVL with PyTorch backend by wzk1015 in https://github.com/InternLM/lmdeploy/pull/2727
* Support Qwen2-MoE models by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2723
* Support mixtral moe AWQ quantization. by AllentDan in https://github.com/InternLM/lmdeploy/pull/2725
* Support chemvlm by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2738
* Support molmo in turbomind by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2716 (a VLM usage sketch follows this list)
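
Several of the feature entries above extend vision-language model (VLM) support. A hedged sketch of running one of these models through the high-level pipeline API; the model id, image URL, and prompt below are placeholders, not values from the release notes:

```python
# A sketch only: model id, image URL, and prompt are placeholders.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# e.g. a Llava-family model of the kind enabled by PR 2710 (hypothetical pick)
pipe = pipeline('llava-hf/llava-1.5-7b-hf')

image = load_image('https://example.com/demo.jpg')  # placeholder URL
response = pipe(('Describe this image.', image))    # (prompt, image) pair
print(response.text)
```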
πŸ’₯ Improvements
* Call cuda empty_cache to prevent OOM when quantizing model by AllentDan in https://github.com/InternLM/lmdeploy/pull/2671
* feat: support dynamic/llama3 rotary embedding in ascend graph mode by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2670
* Add ensure_ascii = False for json.dumps by AllentDan in https://github.com/InternLM/lmdeploy/pull/2707
* Flatten cache and add flashattention by grimoire in https://github.com/InternLM/lmdeploy/pull/2676
* Support ep, column major moe kernel. by grimoire in https://github.com/InternLM/lmdeploy/pull/2690
* Remove one of the duplicate bos tokens by AllentDan in https://github.com/InternLM/lmdeploy/pull/2708
* Check server input by irexyc in https://github.com/InternLM/lmdeploy/pull/2719
* optimize dlinfer moe by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2741
🐞 Bug fixes
* Support min_tokens, min_p parameters for api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/2681
* fix index error when computing ppl on long-text prompt by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2697
* Better tp exit log. by grimoire in https://github.com/InternLM/lmdeploy/pull/2677
* miss to read moe_ffn weights from converted tm model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2698
* Fix turbomind TP by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2706
* fix decoding kernel for deepseekv2 by grimoire in https://github.com/InternLM/lmdeploy/pull/2688
* fix tp exit code for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2718
* fix assert pad >= 0 failed when inter_size is not a multiple of group… by Vinkle-hzt in https://github.com/InternLM/lmdeploy/pull/2740
* fix issue that mono-internvl failed to fallback pytorch engine by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2744
* Remove use_fast=True when loading tokenizer for lite auto_awq by AllentDan in https://github.com/InternLM/lmdeploy/pull/2758
* set wrong head_dim for mistral-nemo by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2761
πŸ“š Documentations
* Update ascend readme by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2756
* fix ascend get_started.md link by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2696
* Fix llama3.2 VL vision in "Supported Modals" documents by blankanswer in https://github.com/InternLM/lmdeploy/pull/2703
🌐 Other
* [ci] support v100 dailytest by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2665
* [ci] add more testcase into evaluation and daily test by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2721
* feat: support multi cards in ascend graph mode by tangzhiyi11 in https://github.com/InternLM/lmdeploy/pull/2755
* bump version to v0.6.3 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2754

New Contributors
* blankanswer made their first contribution in https://github.com/InternLM/lmdeploy/pull/2703
* tangzhiyi11 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2670
* wzk1015 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2727
* Vinkle-hzt made their first contribution in https://github.com/InternLM/lmdeploy/pull/2740

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.6.2...v0.6.3

0.6.2.post1


What's Changed
🐞 Bug fixes
* Fix llama3.2 VL vision in "Supported Modals" documents by blankanswer in https://github.com/InternLM/lmdeploy/pull/2703
* miss to read moe_ffn weights from converted tm model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2698
* Better tp exit log by grimoire in https://github.com/InternLM/lmdeploy/pull/2677
* fix index error when computing ppl on long-text prompt by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2697
* Support min_tokens, min_p parameters for api_server by AllentDan in https://github.com/InternLM/lmdeploy/pull/2681
* fix ascend get_started.md link by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2696
* Call cuda empty_cache to prevent OOM when quantizing model by AllentDan in https://github.com/InternLM/lmdeploy/pull/2671
* Fix turbomind TP for v0.6.2 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2713
🌐 Other
* [ci] support v100 dailytest by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2665
* bump version to 0.6.2.post1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2717


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.6.2...v0.6.2.post1

0.6.2


Highlights

- PyTorch engine supports graph mode on the Ascend platform, doubling the inference speed (see the sketch after this list)
- Support llama3.2-vision models in the PyTorch engine
- Support Mixtral in the TurboMind engine, achieving 20+ RPS on the ShareGPT dataset with 2 A100-80G GPUs
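
A hedged sketch of the first highlight, assuming `PytorchEngineConfig` exposes `device_type` and `eager_mode` as the CLI additions in this release suggest; the model path is a placeholder:

```python
# Sketch under assumptions: the device_type/eager_mode fields of
# PytorchEngineConfig; the model path is a placeholder.
from lmdeploy import pipeline, PytorchEngineConfig

pipe = pipeline(
    '/path/to/your/model',  # placeholder
    backend_config=PytorchEngineConfig(
        device_type='ascend',  # target the Huawei Ascend platform
        eager_mode=False,      # keep graph mode, the faster default
    ),
)
print(pipe('Hello').text)
```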


What's Changed
πŸš€ Features
* support downloading models from openmind_hub by cookieyyds in https://github.com/InternLM/lmdeploy/pull/2563
* Support pytorch engine kv int4/int8 quantization by AllentDan in https://github.com/InternLM/lmdeploy/pull/2438
* feat(ascend): support w4a16 by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2587
* [maca] add maca backend support. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2636
* Support mllama for pytorch engine by AllentDan in https://github.com/InternLM/lmdeploy/pull/2605
* add --eager-mode to cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2645
* [ascend] add ascend graph mode by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2647
* MoE support for turbomind by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2621
πŸ’₯ Improvements
* [Feature] Add argument to disable FastAPI docs by mouweng in https://github.com/InternLM/lmdeploy/pull/2540
* add check for device with cap 7.x by grimoire in https://github.com/InternLM/lmdeploy/pull/2535
* Add tool role for langchain usage by AllentDan in https://github.com/InternLM/lmdeploy/pull/2558
* Fix llama3.2-1b inference error by handling tie_word_embedding by grimoire in https://github.com/InternLM/lmdeploy/pull/2568
* Add a workaround for saving internvl2 with latest transformers by AllentDan in https://github.com/InternLM/lmdeploy/pull/2583
* optimize paged attention on triton3 by grimoire in https://github.com/InternLM/lmdeploy/pull/2553
* refactor for multi backends in dlinfer by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2619
* Copy sglang/bench_serving.py to lmdeploy as serving benchmark script by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2620
* Add barrier to prevent TP nccl kernel waiting. by grimoire in https://github.com/InternLM/lmdeploy/pull/2607
* [ascend] refactor fused_moe on ascend platform by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2613
* [ascend] support paged_prefill_attn when batch > 1 by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2612
* Raise an error for the wrong chat template by AllentDan in https://github.com/InternLM/lmdeploy/pull/2618
* refine pre-post-process by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2632
* small block_m for sm7.x by grimoire in https://github.com/InternLM/lmdeploy/pull/2626
* update check for triton by grimoire in https://github.com/InternLM/lmdeploy/pull/2641
* Support llama3.2 LLM models in turbomind engine by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2596
* Check whether device support bfloat16 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2653
* Add warning message about `do_sample` to alert BC by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2654
* update ascend dockerfile by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2661
* fix supported model list in ascend graph mode by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2669
* remove dlinfer version by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2672
🐞 Bug fixes
* set outlines<0.1.0 by AllentDan in https://github.com/InternLM/lmdeploy/pull/2559
* fix: make exit_flag verification for ascend more general by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2588
* set capture mode thread_local by grimoire in https://github.com/InternLM/lmdeploy/pull/2560
* Add distributed context in pytorch engine to support torchrun by grimoire in https://github.com/InternLM/lmdeploy/pull/2615
* Fix error in python3.8. by Reinerzhou in https://github.com/InternLM/lmdeploy/pull/2646
* Align UT with triton fill_kv_cache_quant kernel by AllentDan in https://github.com/InternLM/lmdeploy/pull/2644
* miss device_type when checking is_bf16_supported on ascend platform by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2663
* fix syntax in Dockerfile_aarch64_ascend by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2664
* Set history_cross_kv_seqlens to 0 by default by AllentDan in https://github.com/InternLM/lmdeploy/pull/2666
* fix build error in ascend dockerfile by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2667
* bugfix: llava-hf/llava-interleave-qwen-7b-hf (2497) by deepindeed2022 in https://github.com/InternLM/lmdeploy/pull/2657
* fix inference mode error for qwen2-vl by irexyc in https://github.com/InternLM/lmdeploy/pull/2668
πŸ“š Documentations
* Add instruction for downloading models from openmind hub by cookieyyds in https://github.com/InternLM/lmdeploy/pull/2577
* Fix spacing in ascend user guide by Superskyyy in https://github.com/InternLM/lmdeploy/pull/2601
* Update get_started tutorial about deploying on ascend platform by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2655
* Update ascend get_started tutorial about installing nnal by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2662
🌐 Other
* [ci] add oc infer test in stable test by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2523
* update copyright by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2579
* [Doc]: Lock sphinx version by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2594
* [ci] use local requirements for test workflow by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2569
* [ci] add pytorch kvint testcase into function regression by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2584
* [ci] React dailytest workflow by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2617
* [ci] fix restful script by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2635
* [ci] add internlm2_5_7b_batch_1 into evaluation testcase by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2631
* match torch and torch_vision version by grimoire in https://github.com/InternLM/lmdeploy/pull/2649
* Bump version to v0.6.2 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2659

New Contributors
* mouweng made their first contribution in https://github.com/InternLM/lmdeploy/pull/2540
* cookieyyds made their first contribution in https://github.com/InternLM/lmdeploy/pull/2563
* Superskyyy made their first contribution in https://github.com/InternLM/lmdeploy/pull/2601
* Reinerzhou made their first contribution in https://github.com/InternLM/lmdeploy/pull/2636
* deepindeed2022 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2657

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.6.1...v0.6.2

0.6.1


What's Changed
πŸš€ Features
* Support user-specified data type by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2473
* Support minicpm3-4b by AllentDan in https://github.com/InternLM/lmdeploy/pull/2465
* support Qwen2-VL with pytorch backend by irexyc in https://github.com/InternLM/lmdeploy/pull/2449
πŸ’₯ Improvements
* Add silu mul kernel by grimoire in https://github.com/InternLM/lmdeploy/pull/2469
* adjust schedule to improve TTFT in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2477
* Add max_log_len option to control length of printed log by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2478
* set served model name being repo_id from hub before it is downloaded by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2494
* Improve proxy server usage by AllentDan in https://github.com/InternLM/lmdeploy/pull/2488
* CudaGraph mixin by grimoire in https://github.com/InternLM/lmdeploy/pull/2485
* pytorch engine add get_logits by grimoire in https://github.com/InternLM/lmdeploy/pull/2487
* Refactor lora by grimoire in https://github.com/InternLM/lmdeploy/pull/2466
* support noaligned silu_and_mul by grimoire in https://github.com/InternLM/lmdeploy/pull/2506
* optimize performance of ascend backend's update_step_context() by calculating kv_start_indices in a new way by jiajie-yang in https://github.com/InternLM/lmdeploy/pull/2521
* Fix chatglm tokenizer failed when transformers>=4.45.0 by AllentDan in https://github.com/InternLM/lmdeploy/pull/2520
🐞 Bug fixes
* Fix "TypeError: Got unsupported ScalarType BFloat16" by SeitaroShinagawa in https://github.com/InternLM/lmdeploy/pull/2472
* fix ascend atten_mask by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2483
* Catch exceptions thrown by turbomind inference thread by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2502
* The `get_ppl` missed the last token of each iteration during multi-iter prefill by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2499
* fix vl gradio by irexyc in https://github.com/InternLM/lmdeploy/pull/2527
🌐 Other
* [ci] regular update by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2431
* [CI] add base model evaluation by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2490
* bump version to v0.6.1 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2513

New Contributors
* SeitaroShinagawa made their first contribution in https://github.com/InternLM/lmdeploy/pull/2472

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.6.0...v0.6.1

0.6.0

Highlights
- Optimize W4A16 quantized model inference by implementing GEMM kernels in the TurboMind engine
- Add GPTQ-INT4 inference
- Support CUDA architectures SM70 and above, i.e., V100 and later GPUs
- Refactor PytorchEngine
  - Employ CUDA graphs to boost inference performance (~30%)
- Support more models on the Huawei Ascend platform
- Upgrade `GenerationConfig` (a usage sketch follows the example below)
  - Support `min_p` sampling
  - Make `do_sample=False` the default option
  - Remove `EngineGenerationConfig` and merge it into `GenerationConfig`
- Support guided decoding
- Distinguish between the name of the deployed model and the name of the model's chat template
Before:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name customized_chat_template.json
```

After:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name "the served model name" \
    --chat-template customized_chat_template.json
```

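A minimal sketch of the upgraded `GenerationConfig`; the model path and prompt are placeholders. Since `do_sample` now defaults to `False`, decoding stays greedy unless sampling is explicitly enabled:

```python
# Minimal sketch; the model path and prompt are placeholders.
from lmdeploy import pipeline, GenerationConfig

pipe = pipeline('/the/path/of/your/awesome/model')

gen_config = GenerationConfig(
    do_sample=True,   # default is False, i.e. greedy decoding
    temperature=0.8,
    top_p=0.95,
    min_p=0.05,       # min_p sampling is new in v0.6.0
    max_new_tokens=256,
)

response = pipe('Hello, who are you?', gen_config=gen_config)
print(response.text)
```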

Breaking Changes
- The TurboMind model converter has changed. Please re-convert your models if you use this feature
- `EngineGenerationConfig` is removed. Please use `GenerationConfig` instead
- Chat template. Please use `--chat-template` to specify it

What's Changed
πŸš€ Features
* support vlm custom image process parameters in openai input format by irexyc in https://github.com/InternLM/lmdeploy/pull/2245
* New GEMM kernels for weight-only quantization by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2090
* Fix hidden size and support mistral nemo by AllentDan in https://github.com/InternLM/lmdeploy/pull/2215
* Support custom logits processors by AllentDan in https://github.com/InternLM/lmdeploy/pull/2329
* support openbmb/MiniCPM-V-2_6 by irexyc in https://github.com/InternLM/lmdeploy/pull/2351
* Support phi3.5 for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2361
* Add auto_gptq to lmdeploy lite by AllentDan in https://github.com/InternLM/lmdeploy/pull/2372
* build(ascend): add Dockerfile for ascend aarch64 910B by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2278
* Support guided decoding for pytorch backend by AllentDan in https://github.com/InternLM/lmdeploy/pull/1856
* support min_p sampling parameter by irexyc in https://github.com/InternLM/lmdeploy/pull/2420
* Refactor pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2104
* refactor pytorch engine(ascend) by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2440
πŸ’₯ Improvements
* Remove deprecated arguments from API and clarify model_name and chat_template_name by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1931
* Fix duplicated session_id when pipeline is used by multithreads by irexyc in https://github.com/InternLM/lmdeploy/pull/2134
* remove eviction param by grimoire in https://github.com/InternLM/lmdeploy/pull/2285
* Remove QoS serving by AllentDan in https://github.com/InternLM/lmdeploy/pull/2294
* Support send tool_calls back to internlm2 by AllentDan in https://github.com/InternLM/lmdeploy/pull/2147
* Add stream options to control usage by AllentDan in https://github.com/InternLM/lmdeploy/pull/2313
* add device type for pytorch engine in cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2321
* Update error status_code to raise error in openai client by AllentDan in https://github.com/InternLM/lmdeploy/pull/2333
* Change to use device instead of device-type in cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2337
* Add GEMM test utils by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2342
* Add environment variable to control SILU fusion by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2343
* Use single thread per model instance by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2339
* add cache to speed up docker building by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2344
* add max_prefill_token_num argument in CLI by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2345
* torch engine optimize prefill for long context by grimoire in https://github.com/InternLM/lmdeploy/pull/1962
* Refactor turbomind (1/N) by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2352
* feat(server): enable `seed` parameter for openai compatible server. by DearPlanet in https://github.com/InternLM/lmdeploy/pull/2353
* support do_sample parameter by irexyc in https://github.com/InternLM/lmdeploy/pull/2375
* refactor TurbomindModelConfig by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2364
* import dlinfer before imageencoding by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2413
* ignore *.pth when download model from model hub by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2426
* inplace logits process as default by grimoire in https://github.com/InternLM/lmdeploy/pull/2427
* handle invalid images by irexyc in https://github.com/InternLM/lmdeploy/pull/2312
* Split token_embs and lm_head weights by irexyc in https://github.com/InternLM/lmdeploy/pull/2252
* build: update ascend dockerfile by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2421
* build nccl in dockerfile for cuda11.8 by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2433
* automatically set max_batch_size according to the device when it is not specified by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2434
* rename the ascend dockerfile by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2403
* refactor ascend kernels by yao-fengchen in https://github.com/InternLM/lmdeploy/pull/2355
🐞 Bug fixes
* enable run vlm with pytorch engine in gradio by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2256
* fix side-effect: failed to update tm model config with tm engine config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2275
* Fix internvl2 template and update docs by irexyc in https://github.com/InternLM/lmdeploy/pull/2292
* fix the issue missing dependencies in the Dockerfile and pip by ColorfulDick in https://github.com/InternLM/lmdeploy/pull/2240
* Fix the way to get "quantization_config" from model's configuration by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2325
* fix(ascend): fix import error of pt engine in cli by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2328
* Default rope_scaling_factor of TurbomindEngineConfig to None by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2358
* Fix the logic of update engine_config to TurbomindModelConfig for both tm model and hf model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2362
* fix cache position for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2388
* Fix /v1/completions batch order wrong by AllentDan in https://github.com/InternLM/lmdeploy/pull/2395
* Fix some issues encountered by modelscope and community by irexyc in https://github.com/InternLM/lmdeploy/pull/2428
* fix llama3 rotary in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2444
* fix tensors on different devices when deploying MiniCPM-V-2_6 with tensor parallelism by irexyc in https://github.com/InternLM/lmdeploy/pull/2454
* fix MultinomialSampling operator builder by grimoire in https://github.com/InternLM/lmdeploy/pull/2460
* Fix initialization of runtime_min_p by irexyc in https://github.com/InternLM/lmdeploy/pull/2461
* fix Windows compile error by zhyncs in https://github.com/InternLM/lmdeploy/pull/2303
* fix: follow up 2303 by zhyncs in https://github.com/InternLM/lmdeploy/pull/2307
πŸ“š Documentations
* Reorganize the user guide and update the get_started section by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2038
* cancel support baichuan2 7b awq in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2246
* Add user guide about slora serving by AllentDan in https://github.com/InternLM/lmdeploy/pull/2084
* Reorganize the table of content of get_started by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2378
* fix get_started user guide unaccessible by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2410
* add Ascend get_started by jinminxi104 in https://github.com/InternLM/lmdeploy/pull/2417
🌐 Other
* test prtest image update by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2192
* Update python support version by wuhongsheng in https://github.com/InternLM/lmdeploy/pull/2290
* [ci] benchmark react by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2183
* bump version to v0.6.0a0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2371
* [ci] add daily test's coverage report by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2401
* update actions/download-artifact to v4 to fix security issue by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2419
* bump version to v0.6.0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2445

New Contributors
* wuhongsheng made their first contribution in https://github.com/InternLM/lmdeploy/pull/2290
* ColorfulDick made their first contribution in https://github.com/InternLM/lmdeploy/pull/2240
* DearPlanet made their first contribution in https://github.com/InternLM/lmdeploy/pull/2353
* jinminxi104 made their first contribution in https://github.com/InternLM/lmdeploy/pull/2413

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.3...v0.6.0

0.6.0a0

Highlights
- Optimize W4A16 quantized model inference by implementing GEMM kernels in the TurboMind engine
- Add GPTQ-INT4 inference
- Support CUDA architectures SM70 and above, i.e., V100 and later GPUs
- Optimize the prefill stage of PyTorchEngine inference
- Distinguish between the name of the deployed model and the name of the model's chat template

Before:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name customized_chat_template.json
```

After:

```shell
lmdeploy serve api_server /the/path/of/your/awesome/model \
    --model-name "the served model name" \
    --chat-template customized_chat_template.json
```


What's Changed
πŸš€ Features
* support vlm custom image process parameters in openai input format by irexyc in https://github.com/InternLM/lmdeploy/pull/2245
* New GEMM kernels for weight-only quantization by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2090
* Fix hidden size and support mistral nemo by AllentDan in https://github.com/InternLM/lmdeploy/pull/2215
* Support custom logits processors by AllentDan in https://github.com/InternLM/lmdeploy/pull/2329
* support openbmb/MiniCPM-V-2_6 by irexyc in https://github.com/InternLM/lmdeploy/pull/2351
* Support phi3.5 for pytorch engine by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2361
πŸ’₯ Improvements
* Remove deprecated arguments from API and clarify model_name and chat_template_name by lvhan028 in https://github.com/InternLM/lmdeploy/pull/1931
* Fix duplicated session_id when pipeline is used by multithreads by irexyc in https://github.com/InternLM/lmdeploy/pull/2134
* remove eviction param by grimoire in https://github.com/InternLM/lmdeploy/pull/2285
* Remove QoS serving by AllentDan in https://github.com/InternLM/lmdeploy/pull/2294
* Support send tool_calls back to internlm2 by AllentDan in https://github.com/InternLM/lmdeploy/pull/2147
* Add stream options to control usage by AllentDan in https://github.com/InternLM/lmdeploy/pull/2313
* add device type for pytorch engine in cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2321
* Update error status_code to raise error in openai client by AllentDan in https://github.com/InternLM/lmdeploy/pull/2333
* Change to use device instead of device-type in cli by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2337
* Add GEMM test utils by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2342
* Add environment variable to control SILU fusion by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2343
* Use single thread per model instance by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2339
* add cache to speed up docker building by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2344
* add max_prefill_token_num argument in CLI by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2345
* torch engine optimize prefill for long context by grimoire in https://github.com/InternLM/lmdeploy/pull/1962
* Refactor turbomind (1/N) by lzhangzz in https://github.com/InternLM/lmdeploy/pull/2352
* feat(server): enable `seed` parameter for openai compatible server. by DearPlanet in https://github.com/InternLM/lmdeploy/pull/2353
🐞 Bug fixes
* enable run vlm with pytorch engine in gradio by RunningLeon in https://github.com/InternLM/lmdeploy/pull/2256
* fix side-effect: failed to update tm model config with tm engine config by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2275
* Fix internvl2 template and update docs by irexyc in https://github.com/InternLM/lmdeploy/pull/2292
* fix the issue missing dependencies in the Dockerfile and pip by ColorfulDick in https://github.com/InternLM/lmdeploy/pull/2240
* Fix the way to get "quantization_config" from model's configuration by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2325
* fix(ascend): fix import error of pt engine in cli by CyCle1024 in https://github.com/InternLM/lmdeploy/pull/2328
* Default rope_scaling_factor of TurbomindEngineConfig to None by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2358
* Fix the logic of update engine_config to TurbomindModelConfig for both tm model and hf model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2362
πŸ“š Documentations
* Reorganize the user guide and update the get_started section by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2038
* cancel support baichuan2 7b awq in pytorch engine by grimoire in https://github.com/InternLM/lmdeploy/pull/2246
* Add user guide about slora serving by AllentDan in https://github.com/InternLM/lmdeploy/pull/2084
🌐 Other
* test prtest image update by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2192
* Update python support version by wuhongsheng in https://github.com/InternLM/lmdeploy/pull/2290
* fix Windows compile error by zhyncs in https://github.com/InternLM/lmdeploy/pull/2303
* fix: follow up 2303 by zhyncs in https://github.com/InternLM/lmdeploy/pull/2307
* [ci] benchmark react by zhulinJulia24 in https://github.com/InternLM/lmdeploy/pull/2183
* bump version to v0.6.0a0 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/2371

New Contributors
* wuhongsheng made their first contribution in https://github.com/InternLM/lmdeploy/pull/2290
* ColorfulDick made their first contribution in https://github.com/InternLM/lmdeploy/pull/2240
* DearPlanet made their first contribution in https://github.com/InternLM/lmdeploy/pull/2353

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.5.3...v0.6.0a0
