lmdeploy

Latest version: v0.6.3


0.0.10


What's Changed
💥 Improvements
* [feature] Graceful termination of background threads in LlamaV2 by akhoroshev in https://github.com/InternLM/lmdeploy/pull/458
* Expose stop words and filter the eoa (end-of-answer) token by AllentDan in https://github.com/InternLM/lmdeploy/pull/352 (a generic filtering sketch follows this list)
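
As an illustration of what stop-word filtering means for streamed output, here is a minimal sketch of the general technique; `truncate_at_stop_words` is a hypothetical helper, not lmdeploy's API:

```python
def truncate_at_stop_words(text: str, stop_words: list[str]) -> tuple[str, bool]:
    """Cut `text` at the earliest stop word; return (text, stopped)."""
    positions = [i for w in stop_words if (i := text.find(w)) != -1]
    if not positions:
        return text, False
    return text[:min(positions)], True

# e.g. filtering InternLM's end-of-answer marker from a partial response:
print(truncate_at_stop_words("Hello!<eoa> junk", ["<eoa>"]))  # ('Hello!', True)
```
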
🐞 Bug fixes
* Fix a side effect of codellama support: `sequence_start` was always true when calling `model.get_prompt` by lvhan028 in https://github.com/InternLM/lmdeploy/pull/466
* Fix missing meta instruction in the internlm-chat model by lvhan028 in https://github.com/InternLM/lmdeploy/pull/470
* [bug] Fix race condition by akhoroshev in https://github.com/InternLM/lmdeploy/pull/460
* Fix compatibility issues with Pydantic 2 by aisensiy in https://github.com/InternLM/lmdeploy/pull/465
* Fix benchmark serving being unable to use the Qwen tokenizer by AllentDan in https://github.com/InternLM/lmdeploy/pull/443
* Fix memory leak by lvhan028 in https://github.com/InternLM/lmdeploy/pull/488
📚 Documentations
* Fix typo in README.md by eltociear in https://github.com/InternLM/lmdeploy/pull/462
🌐 Other
* bump version to v0.0.10 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/474

New Contributors
* eltociear made their first contribution in https://github.com/InternLM/lmdeploy/pull/462
* akhoroshev made their first contribution in https://github.com/InternLM/lmdeploy/pull/458
* aisensiy made their first contribution in https://github.com/InternLM/lmdeploy/pull/465

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.9...v0.0.10

0.0.9


Highlights

* Support InternLM 20B, including FP16, W4A16, and W4KV8 (a toy quantization sketch follows)
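
For intuition, W4A16 stores weights as 4-bit integers with per-group fp16 scales and dequantizes them on the fly, while activations stay in fp16. The sketch below is illustrative only, not lmdeploy's actual kernels, and assumes the weight size is divisible by the group size:

```python
import torch

def quantize_w4(w: torch.Tensor, group_size: int = 128):
    """Toy symmetric per-group 4-bit quantization of a weight matrix."""
    orig_shape = w.shape
    groups = w.reshape(-1, group_size)
    # int4 range is [-8, 7]; guard against all-zero groups.
    scale = groups.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(groups / scale), -8, 7).to(torch.int8)
    return q, scale, orig_shape

def dequantize_w4(q, scale, orig_shape):
    """Recover an fp16 weight matrix from 4-bit values and per-group scales."""
    return (q.float() * scale).reshape(orig_shape).half()
```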

What's Changed

🚀 Features
* Support InternLM 20B by lvhan028 in https://github.com/InternLM/lmdeploy/pull/440

💥 Improvements
* Reduce GIL switching by irexyc in https://github.com/InternLM/lmdeploy/pull/407
* Profile token generation with more settings by AllentDan in https://github.com/InternLM/lmdeploy/pull/364

🐞 Bug fixes
* Fix disk space limit for building docker image by RunningLeon in https://github.com/InternLM/lmdeploy/pull/404
* Make the PyPI CI more general by irexyc in https://github.com/InternLM/lmdeploy/pull/412
* Fix build.md by pangsg in https://github.com/InternLM/lmdeploy/pull/411
* Fix memory leak by irexyc in https://github.com/InternLM/lmdeploy/pull/415
* Fix token count bug by AllentDan in https://github.com/InternLM/lmdeploy/pull/416
* [Fix] Support actual seqlen in flash-attention2 by grimoire in https://github.com/InternLM/lmdeploy/pull/418
* [Fix] `output[-1]` when output is empty by wangruohui in https://github.com/InternLM/lmdeploy/pull/405

🌐 Other
* rename readthedocs config file by RunningLeon in https://github.com/InternLM/lmdeploy/pull/429
* bump version to v0.0.9 by lvhan028 in https://github.com/InternLM/lmdeploy/pull/428

New Contributors
* pangsg made their first contribution in https://github.com/InternLM/lmdeploy/pull/411

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.8...v0.0.9

0.0.8

Highlights
* Support Baichuan2-7B-Base and Baichuan2-7B-Chat
* Support all features of Code Llama: code completion, infilling, chat/instruct, and the Python specialist (an infilling prompt sketch follows)
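
Per the Code Llama paper, infilling wraps the surrounding code in sentinel tokens and asks the model to generate the middle. A rough sketch of the prompt shape (exact sentinel handling depends on the tokenizer, so treat this as an approximation):

```python
# Code Llama infilling: the model fills in the code between prefix and suffix.
prefix = "def fib(n):\n    "
suffix = "\n    return fib(n - 1) + fib(n - 2)"
prompt = f"<PRE> {prefix} <SUF>{suffix} <MID>"
# The model generates the middle section (e.g. the base-case check),
# terminating with an <EOT> token.
```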

What's Changed
🚀 Features
* Support baichuan2-chat chat template by wangruohui in https://github.com/InternLM/lmdeploy/pull/378
* Support codellama by lvhan028 in https://github.com/InternLM/lmdeploy/pull/359
🐞 Bug fixes
* [Fix] Continuous batching doesn't work when `stream` is False by sleepwalker2017 in https://github.com/InternLM/lmdeploy/pull/346
* [Fix] Set max dynamic smem size for decoder MHA to support context length > 8k by lvhan028 in https://github.com/InternLM/lmdeploy/pull/377
* Fix core dump in chat and generate when exceeding the session length by AllentDan in https://github.com/InternLM/lmdeploy/pull/366
* [Fix] update puyu model by Harold-lkk in https://github.com/InternLM/lmdeploy/pull/399

📚 Documentations
* [Docs] Fix quantization docs link by LZHgrla in https://github.com/InternLM/lmdeploy/pull/367
* [Docs] Simplify `build.md` by pppppM in https://github.com/InternLM/lmdeploy/pull/370
* [Docs] Update lmdeploy logo by lvhan028 in https://github.com/InternLM/lmdeploy/pull/372

New Contributors
* sleepwalker2017 made their first contribution in https://github.com/InternLM/lmdeploy/pull/346

**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.7...v0.0.8

0.0.7


Highlights
* Flash attention 2 is supported, boosting context decoding speed by approximately 45%
* `token_id` decoding has been optimized for better efficiency (see the incremental detokenization sketch after this list)
* The gemm-tuned script is now included in the PyPI package
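
The decoding change is easiest to see as a sketch of the general incremental-detokenization technique (illustrative, not lmdeploy's code; `tokenizer` is assumed to be any HF-style tokenizer): decode a small sliding window instead of the whole sequence, and hold output back while a multi-byte character is still incomplete.

```python
def detokenize_incrementally(tokenizer, all_ids, prefix_offset, read_offset):
    """Return (new_text, new_prefix_offset, new_read_offset)."""
    prefix_text = tokenizer.decode(all_ids[prefix_offset:read_offset])
    full_text = tokenizer.decode(all_ids[prefix_offset:])
    # U+FFFD means the last token ends mid-character; hold the text back.
    if len(full_text) > len(prefix_text) and not full_text.endswith("\ufffd"):
        return full_text[len(prefix_text):], read_offset, len(all_ids)
    return "", prefix_offset, read_offset
```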

What's Changed
🚀 Features
* Add flashattention2 by grimoire in https://github.com/InternLM/lmdeploy/pull/196
💥 Improvements
* Add llama_gemm to wheel by irexyc in https://github.com/InternLM/lmdeploy/pull/320
* Decode generated token_ids incrementally by AllentDan in https://github.com/InternLM/lmdeploy/pull/309
🐞 Bug fixes
* Fix turbomind import error on windows by irexyc in https://github.com/InternLM/lmdeploy/pull/316
* Fix profile_serving hung issue by lvhan028 in https://github.com/InternLM/lmdeploy/pull/344
📚 Documentations
* Fix readthedocs building by RunningLeon in https://github.com/InternLM/lmdeploy/pull/321
* fix(kvint8): update doc by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/315
* Update FAQ for restful api by AllentDan in https://github.com/InternLM/lmdeploy/pull/319



**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.6...v0.0.7

0.0.6


Highlights

* Support Qwen-7B with dynamic NTK scaling and logN scaling in turbomind
* Support tensor parallelism for W4A16
* Add OpenAI-like RESTful API (a sample client call is sketched below)
* Support Llama-2 70B 4-bit quantization
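
As a quick illustration of an OpenAI-like API, a client call might look like the following; the host, port, model name, and payload fields here are hypothetical, so check the restful_api docs of your version for the exact schema:

```python
import requests

resp = requests.post(
    "http://localhost:23333/v1/chat/completions",  # hypothetical address
    json={
        "model": "internlm-chat-7b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
print(resp.json())
```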

What's Changed

🚀 Features
* Profiling tool for Hugging Face and DeepSpeed models by wangruohui in https://github.com/InternLM/lmdeploy/pull/161
* Support windows platform by irexyc in https://github.com/InternLM/lmdeploy/pull/209
* Qwen-7B, dynamic NTK scaling, and logN scaling support in turbomind by lzhangzz in https://github.com/InternLM/lmdeploy/pull/230 (see the RoPE-scaling sketch after this list)
* Add Restful API by AllentDan in https://github.com/InternLM/lmdeploy/pull/223
* Support context decoding with DP in pytorch by wangruohui in https://github.com/InternLM/lmdeploy/pull/193
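
Dynamic NTK scaling grows the RoPE base as the sequence exceeds the trained context length so that low-frequency components interpolate rather than extrapolate, while logN scaling damps attention for query positions past the trained length. A rough sketch of the general recipe (not turbomind's kernel; the trained length of 2048 is an assumed default):

```python
import math
import torch

def ntk_scaled_inv_freq(dim: int, seq_len: int,
                        trained_len: int = 2048, base: float = 10000.0):
    """Grow the RoPE base dynamically once seq_len exceeds trained_len."""
    alpha = max(seq_len / trained_len, 1.0)
    scaled_base = base * alpha ** (dim / (dim - 2))
    exponents = torch.arange(0, dim, 2, dtype=torch.float32) / dim
    return 1.0 / scaled_base ** exponents

def logn_scale(pos: int, trained_len: int = 2048) -> float:
    """Attention scale applied to queries past the trained context length."""
    return max(math.log(pos) / math.log(trained_len), 1.0)
```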

💥 Improvements
* Support TP for W4A16 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/262
* Pass chat template args, including meta_prompt, to the model (https://github.com/InternLM/lmdeploy/commit/7785142d7c13a21bc01c2e7c0bc10b82964371f1) by AllentDan in https://github.com/InternLM/lmdeploy/pull/225
* Enable the Gradio server to call inference services through the RESTful API by AllentDan in https://github.com/InternLM/lmdeploy/pull/287

🐞 Bug fixes
* Adjust dependency of gradio server by AllentDan in https://github.com/InternLM/lmdeploy/pull/236
* Implement `movmatrix` using warp shuffling for CUDA < 11.8 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/267
* Add 'accelerate' to requirement list by lvhan028 in https://github.com/InternLM/lmdeploy/pull/261
* Fix building with CUDA 11.3 by lzhangzz in https://github.com/InternLM/lmdeploy/pull/280
* Pad tok_embedding and output weights to make their shapes divisible by TP by lvhan028 in https://github.com/InternLM/lmdeploy/pull/285 (a padding sketch follows this list)
* Fix llama2 70b & qwen quantization error by pppppM in https://github.com/InternLM/lmdeploy/pull/273
* Import turbomind in gradio server only when it is needed by AllentDan in https://github.com/InternLM/lmdeploy/pull/303
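
The embedding-padding fix reflects a standard tensor-parallel trick: round the vocabulary dimension up to a multiple of the TP degree so each rank gets an equal shard. A minimal sketch, assuming a (vocab, hidden) embedding table; the function name is hypothetical:

```python
import torch

def pad_vocab_for_tp(embedding: torch.Tensor, tp: int) -> torch.Tensor:
    """Zero-pad rows of a (vocab, hidden) table so vocab % tp == 0."""
    vocab, hidden = embedding.shape
    padded_vocab = (vocab + tp - 1) // tp * tp
    if padded_vocab == vocab:
        return embedding
    pad = torch.zeros(padded_vocab - vocab, hidden, dtype=embedding.dtype)
    return torch.cat([embedding, pad], dim=0)
```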

📚 Documentations
* Remove specified version in user guide by lvhan028 in https://github.com/InternLM/lmdeploy/pull/241
* docs(quantization): update description by tpoisonooo in https://github.com/InternLM/lmdeploy/pull/253 and https://github.com/InternLM/lmdeploy/pull/272
* Check-in FAQ by lvhan028 in https://github.com/InternLM/lmdeploy/pull/256
* Add readthedocs by RunningLeon in https://github.com/InternLM/lmdeploy/pull/208

🌐 Other
* Update workflow for building docker image by RunningLeon in https://github.com/InternLM/lmdeploy/pull/282
* Change to github-hosted runner for building docker image by RunningLeon in https://github.com/InternLM/lmdeploy/pull/291

Known issues
* Inference with the 4-bit Qwen-7B model fails; #307 is addressing this issue.


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.5...v0.0.6

0.0.5


What's Changed

🐞 Bug fixes
* Fix wrong RPATH that used an absolute path instead of a relative one by irexyc in https://github.com/InternLM/lmdeploy/pull/239


**Full Changelog**: https://github.com/InternLM/lmdeploy/compare/v0.0.4...v0.0.5
