ScaleLLM is a high-performance inference system for large language models, designed for production environments. It supports most popular open-source models, including Llama2, Bloom, GPT-NeoX, and more.
## [Key Features](https://github.com/vectorch-ai/ScaleLLM#key-features)
* High Performance: ScaleLLM is optimized for high-performance LLM inference.
* Tensor Parallelism: Utilizes tensor parallelism for efficient model execution.
* OpenAI-compatible API: An efficient Golang REST API server compatible with the OpenAI API.
* Hugging Face Models Integration: Seamless integration with most popular HF models.
* Customizable: Offers flexibility for customization to meet your specific needs.
* Production Ready: Designed to be deployed in production environments.
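Because the server speaks the OpenAI wire format, any standard OpenAI client or plain HTTP works against it. A minimal sketch of a chat-completion request body follows; the URL, port, and model name are illustrative assumptions, not ScaleLLM defaults, so adjust them to your deployment.

```python
import json

# Hypothetical endpoint -- replace with your ScaleLLM server's address.
url = "http://localhost:8080/v1/chat/completions"

payload = {
    "model": "meta-llama/Llama-2-7b",  # any model the server has loaded
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."}
    ],
    "temperature": 0.7,
}

# The body is ordinary OpenAI-style JSON, so existing OpenAI SDKs work
# unchanged once pointed at the server's base URL.
body = json.dumps(payload)
print(body)
```

Since the request shape is standard, tools built for the OpenAI API (SDKs, proxies, playgrounds) can be reused without modification.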
## [Supported Models](https://github.com/vectorch-ai/ScaleLLM#supported-models)
| Models   | Tensor Parallel | Quantization | HF model examples |
|----------|-----------------|--------------|-------------------|
| Llama2   | Yes | Yes | meta-llama/Llama-2-7b, TheBloke/Llama-2-13B-chat-GPTQ, TheBloke/Llama-2-70B-AWQ |
| Aquila   | Yes | Yes | BAAI/Aquila-7B, BAAI/AquilaChat-7B |
| Bloom    | Yes | Yes | bigscience/bloom |
| GPT-J    | Yes | Yes | EleutherAI/gpt-j-6b |
| GPT-NeoX | Yes | --  | EleutherAI/gpt-neox-20b |
| GPT2     | Yes | --  | gpt2 |
| InternLM | Yes | Yes | internlm/internlm-7b |
| Mistral  | Yes | Yes | mistralai/Mistral-7B-v0.1 |
| MPT      | Yes | Yes | mosaicml/mpt-30b |
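The Tensor Parallel column above means each model's weight matrices can be sharded across devices so that every device computes only a slice of each layer. The following NumPy sketch illustrates the idea with a column-parallel matrix multiply; it is a conceptual example under assumed shapes, not ScaleLLM's actual implementation.

```python
import numpy as np

# Illustrative shapes only: a batch of activations and one full weight matrix.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # activations: batch of 4, hidden size 8
W = rng.standard_normal((8, 6))   # full linear-layer weight, 6 output features

# Shard W column-wise across 2 "devices": each holds half the output features.
W0, W1 = np.split(W, 2, axis=1)

# Each device computes its partial output independently...
y0 = x @ W0
y1 = x @ W1

# ...and an all-gather concatenates the shards into the full result.
y = np.concatenate([y0, y1], axis=1)

# The sharded computation matches the unsharded one.
assert np.allclose(y, x @ W)
```

Splitting this way lets each device store and multiply only its own weight shard, which is what makes large models fit across multiple GPUs.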