ScaleLLM

Latest version: v0.1.5


0.1.5

Major changes
* Added stream options to include usage info in the response
* Fixed a multi-GPU CUDA graph capture issue
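The new stream option follows the OpenAI chat-completions convention. A minimal sketch of a streaming request payload that opts into usage reporting (the model name is a placeholder, and the field names are assumed to match what ScaleLLM's OpenAI-compatible server accepts):

```python
# Hypothetical streaming request payload for an OpenAI-compatible server.
# With include_usage set, the server is expected to send a final stream
# chunk carrying prompt/completion token counts.
payload = {
    "model": "placeholder-model",  # placeholder, not a real deployment
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    "stream_options": {"include_usage": True},  # new in v0.1.5
}
```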

What's Changed
* feat: added include_usage into stream options for stream scenarios by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/243
* feat: added unittests for openai server by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/244
* [minor] use available memory to calculate cache_size by default. by liutongxuan in https://github.com/vectorch-ai/ScaleLLM/pull/245
* refactor: only do sampling in driver worker (rank=0) by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/247
* fix multiple devices cuda graph capture issue by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/248
* revert torch.cuda.empty_cache change by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/249
* ci: added release workflow by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/250
* fix workflow by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/251
* fix: pass in secrets for workflow calls. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/252


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.4...v0.1.5

0.1.4

Major changes
* Added logprobs for the completion and chat APIs
* Added best_of for the completion and chat APIs

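Both parameters mirror the OpenAI legacy-completions API. A sketch of a request payload exercising them (field names are assumed to match the compatible server; the model name is a placeholder):

```python
# Hypothetical legacy-completions request using the v0.1.4 parameters.
payload = {
    "model": "placeholder-model",
    "prompt": "The capital of France is",
    "max_tokens": 8,
    "logprobs": 5,  # return log-probabilities for the top 5 tokens per position
    "best_of": 4,   # sample 4 completions server-side, return the most likely
    "n": 1,         # best_of must be >= n in the OpenAI convention
}
```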
What's Changed
* feat: added openai compatible logprobs support by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/232
* feat: added logprobs support for legacy completion api by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/233
* feat: added logprobs for grpc server by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/234
* feat: added best_of functionality for completion apis by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/236
* feat: added token_ids into sequence output for better debuggability. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/237
* feat: added id_to_token for tokenizer to handle unfinished byte sequence, ending with "�" by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/238
* refactor: split pybind11 binding definitions into separate files by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/239
* feat: added logprobs support for speculative decoding by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/240
* feat: added synchronization for batch inference by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/241
* feat: added '__repr__' function for scalellm package by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/242
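PR #238 deals with tokenizers that emit a Unicode replacement character ("�") while a multi-byte UTF-8 sequence is still incomplete. A generic sketch of the buffering idea in incremental detokenization (illustrative, not ScaleLLM's actual implementation):

```python
def safe_increment(decoded_so_far: str, emitted: int) -> tuple[str, int]:
    """Return (new_text, new_emitted). Hold back output that ends with
    U+FFFD, which signals a byte sequence the tokenizer has not finished."""
    new_text = decoded_so_far[emitted:]
    if new_text.endswith("\ufffd"):
        # Wait for more tokens; the trailing character is incomplete.
        return "", emitted
    return new_text, emitted + len(new_text)
```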


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.3...v0.1.4

0.1.3

Major changes
* Model args hotfix for llama3
* Added more helper functions

What's Changed
* fix: load vocab_size first then use it to decide model type for model sharing between llama3, llama2 and Yi. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/230
* feat: added with statement support to release memory and exposed help function for tokenizer by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/231


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.2...v0.1.3

0.1.2

Major changes
* Set up GitHub Pages for the docs: https://docs.vectorch.com/
* Set up a wheel repository to host published wheels: https://whl.vectorch.com/
* Support pip install with different versions, for example: `pip install scalellm -i https://whl.vectorch.com/cu121/torch2.3/`
* Added latency and system metrics
* Added an initial monitoring dashboard
* Bug fixes for the decoder, the rejection sampler, and a llama2 default value
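The latency metrics named here (see also PRs #211 and #227) typically reduce to time-to-first-token and inter-token latency. A sketch of how they can be computed from per-token arrival timestamps (illustrative, not ScaleLLM's code):

```python
def latency_metrics(request_start: float, token_times: list[float]) -> dict:
    """Compute time-to-first-token and mean inter-token latency (seconds)
    from the request start time and each token's arrival timestamp."""
    ttft = token_times[0] - request_start
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    inter_token = sum(gaps) / len(gaps) if gaps else 0.0
    return {"time_to_first_token": ttft, "inter_token_latency": inter_token}
```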

What's Changed
* ci: added workflow to publish docs to GitHub Pages by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/206
* docs: added docs skeleton by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/207
* docs: fixed source directory and added announcement by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/208
* feat: added monitoring docker compose for prometheus and grafana by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/209
* feat: Added prometheus metrics by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/210
* feat: added token related latency metrics by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/211
* fix: fix weight load issue for fused qkv and added more unittests for weight loading by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/213
* fix: use a consistent version for whl by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/214
* refactor: move setup.py to top level by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/217
* feat: carry over prompt to output for feature parity by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/218
* added missing changes for carrying over prompt by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/219
* fix: set correct default value of rope_theta for llama2 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/223
* feat: convert pickle to safetensors for fast loading by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/224
* docs: add livehtml for docs development by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/225
* fix: use error instead of CHECK when prompt input is empty by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/226
* fix: avoid tensor conversion for already-converted ones. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/228
* feat: added time_to_first_token and inter_token metrics for both stream and non-stream requests by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/227
* fix: decode ending tokens one by one to handle unfinished tokens by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/229


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.1...v0.1.2

0.1.1

What's Changed
* [feat] added cuda 11.8 devel image to build cpp release image by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/194
* [fix] fix workflow format by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/195
* [CI] fix docker run options by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/196
* fix: make build pass with gcc-9 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/197
* ci: bump version and build with new manylinux image (gcc-9) by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/198
* [python] added more examples and fix requirements version by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/199
* feat: moved scheduler wait logic from python into scheduler run_until_complete function by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/200
* feat: added multiple threads support for LLMHandler by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/201
* fix: use a proper epsilon to avoid division by zero error for rejection sampler by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/202
* feat: added batch support for llm handler by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/204
* ci: publish wheels to whl index repo by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/205
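PR #202 clamps the draft probability in the rejection sampler away from zero. A generic sketch of the speculative-decoding acceptance test with that guard (the epsilon value and names are illustrative, not ScaleLLM's code):

```python
EPS = 1e-6  # illustrative epsilon; the value ScaleLLM uses may differ

def accept_draft_token(p_target: float, p_draft: float, u: float) -> bool:
    """Accept the draft token when u < min(1, p_target / p_draft), where u
    is a uniform sample in [0, 1); clamping p_draft ensures a zero
    probability cannot cause a division-by-zero error."""
    ratio = p_target / max(p_draft, EPS)
    return u < min(1.0, ratio)
```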


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.1.0...v0.1.1

0.1.0

Major changes
* Added a Python wrapper and published the [scalellm](https://pypi.org/project/scalellm/) package to PyPI.
* Supported an OpenAI-compatible REST API server: `python3 -m scalellm.serve.api_server`
* Install scalellm with pip: `pip install scalellm`
* Added [examples](https://github.com/vectorch-ai/ScaleLLM/tree/main/python/scalellm/examples) for offline inference and async streaming.

What's Changed
* [fix] use the pybind11 from libtorch and fix model download issue. by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/167
* [misc] upgrade torch to 2.3 and use gcc-12 by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/168
* [feat] added python rest api server skeleton by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/169
* [refactor] combine sequence and request outputs by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/170
* [feat] added python LLMEngine skeleton by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/171
* [refactor] move proto definitions into proto namespace by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/173
* [feat] implement async llm engine for python wrapper by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/172
* [refactor] consolidate handlers to share llm_handler between python rest api server and grpc server by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/174
* [python] move request handling logic into separate file from api server by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/175
* [python] added model check for rest api by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/176
* [feat] added status handling for grpc server by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/177
* [misc] some changes to cmake file by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/180
* [kernel] change head_dim list to reduce binary size by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/181
* [CI] added base docker image for python wheel build by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/182
* [ci] build python wheels by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/183
* [CI] fix docker image issues and build wheel for different python, pytorch versions by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/184
* [fix] added manylinux support by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/185
* [fix] added cuda 11.8 support for manylinux by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/186
* [feat] added version suffix to include cuda and torch version by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/187
* [CI] Upload wheels to release as assets by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/188
* [fix] fix extension typo for wheel publish workflow by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/189
* [python] added LLM for offline inference and stream examples for chat and complete by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/190
* [python] added requirements into package by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/191
* [Release] prepare 0.1.0 release by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/192
* [Release] added workflow to publish whls to PyPI by guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/193


**Full Changelog**: https://github.com/vectorch-ai/ScaleLLM/compare/v0.0.9...v0.1.0
