BentoML

Latest version: v1.2.19

1.1.5

What's Changed
* fix(type): explicit init for attrs Runner by aarnphm in https://github.com/bentoml/BentoML/pull/4140
* fix: typo in ALLOWED_CUDA_VERSION_ARGS by thomasjo in https://github.com/bentoml/BentoML/pull/4156
* chore(deps): open Starlette version, to allow latest by alexeyshockov in https://github.com/bentoml/BentoML/pull/4100
* chore: lower bound for cloudpickle by aarnphm in https://github.com/bentoml/BentoML/pull/4098
* docs: Add embedded runners docs by Sherlock113 in https://github.com/bentoml/BentoML/pull/4157
* fix cloud client types by sauyon in https://github.com/bentoml/BentoML/pull/4160
* fix: use closer-integrated callbackwrapper by sauyon in https://github.com/bentoml/BentoML/pull/4161
* chore(annotations): cleanup compat and fix ModelSignatureDict type by aarnphm in https://github.com/bentoml/BentoML/pull/4162
* fix(pull): correct use `cloud_context` for models pull by aarnphm in https://github.com/bentoml/BentoML/pull/4163

New Contributors
* thomasjo made their first contribution in https://github.com/bentoml/BentoML/pull/4156
* alexeyshockov made their first contribution in https://github.com/bentoml/BentoML/pull/4100

**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.1.4...v1.1.5

1.1.4

🍱 To better support LLM serving through response streaming, we are proud to introduce experimental server-sent events (SSE) streaming support in this release of BentoML `v1.1.4` and OpenLLM `v0.2.27`. See an example [service definition](https://gist.github.com/ssheng/38e59e475f3ac5b0f9299c71f7dc3185) for SSE streaming with Llama 2.

- Added response streaming through SSE to the `bentoml.io.Text` IO Descriptor type.
- Added async generator support to both the API Server and Runner to `yield` incremental text responses (see the sketch after this list).
- Added support in ☁️ BentoCloud to natively serve SSE streaming responses.
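
As a rough sketch of how these pieces fit together (hypothetical service name and a word-by-word echo generator standing in for a real model; the actual Llama 2 service definition is in the gist linked above), an API function that returns an async generator streams each yielded chunk to the client as SSE:

```python
import asyncio

import bentoml
from bentoml.io import Text

svc = bentoml.Service("sse_demo")

@svc.api(input=Text(), output=Text())
async def generate(prompt: str):
    # Yielding from an async generator streams each chunk to the client
    # as a server-sent event instead of a single response body.
    for word in prompt.split():
        yield word + " "
        await asyncio.sleep(0.1)  # stand-in for incremental token generation
```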

🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.

- Added `/v1/generate_stream` endpoint for streaming responses from LLMs.

```bash
curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
  "prompt": " Instruction:\n What is the definition of time (200 words essay)?\n\n Response:",
  "llm_config": {
    "use_llama2_prompt": false,
    "max_new_tokens": 4096,
    "early_stopping": false,
    "num_beams": 1,
    "num_beam_groups": 1,
    "use_cache": true,
    "temperature": 0.89,
    "top_k": 50,
    "top_p": 0.76,
    "typical_p": 1,
    "epsilon_cutoff": 0,
    "eta_cutoff": 0,
    "diversity_penalty": 0,
    "repetition_penalty": 1,
    "encoder_repetition_penalty": 1,
    "length_penalty": 1,
    "no_repeat_ngram_size": 0,
    "renormalize_logits": false,
    "remove_invalid_values": false,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "encoder_no_repeat_ngram_size": 0,
    "n": 1,
    "best_of": 1,
    "presence_penalty": 0.5,
    "frequency_penalty": 0,
    "use_beam_search": false,
    "ignore_eos": false
  },
  "adapter_name": null
}'
```


What's Changed
* docs: Update the models doc by Sherlock113 in https://github.com/bentoml/BentoML/pull/4145
* docs: Add more workflows to the GitHub Actions doc by Sherlock113 in https://github.com/bentoml/BentoML/pull/4146
* docs: Add text embedding example to readme by Sherlock113 in https://github.com/bentoml/BentoML/pull/4151
* fix: bento build cache miss by xianml in https://github.com/bentoml/BentoML/pull/4153
* fix(buildx): parsing attestation on docker desktop by aarnphm in https://github.com/bentoml/BentoML/pull/4155

New Contributors
* xianml made their first contribution in https://github.com/bentoml/BentoML/pull/4153

**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.1.3...v1.1.4

1.1.2

Patch releases

BentoML now provides a new diffusers integration, `bentoml.diffusers_simple`.

This introduces two integrations, for the `stable_diffusion` and `stable_diffusion_xl` models.

```python
import bentoml

# Create a Runner for a Stable Diffusion model
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

# Create a Runner for a Stable Diffusion XL model
runner_xl = bentoml.diffusers_simple.stable_diffusion_xl.create_runner("stabilityai/stable-diffusion-xl-base-1.0")
```
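
Below is a hedged sketch of wiring such a runner into a Service. The `txt2img` endpoint name is arbitrary, and the exact signature and return value of the runner's generation call are assumptions here, so check the `diffusers_simple` API reference before relying on them.

```python
import bentoml
from bentoml.io import Image, Text

# Same model as in the example above
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

svc = bentoml.Service("stable_diffusion_service", runners=[runner])

@svc.api(input=Text(), output=Image())
def txt2img(prompt: str):
    # Assumption: the runner exposes a `run` method that accepts a prompt
    # and returns the generated image(s); adjust to the actual interface.
    result = runner.run(prompt=prompt)
    return result[0] if isinstance(result, (list, tuple)) else result
```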


General bug fixes and documentation improvements

What's Changed
* docs: Add the Overview and Quickstarts sections by Sherlock113 in https://github.com/bentoml/BentoML/pull/4088
* chore(type): makes ModelInfo mypy-compatible by aarnphm in https://github.com/bentoml/BentoML/pull/4094
* feat(store): update annotations by aarnphm in https://github.com/bentoml/BentoML/pull/4092
* docs: Fix some relative links by Sherlock113 in https://github.com/bentoml/BentoML/pull/4097
* docs: Add the Iris quickstart doc by Sherlock113 in https://github.com/bentoml/BentoML/pull/4096
* docs: Add the yolo quickstart by Sherlock113 in https://github.com/bentoml/BentoML/pull/4099
* docs: Code format fix by Sherlock113 in https://github.com/bentoml/BentoML/pull/4101
* fix: respect environment during `bentoml.bentos.build` by aarnphm in https://github.com/bentoml/BentoML/pull/4081
* docs: replaced deprecated save to save_model in pytorch.rst by EgShes in https://github.com/bentoml/BentoML/pull/4102
* fix: Make the install command shorter by frostming in https://github.com/bentoml/BentoML/pull/4103
* docs: Update the BentoCloud Build doc by Sherlock113 in https://github.com/bentoml/BentoML/pull/4104
* docs: Add quickstart repo link and move torch import in Yolo by Sherlock113 in https://github.com/bentoml/BentoML/pull/4106
* docs: fix typo by zhangwm404 in https://github.com/bentoml/BentoML/pull/4108
* docs: fix typo by zhangwm404 in https://github.com/bentoml/BentoML/pull/4109
* fix: calculate Pandas DataFrame batch size correctly by judahrand in https://github.com/bentoml/BentoML/pull/4110
* fix(cli): fix CLI output to BentoCloud by Haivilo in https://github.com/bentoml/BentoML/pull/4114
* Fix sklearn example docs by jianshen92 in https://github.com/bentoml/BentoML/pull/4121
* docs: Add the BentoCloud Deployment creation and update page property explanations by Sherlock113 in https://github.com/bentoml/BentoML/pull/4105
* fix: disable pyright for being too strict by frostming in https://github.com/bentoml/BentoML/pull/4113
* refactor(cli): change prompt of cloud cli to unify Yatai and BentoCloud by Haivilo in https://github.com/bentoml/BentoML/pull/4124
* fix(cli): change model to lower case by Haivilo in https://github.com/bentoml/BentoML/pull/4126
* chore(ci): remove codestyle jobs by aarnphm in https://github.com/bentoml/BentoML/pull/4125
* fix: don't pass column names twice by judahrand in https://github.com/bentoml/BentoML/pull/4120
* feat: SSE (Experimental) by jianshen92 in https://github.com/bentoml/BentoML/pull/4083
* docs: Restructure the get started section in BentoCloud docs by Sherlock113 in https://github.com/bentoml/BentoML/pull/4129
* docs: change monitoring image by Haivilo in https://github.com/bentoml/BentoML/pull/4133
* feat: Rust gRPC client by aarnphm in https://github.com/bentoml/BentoML/pull/3368
* feature(framework): diffusers lora and textual inversion support by larme in https://github.com/bentoml/BentoML/pull/4086
* feat(buildx): support for attestation and sbom with buildx by aarnphm in https://github.com/bentoml/BentoML/pull/4132

New Contributors
* EgShes made their first contribution in https://github.com/bentoml/BentoML/pull/4102
* zhangwm404 made their first contribution in https://github.com/bentoml/BentoML/pull/4108

**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.1.1...v1.1.2

1.1.1

- Added more extensive cloud config options for the `bentoml deployment` CLI, thanks to Haivilo.
Note that `bentoml deployment update` now takes the deployment name as an optional positional argument instead of the previous `--name` option:
```bash
bentoml deployment update DEPLOYMENT_NAME
```
See #4087.
- Added documentation about the Bento release GitHub action, thanks to frostming. See #4071.

**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.1.0...v1.1.1

1.1.0

🍱 We're thrilled to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.

- **Backward Compatibility**: Rest assured that this release maintains full API backward compatibility with v1.0.
- **Official gRPC Support**: We've transitioned [gRPC support in BentoML](https://docs.bentoml.org/en/latest/guides/grpc.html) from experimental to official status, expanding your toolkit for high-performance, low-latency services (see the example after this list).
- **Ray Integration**: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. [BentoML integrates natively with Ray Serve](https://docs.bentoml.org/en/latest/integrations/ray.html) to enable users to deploy Bento applications in a Ray cluster without modifying code or configuration.
- **Enhanced Hugging Face Transformers and Diffusers Support:** All Hugging Face Diffuser models and pipelines can be seamlessly imported and integrated into BentoML applications through the [Transformers](https://docs.bentoml.org/en/latest/frameworks/transformers.html) and [Diffusers](https://docs.bentoml.org/en/latest/frameworks/diffusers.html) framework libraries.
- **Enhanced Model Version Management**: Enjoy greater flexibility with the [improved model version management](https://docs.bentoml.org/en/latest/concepts/bento.html#models), enabling flexible configuration and synchronization of model versions with your remote model store.
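
For example, with the gRPC extra installed, an existing Service or Bento can be served over gRPC from the CLI; the `iris_classifier:latest` tag below is only a placeholder for your own Bento:

```bash
pip install "bentoml[grpc]"
bentoml serve-grpc iris_classifier:latest
```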

🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for [Llama 2](https://ai.meta.com/llama/) models.

![image](https://github.com/bentoml/BentoML/assets/861225/91df476d-0f05-4f53-b6e8-4c1882f04d7f)

- **GPU and CPU Support:** Running Llama 2 is supported on both GPU and CPU.
- **Model variations and parameter sizes:** Supports all model weights and parameter sizes on Hugging Face.

```bash
meta-llama/llama-2-70b-chat-hf
meta-llama/llama-2-13b-chat-hf
meta-llama/llama-2-7b-chat-hf
meta-llama/llama-2-70b-hf
meta-llama/llama-2-13b-hf
meta-llama/llama-2-7b-hf
openlm-research/open_llama_7b_v2
openlm-research/open_llama_3b_v2
openlm-research/open_llama_13b
huggyllama/llama-65b
huggyllama/llama-30b
huggyllama/llama-13b
huggyllama/llama-7b
```


Users can use any weights on Hugging Face (e.g. `TheBloke/Llama-2-13B-chat-GPTQ`), custom weights from a local path (e.g. `/path/to/llama-1`), or fine-tuned weights, as long as they adhere to [LlamaForCausalLM](https://huggingface.co/docs/transformers/main/model_doc/llama2#transformers.LlamaForCausalLM).
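
For example, serving one of the chat-tuned weights listed above looks roughly like this with the OpenLLM `v0.2.x` CLI (adjust the model ID to the weights you have access to):

```bash
openllm start llama --model-id meta-llama/llama-2-7b-chat-hf
```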

- **Stay tuned for fine-tuning capabilities in OpenLLM:** Fine-tuning various Llama 2 models will be added in a future release. Try the experimental script for fine-tuning Llama 2 with QLoRA under the OpenLLM playground:

```bash
python -m openllm.playground.llama2_qlora --help
```

1.0.22

🍱 The BentoML `v1.0.22` release brings a list of well-anticipated updates.

- Added support for Pydantic 2 for better validation performance.
- Added support for CUDA 12 versions in builds and containerization.
- Introduced service lifecycle events, allowing custom logic to be added for `on_deployment`, `on_startup`, and `on_shutdown`. State can be managed using the context `ctx` variable during the `on_startup` and `on_shutdown` events and during request serving in the API.

```python
@svc.on_deployment
def on_deployment():
    pass

@svc.on_startup
def on_startup(ctx: bentoml.Context):
    ctx.state["object_key"] = create_object()

@svc.on_shutdown
def on_shutdown(ctx: bentoml.Context):
    cleanup_state(ctx.state["object_key"])

@svc.api
def predict(input_data, ctx):
    object = ctx.state["object_key"]
    pass
```


- Added support for traffic control for both the API Server and Runners. Timeout and maximum concurrency can now be configured through the BentoML configuration file.

```yaml
api_server:
  traffic:
    timeout: 10  # API Server request timeout in seconds
    max_concurrency: 32  # Maximum number of concurrent requests in the API Server

runners:
  iris:
    traffic:
      timeout: 10  # Runner request timeout in seconds
      max_concurrency: 32  # Maximum number of concurrent requests in the Runner
```


- Improved `bentoml push` performance for large Bentos.

🚀 One more thing, the team is delighted to unveil our latest endeavor, [OpenLLM](https://github.com/bentoml/OpenLLM). This innovative project allows you to effortlessly build with state-of-the-art open-source or fine-tuned Large Language Models.

- Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out of the box. Fully customizable with model-specific arguments.

```bash
openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```


- Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.

```python
import openllm

llm_runner = openllm.Runner("dolly-v2")
```


- Builds LLM application into the Bento format that can be deployed to BentoCloud or containerized into OCI images.

```bash
openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```



Our dedicated team is working hard to pioneer more integrations of advanced models for upcoming releases of OpenLLM. Stay tuned for the unfolding developments.
