BentoML

1.1.4

🍱 To better support LLM serving through response streaming, we are proud to introduce experimental server-sent events (SSE) streaming support in this release of BentoML `v1.1.4` and OpenLLM `v0.2.27`. See an example [service definition](https://gist.github.com/ssheng/38e59e475f3ac5b0f9299c71f7dc3185) for SSE streaming with Llama 2.

- Added response streaming through SSE to the `bentoml.io.Text` IO Descriptor type.
- Added async generator support to both the API Server and Runner to `yield` incremental text responses (see the sketch after this list).
- Added native SSE streaming support to ☁️ BentoCloud.
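The snippet below is a minimal sketch of this pattern, assuming a standalone Service that streams text without a model behind it; the service name and the simulated token loop are illustrative only.

```python
import asyncio

import bentoml
from bentoml.io import Text

svc = bentoml.Service("sse_demo")

@svc.api(input=Text(), output=Text())
async def stream(prompt: str):
    # Defining the API as an async generator lets the server send each
    # yielded string to the client incrementally over SSE, instead of
    # buffering a single response.
    for token in f"You said: {prompt}".split():
        await asyncio.sleep(0.05)  # stand-in for per-token model latency
        yield token + " "
```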

🦾 OpenLLM added token streaming capabilities to support streaming responses from LLMs.

- Added `/v1/generate_stream` endpoint for streaming responses from LLMs.

```bash
curl -N -X 'POST' 'http://0.0.0.0:3000/v1/generate_stream' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{
  "prompt": " Instruction:\n What is the definition of time (200 words essay)?\n\n Response:",
  "llm_config": {
    "use_llama2_prompt": false,
    "max_new_tokens": 4096,
    "early_stopping": false,
    "num_beams": 1,
    "num_beam_groups": 1,
    "use_cache": true,
    "temperature": 0.89,
    "top_k": 50,
    "top_p": 0.76,
    "typical_p": 1,
    "epsilon_cutoff": 0,
    "eta_cutoff": 0,
    "diversity_penalty": 0,
    "repetition_penalty": 1,
    "encoder_repetition_penalty": 1,
    "length_penalty": 1,
    "no_repeat_ngram_size": 0,
    "renormalize_logits": false,
    "remove_invalid_values": false,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "encoder_no_repeat_ngram_size": 0,
    "n": 1,
    "best_of": 1,
    "presence_penalty": 0.5,
    "frequency_penalty": 0,
    "use_beam_search": false,
    "ignore_eos": false
  },
  "adapter_name": null
}'
```


What's Changed
* docs: Update the models doc by Sherlock113 in https://github.com/bentoml/BentoML/pull/4145
* docs: Add more workflows to the GitHub Actions doc by Sherlock113 in https://github.com/bentoml/BentoML/pull/4146
* docs: Add text embedding example to readme by Sherlock113 in https://github.com/bentoml/BentoML/pull/4151
* fix: bento build cache miss by xianml in https://github.com/bentoml/BentoML/pull/4153
* fix(buildx): parsing attestation on docker desktop by aarnphm in https://github.com/bentoml/BentoML/pull/4155

New Contributors
* xianml made their first contribution in https://github.com/bentoml/BentoML/pull/4153

**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.1.3...v1.1.4

1.1.2

Patch releases

BentoML now provides a new diffusers integration, `bentoml.diffusers_simple`.

This introduces two integrations, for the `stable_diffusion` and `stable_diffusion_xl` models.

```python
import bentoml

# Create a Runner for a Stable Diffusion model
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

# Create a Runner for a Stable Diffusion XL model
runner_xl = bentoml.diffusers_simple.stable_diffusion_xl.create_runner("stabilityai/stable-diffusion-xl-base-1.0")
```
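As a rough illustration of how such a runner might be wired into a Service, the sketch below assumes the runner's default method proxies the underlying text-to-image pipeline and returns a batch of images as its first output; the exact call signature and return shape should be checked against the `bentoml.diffusers_simple` docs.

```python
import bentoml
from bentoml.io import Image, Text

# Assumption: create_runner returns a standard BentoML Runner whose default
# method forwards keyword arguments to the diffusion pipeline.
runner = bentoml.diffusers_simple.stable_diffusion.create_runner("CompVis/stable-diffusion-v1-4")

svc = bentoml.Service("sd_service", runners=[runner])

@svc.api(input=Text(), output=Image())
def txt2img(prompt: str):
    # Assumption: the runner returns (images, ...) where images is a list of
    # PIL images; we return the first one.
    result = runner.run(prompt=prompt)
    images = result[0]
    return images[0]
```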


General bug fixes and documentation improvements

What's Changed
* docs: Add the Overview and Quickstarts sections by Sherlock113 in https://github.com/bentoml/BentoML/pull/4088
* chore(type): makes ModelInfo mypy-compatible by aarnphm in https://github.com/bentoml/BentoML/pull/4094
* feat(store): update annotations by aarnphm in https://github.com/bentoml/BentoML/pull/4092
* docs: Fix some relative links by Sherlock113 in https://github.com/bentoml/BentoML/pull/4097
* docs: Add the Iris quickstart doc by Sherlock113 in https://github.com/bentoml/BentoML/pull/4096
* docs: Add the yolo quickstart by Sherlock113 in https://github.com/bentoml/BentoML/pull/4099
* docs: Code format fix by Sherlock113 in https://github.com/bentoml/BentoML/pull/4101
* fix: respect environment during `bentoml.bentos.build` by aarnphm in https://github.com/bentoml/BentoML/pull/4081
* docs: replaced deprecated save to save_model in pytorch.rst by EgShes in https://github.com/bentoml/BentoML/pull/4102
* fix: Make the install command shorter by frostming in https://github.com/bentoml/BentoML/pull/4103
* docs: Update the BentoCloud Build doc by Sherlock113 in https://github.com/bentoml/BentoML/pull/4104
* docs: Add quickstart repo link and move torch import in Yolo by Sherlock113 in https://github.com/bentoml/BentoML/pull/4106
* docs: fix typo by zhangwm404 in https://github.com/bentoml/BentoML/pull/4108
* docs: fix typo by zhangwm404 in https://github.com/bentoml/BentoML/pull/4109
* fix: calculate Pandas DataFrame batch size correctly by judahrand in https://github.com/bentoml/BentoML/pull/4110
* fix(cli): fix CLI output to BentoCloud by Haivilo in https://github.com/bentoml/BentoML/pull/4114
* Fix sklearn example docs by jianshen92 in https://github.com/bentoml/BentoML/pull/4121
* docs: Add the BentoCloud Deployment creation and update page property explanations by Sherlock113 in https://github.com/bentoml/BentoML/pull/4105
* fix: disable pyright for being too strict by frostming in https://github.com/bentoml/BentoML/pull/4113
* refactor(cli): change prompt of cloud cli to unify Yatai and BentoCloud by Haivilo in https://github.com/bentoml/BentoML/pull/4124
* fix(cli): change model to lower case by Haivilo in https://github.com/bentoml/BentoML/pull/4126
* chore(ci): remove codestyle jobs by aarnphm in https://github.com/bentoml/BentoML/pull/4125
* fix: don't pass column names twice by judahrand in https://github.com/bentoml/BentoML/pull/4120
* feat: SSE (Experimental) by jianshen92 in https://github.com/bentoml/BentoML/pull/4083
* docs: Restructure the get started section in BentoCloud docs by Sherlock113 in https://github.com/bentoml/BentoML/pull/4129
* docs: change monitoring image by Haivilo in https://github.com/bentoml/BentoML/pull/4133
* feat: Rust gRPC client by aarnphm in https://github.com/bentoml/BentoML/pull/3368
* feature(framework): diffusers lora and textual inversion support by larme in https://github.com/bentoml/BentoML/pull/4086
* feat(buildx): support for attestation and sbom with buildx by aarnphm in https://github.com/bentoml/BentoML/pull/4132

New Contributors
* EgShes made their first contribution in https://github.com/bentoml/BentoML/pull/4102
* zhangwm404 made their first contribution in https://github.com/bentoml/BentoML/pull/4108

**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.1.1...v1.1.2

1.1.1

- Added more extensive cloud config options for the `bentoml deployment` CLI. Thanks Haivilo!
Note that `bentoml deployment update` now takes the deployment name as an optional positional argument instead of the previous `--name` option:
```bash
bentoml deployment update DEPLOYMENT_NAME
```
See #4087.
- Added documentation about the Bento release GitHub Action. Thanks frostming! See #4071.

**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.1.0...v1.1.1

1.1.0

🍱 We're thrilled to announce the release of BentoML v1.1.0, our first minor version update since the milestone v1.0.

- **Backward Compatibility**: Rest assured that this release maintains full API backward compatibility with v1.0.
- **Official gRPC Support**: We've transitioned [gRPC support in BentoML](https://docs.bentoml.org/en/latest/guides/grpc.html) from experimental to official status, expanding your toolkit for high-performance, low-latency services (see the serving example after this list).
- **Ray Integration**: Ray is a popular open-source compute framework that makes it easy to scale Python workloads. [BentoML integrates natively with Ray Serve](https://docs.bentoml.org/en/latest/integrations/ray.html) to enable users to deploy Bento applications in a Ray cluster without modifying code or configuration.
- **Enhanced Hugging Face Transformers and Diffusers Support:** All Hugging Face Diffuser models and pipelines can be seamlessly imported and integrated into BentoML applications through the [Transformers](https://docs.bentoml.org/en/latest/frameworks/transformers.html) and [Diffusers](https://docs.bentoml.org/en/latest/frameworks/diffusers.html) framework libraries.
- **Enhanced Model Version Management**: Enjoy greater flexibility with the [improved model version management](https://docs.bentoml.org/en/latest/concepts/bento.html#models), enabling flexible configuration and synchronization of model versions with your remote model store.
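For example, a Service can be served over gRPC with the standard CLI; this is a minimal sketch in which `iris_classifier:latest` is a placeholder Bento tag.

```bash
# Serve an existing Bento (or a service.py target) over gRPC instead of HTTP
bentoml serve-grpc iris_classifier:latest
```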

🦾 We are also excited to announce the launch of OpenLLM v0.2.0, featuring support for [Llama 2](https://ai.meta.com/llama/) models.

![image](https://github.com/bentoml/BentoML/assets/861225/91df476d-0f05-4f53-b6e8-4c1882f04d7f)

- **GPU and CPU Support:** Running Llama 2 is supported on both GPU and CPU.
- **Model variations and parameter sizes:** Supports all model weights and parameter sizes available on Hugging Face.

```bash
meta-llama/llama-2-70b-chat-hf
meta-llama/llama-2-13b-chat-hf
meta-llama/llama-2-7b-chat-hf
meta-llama/llama-2-70b-hf
meta-llama/llama-2-13b-hf
meta-llama/llama-2-7b-hf
openlm-research/open_llama_7b_v2
openlm-research/open_llama_3b_v2
openlm-research/open_llama_13b
huggyllama/llama-65b
huggyllama/llama-30b
huggyllama/llama-13b
huggyllama/llama-7b
```

Users can use any weights on Hugging Face (e.g. `TheBloke/Llama-2-13B-chat-GPTQ`), custom weights from a local path (e.g. `/path/to/llama-1`), or fine-tuned weights, as long as they adhere to [LlamaForCausalLM](https://huggingface.co/docs/transformers/main/model_doc/llama2#transformers.LlamaForCausalLM).
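For instance, a sketch of starting a server with a specific Llama 2 checkpoint; the `--model-id` value is simply one of the weights listed above, and the exact flags should be verified against the OpenLLM `v0.2.x` CLI.

```bash
# Start an OpenLLM server backed by a specific Llama 2 checkpoint
openllm start llama --model-id meta-llama/llama-2-7b-chat-hf
```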

- **Stay tuned for fine-tuning capabilities in OpenLLM:** Fine-tuning support for the various Llama 2 models will be added in a future release. Try the experimental script for fine-tuning Llama 2 with QLoRA in the OpenLLM playground:


```bash
python -m openllm.playground.llama2_qlora --help
```

1.0.22

🍱 The BentoML `v1.0.22` release brings a list of well-anticipated updates.

- Added support for Pydantic 2 for better validation performance.
- Added support for CUDA 12 versions in builds and containerization (see the `bentofile.yaml` sketch after the lifecycle example below).
- Introduced service lifecycle events that allow adding custom logic at `on_deployment`, `on_startup`, and `on_shutdown`. State can be managed using the context `ctx` variable during the `on_startup` and `on_shutdown` events and during request serving in the API.

```python
@svc.on_deployment
def on_deployment():
    pass

@svc.on_startup
def on_startup(ctx: bentoml.Context):
    ctx.state["object_key"] = create_object()

@svc.on_shutdown
def on_shutdown(ctx: bentoml.Context):
    cleanup_state(ctx.state["object_key"])

@svc.api
def predict(input_data, ctx):
    object = ctx.state["object_key"]
    pass
```
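As referenced above, a hypothetical `bentofile.yaml` snippet for opting into a CUDA 12 base image; the exact version string accepted by this release is an assumption, so check the build documentation for the supported values.

```yaml
docker:
  # Assumption: a CUDA 12.x version string; supported values may differ.
  cuda_version: "12.0"
```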


- Added support for traffic control for both API Server and Runners. Timeout and maximum concurrency can now be configured through configuration.

```yaml
api_server:
  traffic:
    timeout: 10  # API Server request timeout in seconds
    max_concurrency: 32  # Maximum concurrent requests in the API Server

runners:
  iris:
    traffic:
      timeout: 10  # Runner request timeout in seconds
      max_concurrency: 32  # Maximum concurrent requests in the Runner
```


- Improved `bentoml push` performance for large Bentos.

🚀 One more thing: the team is delighted to unveil our latest endeavor, [OpenLLM](https://github.com/bentoml/OpenLLM). This innovative project allows you to effortlessly build with state-of-the-art open-source or fine-tuned large language models.

- Supports all variants of Flan-T5, Dolly V2, StarCoder, Falcon, StableLM, and ChatGLM out of the box. Fully customizable with model-specific arguments.

```bash
openllm start [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```


- Exposes the familiar BentoML APIs and transforms LLMs seamlessly into Runners.

```python
llm_runner = openllm.Runner("dolly-v2")
```
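As a rough sketch of embedding such a runner in a BentoML Service; the `generate.async_run` call and the shape of its return value are assumptions based on the usual Runner method pattern, so consult the OpenLLM README for the exact API.

```python
import bentoml
import openllm
from bentoml.io import Text

llm_runner = openllm.Runner("dolly-v2")

svc = bentoml.Service("llm-service", runners=[llm_runner])

@svc.api(input=Text(), output=Text())
async def prompt(text: str) -> str:
    # Assumption: the runner exposes a `generate` runnable method with an
    # async_run variant; the response structure depends on the model.
    result = await llm_runner.generate.async_run(text)
    return str(result)
```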


- Builds LLM applications into the Bento format, which can be deployed to BentoCloud or containerized into OCI images.

```bash
openllm build [falcon | flan_t5 | dolly_v2 | chatglm | stablelm | starcoder]
```
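The resulting Bento can then be containerized with the standard BentoML CLI; the tag below is a placeholder for whatever `openllm build` prints out.

```bash
# Build an OCI-compatible image from the generated Bento (placeholder tag)
bentoml containerize <bento_tag>
```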



Our dedicated team is working hard to pioneer more integrations of advanced models for upcoming releases of OpenLLM. Stay tuned for the unfolding developments.

1.0.20

🍱 BentoML `v1.0.20` is released with improved usability and compatibility features.

- **Production Mode by Default:** The `bentoml serve` command now runs with the `--production` option by default. This change is made to simulate production behavior during development. The `--reload` option will continue to work as expected. To achieve the previous serving behavior, use `--development` instead.
- **Optional Dependency for OpenTelemetry Exporter:** The `opentelemetry-exporter-otlp-proto-http` dependency has been moved from a required dependency to an optional one to address a `protobuf` dependency incompatibility issue. ⚠️ If you are currently using the Model Monitoring and Inference Data Collection feature, you must install the package with the `monitor-otlp` option from this release onwards to include the necessary dependency.

```bash
pip install "bentoml[monitor-otlp]"
```


- **OpenTelemetry Trace ID Configuration Option:** A new configuration option has been added to return the OpenTelemetry Trace ID in the response. This feature is particularly helpful when tracing has not been initialized in the upstream caller, but the caller still wishes to log the Trace ID in case of an error.

```yaml
api_server:
  http:
    response:
      trace_id: True
```


- **Start from a Service:** Added the ability to start a server from a `bentoml.Service` object. This is helpful for troubleshooting a project in a development environment where no Bentos have been built yet.

```python
import bentoml

# import the Service defined in the `/clip_api_service/service.py` file
from clip_api_service.service import svc

if __name__ == "__main__":
    # start a server:
    server = bentoml.HTTPServer(svc)
    server.start(blocking=False)
    client = server.get_client()
    client.predict(..)
```


What's Changed
* fix(dispatcher): handling empty o_stat in `trigger_refresh` by larme in https://github.com/bentoml/BentoML/pull/3796
* fix(framework): adjust diffusers device_map default behavior by larme in https://github.com/bentoml/BentoML/pull/3779
* chore(dispatcher): cancel jobs with a for loop by sauyon in https://github.com/bentoml/BentoML/pull/3788
* fix: correctly reraise `CancelledError` by sauyon in https://github.com/bentoml/BentoML/pull/3801
* use path as resource for non-OS paths by sauyon in https://github.com/bentoml/BentoML/pull/3800
* chore(deps): bump coverage[toml] from 7.2.3 to 7.2.4 by dependabot in https://github.com/bentoml/BentoML/pull/3803
* feat: embedded runner by larme in https://github.com/bentoml/BentoML/pull/3735
* feat(tensorflow): support list types inputs by enmanuelmag in https://github.com/bentoml/BentoML/pull/3807
* chore(deps): bump ruff from 0.0.263 to 0.0.264 by dependabot in https://github.com/bentoml/BentoML/pull/3817
* feat: subprocess build by aarnphm in https://github.com/bentoml/BentoML/pull/3814
* docs: update community slack links by parano in https://github.com/bentoml/BentoML/pull/3824
* chore(deps): bump pyarrow from 11.0.0 to 12.0.0 by dependabot in https://github.com/bentoml/BentoML/pull/3820
* chore(deps): remove imageio by aarnphm in https://github.com/bentoml/BentoML/pull/3812
* chore(deps): bump tritonclient[all] from 2.32.0 to 2.33.0 by dependabot in https://github.com/bentoml/BentoML/pull/3795
* ci: add Pillow to tests dependencies by aarnphm in https://github.com/bentoml/BentoML/pull/3830
* feat(observability): support `service.name` by aarnphm in https://github.com/bentoml/BentoML/pull/3825
* feat: optional returning trace_id in response by aarnphm in https://github.com/bentoml/BentoML/pull/3827
* chore: 3.11 support by PeterJCLaw in https://github.com/bentoml/BentoML/pull/3792
* fix: Eliminate the exception during shutdown by frostming in https://github.com/bentoml/BentoML/pull/3826
* chore: expose scheduling_strategy in to_runner by bojiang in https://github.com/bentoml/BentoML/pull/3831
* feat: allow starting server with bentoml.Service instance by parano in https://github.com/bentoml/BentoML/pull/3829
* chore(deps): bump bufbuild/buf-setup-action from 1.17.0 to 1.18.0 by dependabot in https://github.com/bentoml/BentoML/pull/3838
* fix: make sure to set content-type for file type by aarnphm in https://github.com/bentoml/BentoML/pull/3837
* docs: update default docs to use env as key:value instead of list type by aarnphm in https://github.com/bentoml/BentoML/pull/3841
* deps: move exporter-proto to optional by aarnphm in https://github.com/bentoml/BentoML/pull/3840
* feat(server): improve server APIs by aarnphm in https://github.com/bentoml/BentoML/pull/3834

New Contributors
* enmanuelmag made their first contribution in https://github.com/bentoml/BentoML/pull/3807
* PeterJCLaw made their first contribution in https://github.com/bentoml/BentoML/pull/3792

**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.0.19...v1.0.20
