🍱 BentoML `v1.0.8` is released with a list of improvements that we hope you'll find useful.
- Introduced Bento Client for easy access to the BentoML service over HTTP. Both sync and async calls are supported. See the [Bento Client Guide](https://docs.bentoml.org/en/latest/guides/client.html) for more details.
```python
import numpy as np

from bentoml.client import Client

client = Client.from_url("http://localhost:3000")

# Sync call
response = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))

# Async call
response = await client.async_classify(np.array([[4.9, 3.0, 1.4, 0.2]]))
```
- Introduced custom metrics support for easy instrumentation of custom metrics over Prometheus. See the [Metrics Guide](https://docs.bentoml.org/en/latest/guides/metrics.html) for more details.
```python
import bentoml

# Histogram metric
inference_duration = bentoml.metrics.Histogram(
    name="inference_duration",
    documentation="Duration of inference",
    labelnames=["nltk_version", "sentiment_cls"],
)

# Counter metric
polarity_counter = bentoml.metrics.Counter(
    name="polarity_total",
    documentation="Count total number of analysis by polarity scores",
    labelnames=["polarity"],
)
```
Full Prometheus-style syntax is supported for instrumenting custom metrics inside API and Runner definitions.
```python
# Histogram
inference_duration.labels(
    nltk_version=nltk.__version__, sentiment_cls=self.sia.__class__.__name__
).observe(time.perf_counter() - start)

# Counter
polarity_counter.labels(polarity=is_positive).inc()
```
- Improved health checking to also cover the status of runners to avoid returning a healthy status before runners are ready.
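With readiness now gated on runner status, an orchestrator's readiness probe pointed at the server's `/readyz` endpoint will only pass once the runners themselves are up. A minimal Kubernetes probe sketch (the path and port assume BentoML's default HTTP server settings; tune the timing values for your workload):

```yaml
# Hypothetical readiness probe for a BentoML HTTP server container
readinessProbe:
  httpGet:
    path: /readyz
    port: 3000
  initialDelaySeconds: 5
  periodSeconds: 10
```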
- Added SSL/TLS support to gRPC serving.
```bash
bentoml serve-grpc --ssl-certfile=credentials/cert.pem --ssl-keyfile=credentials/key.pem --production --enable-reflection
```
- Added channelz support for easier debugging of gRPC serving.
- Allowed nested requirements with the `-r` syntax.
```bash
# requirements.txt
-r nested/requirements.txt
pydantic
Pillow
fastapi
```
- Improved the [adaptive batching](https://docs.bentoml.org/en/latest/guides/batching.html) dispatcher's auto-tuning ability to avoid sporadic request failures due to batching at the beginning of the runner lifecycle.
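Adaptive batching operates within the bounds set in the runner batching configuration; a sketch of a `bentoml_configuration.yaml` fragment (the values shown are illustrative, not recommendations):

```yaml
# Illustrative batching bounds for all runners
runners:
  batching:
    enabled: true
    max_batch_size: 100
    max_latency_ms: 10000
```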
- Fixed a bug where runners raised a `TypeError` when overloaded. An `HTTP 503 Service Unavailable` response is now returned instead when a runner is overloaded.
```prolog
  File "python3.9/site-packages/bentoml/_internal/runner/runner_handle/remote.py", line 188, in async_run_method
    return tuple(AutoContainer.from_payload(payload) for payload in payloads)
TypeError: 'Response' object is not iterable
```
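Because an overloaded runner now surfaces as an HTTP 503 rather than an opaque `TypeError`, callers can recover by retrying with backoff. A minimal sketch using a hypothetical `call_with_retry` helper (this is not part of the BentoML client API; for simplicity it treats any exception from the call as retryable):

```python
import time


def call_with_retry(fn, retries=3, backoff=0.1):
    """Retry `fn` when the service is temporarily overloaded.

    Hypothetical helper, not part of the BentoML client API. `fn` is
    expected to raise on an HTTP 503 response; any exception is treated
    as retryable here for simplicity.
    """
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # retries exhausted, re-raise the last error
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
```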
💡 We continue to update the documentation and examples on every release to help the community unlock the full power of BentoML.
- Check out the updated [PyTorch Framework Guide](https://docs.bentoml.org/en/latest/frameworks/pytorch.html#saving-a-trained-model) on how to use `external_modules` to save classes or utility functions required by the model.
- See the [Metrics Guide](https://docs.bentoml.org/en/latest/guides/metrics.html) on how to add custom metrics to your API and custom Runners.
- Learn more about how to use the [Bento Client](https://docs.bentoml.org/en/latest/guides/client.html) to call your BentoML service with Python easily.
- Check out the latest blog post on [why model serving over gRPC matters to data scientists](https://modelserving.com/blog/3-reasons-for-grpc).
🥂 We’d like to thank the community for your continued support and engagement.
- Shout out to judahrand for multiple contributions to BentoML and bentoctl.
- Shout out to phildamore-phdata, quandollar, 2JooYeon, and fortunto2 for their first contribution to BentoML.