What's New
Breaking Change: JsonInput migrating to batch API (#860, #953)
We are officially changing JsonInput to use the batch-oriented syntax. By now (release 0.8.4), all input adapters in BentoML have migrated to this design. The main difference is in the user-defined API function. Its input parameter is now a list of JSONSerializable objects (Dict, List, Integer, Float, Str) instead of a single JSONSerializable object. Its return value must be an iterable of exactly the same length, with one result per input, in the same order. This makes it possible for API endpoints that use the JsonInput adapter to take advantage of BentoML's adaptive micro-batching capability.
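Here is an example of how JsonInput (formerly JsonHandler) used to work:
```python
import bentoml
from bentoml.adapters import LegacyJsonInput

@bentoml.api(input=LegacyJsonInput())
def predict(self, parsed_json):
    # Old style: called once per request with a single parsed JSON object
    results = self.artifacts.classifier([parsed_json['text']])
    return results[0]
```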
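And here is an example with the new JsonInput class:
```python
import bentoml
from bentoml.adapters import JsonInput

@bentoml.api(input=JsonInput())
def predict(self, parsed_json_list):
    # New style: called with a list of parsed JSON objects, one per request in the batch
    texts = [j['text'] for j in parsed_json_list]
    # Must return one result per input, in the same order
    return self.artifacts.classifier(texts)
```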
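To see why the return value must match the input list in length, here is a minimal, framework-free sketch of what a micro-batching dispatcher does conceptually. It parses several queued request bodies, calls the batch function once, and pairs each result with its request by position. The names `fake_classifier`, `predict_batch`, and `dispatch_batch` are hypothetical; this is an illustration, not BentoML's actual implementation.

```python
import json
from typing import Any, Callable, Iterable, List


def fake_classifier(texts: List[str]) -> List[int]:
    # Hypothetical stand-in for self.artifacts.classifier: scores each text by its length.
    return [len(t) for t in texts]


def predict_batch(parsed_json_list: List[Any]) -> List[int]:
    # Same shape as the new JsonInput API function: a list in, one result per item out.
    texts = [j['text'] for j in parsed_json_list]
    return fake_classifier(texts)


def dispatch_batch(raw_bodies: List[str],
                   batch_fn: Callable[[List[Any]], Iterable[Any]]) -> List[str]:
    # Parse every queued request body, run the user function once on the whole batch,
    # then map the results back to the requests by position.
    parsed = [json.loads(body) for body in raw_bodies]
    results = list(batch_fn(parsed))
    if len(results) != len(parsed):
        # Without a one-to-one result list there is no way to answer each request.
        raise ValueError(f"expected {len(parsed)} results, got {len(results)}")
    return [json.dumps(r) for r in results]


responses = dispatch_batch(['{"text": "hi"}', '{"text": "hello"}'], predict_batch)
print(responses)  # ['2', '5']
```

If the function returned a single aggregated value instead of a list, the dispatcher could not tell which response belongs to which caller, which is why the length check fails loudly.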
The old non-batching JsonInput is still available to help with the transition. Simply use `from bentoml.adapters import LegacyJsonInput as JsonInput` to replace JsonInput or JsonHandler in code written before BentoML 0.8.4. `LegacyJsonInput` behaves exactly like JsonInput in previous releases, and we will keep supporting it until BentoML 1.0.
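If you would rather port handlers to the new signature than keep `LegacyJsonInput`, one option is a small wrapper that lifts a per-item function into the list-in, list-out form. The sketch below is not part of BentoML: `as_batch` and `legacy_predict` are hypothetical names. The wrapper calls the old function once per item, so it preserves your existing logic but does not by itself gain the throughput of vectorized inference.

```python
from functools import wraps
from typing import Any, Callable, List


def as_batch(single_fn: Callable[[Any], Any]) -> Callable[[List[Any]], List[Any]]:
    """Hypothetical helper: turn a per-item handler into a list-in, list-out handler."""
    @wraps(single_fn)
    def batch_fn(parsed_json_list: List[Any]) -> List[Any]:
        # One output per input, in the same order, as the batch contract requires.
        return [single_fn(parsed_json) for parsed_json in parsed_json_list]
    return batch_fn


def legacy_predict(parsed_json):
    # Old-style handler body: operates on a single parsed JSON object.
    return parsed_json['text'].upper()


predict = as_batch(legacy_predict)
print(predict([{'text': 'a'}, {'text': 'b'}]))  # ['A', 'B']
```

Inside a BentoService the handler is a method, so a real port would also need to pass `self` through. This sketch only covers plain functions.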
Custom Web UI support in API Server (#839)
You can now add a custom web UI to your API server. Here is an example project: https://github.com/bentoml/gallery/tree/master/scikit-learn/iris-classifier
![bentoml custom web ui](https://raw.githubusercontent.com/bentoml/gallery/master/scikit-learn/iris-classifier/webui.png)
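Add your web frontend project directory to your BentoService class with the `@web_static_content` decorator. BentoML will automatically bundle all the web UI files and host them when starting the API server:
```python
from bentoml import env, artifacts, api, web_static_content, BentoService
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
@web_static_content('./static')
class IrisClassifier(BentoService):
    @api(input=DataframeInput())
    def predict(self, df):
        return self.artifacts.model.predict(df)
```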
Artifact packing & loading workflow (#911, #921, #949)
We have refactored the Artifact API, which gives users more flexibility in how they package their trained models with BentoML.
The most noticeable change is that users can now separate the model training job from BentoML model serving development. The training job can use the Artifact API to save the trained model, and the serving code can load it later when creating the BentoService class. For example:
Step 1: model training:
```python
from sklearn import svm
from sklearn import datasets

from bentoml.artifact import SklearnModelArtifact

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Model training
    clf = svm.SVC(gamma='scale')
    clf.fit(X, y)

    # Save just the trained model with the SklearnModelArtifact to a specific directory
    btml_model_artifact = SklearnModelArtifact('model')
    btml_model_artifact.pack(clf)
    btml_model_artifact.save('/tmp/temp_bentoml_artifact')
```
Step 2: build a BentoService class with the saved artifact:
```python
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@env(auto_pip_dependencies=True)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):

    @api(input=DataframeInput())
    def predict(self, df):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)

if __name__ == "__main__":
    # Create an iris classifier service instance
    iris_classifier_service = IrisClassifier()

    # Load the previously saved artifact
    iris_classifier_service.artifacts.get('model').load('/tmp/temp_bentoml_artifact')

    saved_path = iris_classifier_service.save()
```
This workflow makes developing and debugging BentoService code much easier: users no longer need to retrain their model every time they change something in the BentoService class definition and want to try it out.
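The pack, save, then load split can be pictured with a plain-Python analog. The sketch below is not BentoML's artifact implementation. `PickleArtifactSketch` is a hypothetical class that mimics the `pack`/`save`/`load` call sequence from the two steps above, using `pickle` and a directory on disk, to show why the serving side never has to retrain.

```python
import os
import pickle
import tempfile


class PickleArtifactSketch:
    """Hypothetical, simplified stand-in for a model artifact (not BentoML code)."""

    def __init__(self, name):
        self.name = name
        self._model = None

    def pack(self, model):
        # Keep a reference to the in-memory trained model.
        self._model = model
        return self

    def save(self, dst_dir):
        # Persist the packed model as <dst_dir>/<name>.pkl.
        os.makedirs(dst_dir, exist_ok=True)
        with open(os.path.join(dst_dir, f"{self.name}.pkl"), "wb") as f:
            pickle.dump(self._model, f)

    def load(self, src_dir):
        # Restore the previously saved model without retraining it.
        with open(os.path.join(src_dir, f"{self.name}.pkl"), "rb") as f:
            self._model = pickle.load(f)
        return self

    def get(self):
        return self._model


# "Training job": produce a model (here just a dict of weights) and save it.
artifact_dir = os.path.join(tempfile.mkdtemp(), "temp_artifact")
PickleArtifactSketch("model").pack({"weights": [0.5, -1.0]}).save(artifact_dir)

# "Serving job", possibly a separate process later on: load the saved model by name.
restored = PickleArtifactSketch("model").load(artifact_dir).get()
print(restored)  # {'weights': [0.5, -1.0]}
```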
* Note that the old BentoService class method `pack` has been deprecated in this release (#915).
Add `bentoml containerize` command (#847, #884, #941)
```bash
$ bentoml containerize --help
Usage: bentoml containerize [OPTIONS] BENTO

  Containerizes given Bento into a ready-to-use Docker image.

Options:
  -p, --push
  -t, --tag TEXT  Optional image tag. If not specified, Bento will
                  generate one from the name of the Bento.
```
Support multiple images in the same request (#828)
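A new input adapter class, `MultiImageInput` (https://docs.bentoml.org/en/latest/api/adapters.html#multiimageinput), has been added. It is designed for prediction services that require multiple image files as input:
```python
import bentoml
from bentoml import BentoService
from bentoml.adapters import MultiImageInput

class MyService(BentoService):

    @bentoml.api(input=MultiImageInput(input_names=('imageX', 'imageY')))
    def predict(self, image_groups):
        results = []
        for image_group in image_groups:
            # Each group maps every name in input_names to one decoded image
            image_array_x = image_group['imageX']
            image_array_y = image_group['imageY']
            # Placeholder (assumes NumPy arrays): replace with your model's inference
            results.append(image_array_x.shape == image_array_y.shape)
        # Return one result per group, as with the other batch-oriented adapters
        return results
```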
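The handler contract can be exercised without a server. In the framework-free sketch below, images are hypothetical grayscale nested lists standing in for the decoded arrays the adapter passes, and `mean_abs_diff` is an illustrative scoring function, not part of BentoML. It shows the shape of `image_groups` (a list of dicts keyed by the names in `input_names`) and the one-result-per-group return value.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a decoded image: a grayscale image as rows of pixel values.
Image = Sequence[Sequence[float]]


def mean_abs_diff(a: Image, b: Image) -> float:
    # Average per-pixel absolute difference between two same-sized images.
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def predict(image_groups: List[Dict[str, Image]]) -> List[float]:
    # One score per group; each group holds the images named in input_names.
    return [mean_abs_diff(g['imageX'], g['imageY']) for g in image_groups]


groups = [
    {'imageX': [[0, 0], [0, 0]], 'imageY': [[0, 0], [0, 0]]},
    {'imageX': [[0, 0], [0, 0]], 'imageY': [[4, 0], [0, 0]]},
]
print(predict(groups))  # [0.0, 1.0]
```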
Add FileInput adapter (#734)
A new input adapter class, `FileInput`, handles arbitrary binary files as the input to your prediction service: https://github.com/bentoml/BentoML/blob/v0.8.4/bentoml/adapters/file_input.py#L33
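As with the other batch-oriented adapters, a `FileInput` API function handles a list of inputs and returns one result per input. The sketch below assumes each input arrives as a binary file-like object; check the linked `file_input.py` for the exact type your version passes. `describe_files` is a hypothetical handler body, exercised here with `io.BytesIO` objects instead of a running server.

```python
import io
from typing import BinaryIO, List


def describe_files(file_streams: List[BinaryIO]) -> List[dict]:
    # Hypothetical handler body: one result per uploaded file, in input order.
    results = []
    for fs in file_streams:
        data = fs.read()
        results.append({
            "num_bytes": len(data),
            # PNG files start with this fixed 8-byte signature.
            "is_png": data.startswith(b"\x89PNG\r\n\x1a\n"),
        })
    return results


files = [io.BytesIO(b"\x89PNG\r\n\x1a\n" + b"\x00" * 4), io.BytesIO(b"hello")]
summary = describe_files(files)
print(summary)
# [{'num_bytes': 12, 'is_png': True}, {'num_bytes': 5, 'is_png': False}]
```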
Add Ngrok support (#917)
Expose your local development model API server over a public URL endpoint, using Ngrok under the hood. To try it out, simply add the `--run-with-ngrok` flag to your `bentoml serve` CLI command, e.g.:
```bash
bentoml serve IrisClassifier:latest --run-with-ngrok
```
Add support for CoreML (#939)
Serving CoreML models on macOS is now supported! Users can also convert models trained with other frameworks to the CoreML format for better performance on macOS. Here's an example of serving a PyTorch model with CoreML and BentoML:
```python
import numpy
import pandas as pd
import torch
from torch import nn

import coremltools as ct
from coremltools.models import MLModel  # pylint: disable=import-error

import bentoml
from bentoml.adapters import DataframeInput
from bentoml.artifact import CoreMLModelArtifact


class PytorchModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(5, 1, bias=False)
        torch.nn.init.ones_(self.linear.weight)

    def forward(self, x):
        x = self.linear(x)
        return x


@bentoml.env(auto_pip_dependencies=True)
@bentoml.artifacts([CoreMLModelArtifact('model')])
class CoreMLClassifier(bentoml.BentoService):
    @bentoml.api(input=DataframeInput())
    def predict(self, df: pd.DataFrame) -> float:
        model: MLModel = self.artifacts.model
        input_data = df.to_numpy().astype(numpy.float32)
        output = model.predict({"input": input_data})
        return next(iter(output.values())).item()


# Sample input used for tracing (assumed): one row of 5 features, matching nn.Linear(5, 1)
test_df = pd.DataFrame(numpy.ones((1, 5), dtype=numpy.float32))


def convert_pytorch_to_coreml(pytorch_model: PytorchModel) -> ct.models.MLModel:
    """CoreML is not for training ML models but rather for converting pretrained models
    and running them on Apple devices. Therefore, in this example we convert the
    pretrained PytorchModel defined above into a CoreML model."""
    pytorch_model.eval()
    traced_pytorch_model = torch.jit.trace(pytorch_model, torch.Tensor(test_df.values))
    model: MLModel = ct.convert(
        traced_pytorch_model, inputs=[ct.TensorType(name="input", shape=test_df.shape)]
    )
    return model


if __name__ == '__main__':
    svc = CoreMLClassifier()
    pytorch_model = PytorchModel()
    model = convert_pytorch_to_coreml(pytorch_model)
    svc.pack('model', model)
    svc.save()
```
Breaking Change: Remove CLI `--with-conda` option (#898)
Running inference jobs within an automatically generated conda environment seemed like a good idea at first, but we realized it introduces more problems than it solves. We are removing this option and encourage users to use Docker for running inference jobs instead.
Improvements:
* #966, #968 Faster `save` by improving Python local module parsing code
* #878, #879 Faster `import bentoml` with lazy module loader
* #872 Add BentoService API name validation
* #887 Set a smaller page limit for `bentoml list`
* #916 Do not cache pip requirements in Dockerfile
* #918 Improve error handling when micro batching service is unavailable
* #925 Artifact refactoring: `set_dependencies` method
* #932 Add warning for SavedBundle Python version mismatch
* #904 JsonInput now ignores the content type header when handling AWS Lambda events
* #951 Add openjdk to H2O artifact default conda dependencies
* #958 Fix typo in CLI default argument help message
Bug fixes:
* #864 Fix decoding headers with latin1
* #867 Fix DataframeInput passing NaN values over HTTP JSON request
* #869 Change the default `mb_max_latency` value to avoid flaky micro-batching initialization
* #897 Fix Yatai web client import
* #907 Fix CORS option in AWS Lambda SAM config
* #922 Fix Lambda deployment when using AWS assumed-role ARN
* #959 Fix `RecursionError: maximum recursion depth exceeded` when saving BentoService bundle
* #969 Fix error in CLI command `bentoml --version`
Internal & Testing
* #870 Add docs for using BentoML's built-in benchmark client
* #855, #871, #877 Add integration tests for dockerized BentoML API server workflow
* #876, #937 Add integration test for TensorFlow SavedModel artifact
* #951 H2O artifact integration test
* #939 CoreML artifact integration test
* #865 Add Makefile for BentoML developers
* #868 API Server "/feedback" endpoint refactor
* #908 BentoService base class refactoring and docstring improvements
* #909 Refactor API Server startup
* #910 Refactor API server performance tracing
* #906 Fix Yatai web UI startup script
* #875 Increase micro batching server test coverage
* #935 Fix list deployments error response
Community Announcements:
We have enabled the __GitHub Discussions__ feature 🎉: https://github.com/bentoml/BentoML/discussions
It will be a new place for community members to connect, ask questions, and share anything related to model serving and BentoML.
Contributors
Thank you, everyone, for contributing to this amazing release loaded with new features and improvements! @bojiang @joshuacwnewton @guy4261 @Sharathmk99 @co42 @jackyzha0 @Korusuke @akainth015 @omrihar @yubozhao