What's New
TLDR;
* New input/output adapter design that lets users choose between a batch or non-batch implementation
* Speed up the API model server docker image build time
* Changed the recommended import path of artifact classes; artifact classes should now be imported from `bentoml.frameworks.*` (see the sketch after this list)
* Improved python pip package management
* Huggingface/Transformers support!!
* Managed packaged models with Labels API
* Support GCS(Google Cloud Storage) as model storage backend in YataiService
* Current Roadmap for feedback: https://github.com/bentoml/BentoML/discussions/1128
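As a minimal sketch of the import path change, using the scikit-learn artifact as an example (other frameworks follow the same `bentoml.frameworks.*` pattern; the old path shown is how the class was imported in previous releases):

```python
# Before 0.9.0 (previous import path):
# from bentoml.artifact import SklearnModelArtifact

# 0.9.0 and later (recommended import path):
from bentoml.frameworks.sklearn import SklearnModelArtifact
```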
New Input/Output adapter design
A massive refactoring of BentoML's inference API and input/output adapters, led by bojiang with help from akainth015.
**BREAKING CHANGE:** API definition now requires declaring if it is a batch API or non-batch API:
```python
from typing import List

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable  # type annotations are optional

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('classifier')])
class MyPredictionService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict_batch(self, parsed_json_list: List[JsonSerializable]):
        results = self.artifacts.classifier([j['text'] for j in parsed_json_list])
        return results

    @api(input=JsonInput())  # default batch=False
    def predict_non_batch(self, parsed_json: JsonSerializable):
        results = self.artifacts.classifier([parsed_json['text']])
        return results[0]
```
For APIs with `batch=True`, the user-defined API function is required to process a list of input items at a time and return a list of results of the same length. By default, `api` uses `batch=False`, which processes one input item at a time. Implementing a batch API allows your workload to benefit from BentoML's adaptive micro-batching mechanism when serving online traffic, and also speeds up offline batch inference jobs. We recommend using `batch=True` if performance and throughput are a concern. Non-batch APIs are usually easier to implement, good for quick POCs and simple use cases, and suited for deploying on serverless platforms such as AWS Lambda, Azure Functions, and Google KNative.
Read more about this change and example usage here: https://docs.bentoml.org/en/latest/api/adapters.html
**BREAKING CHANGE:** For `DataframeInput` and `TfTensorInput` users, it is now required to add `batch=True`.
`DataframeInput` and `TfTensorInput` are special input types that only support accepting a batch of input at one time.
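For example, here is a minimal sketch of a DataFrame-based service under the new requirement (the service and artifact names are illustrative):

```python
import pandas as pd

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('model')])
class MyDataframeService(BentoService):

    # DataframeInput always delivers a batch of rows as a pandas DataFrame,
    # so batch=True must now be declared explicitly
    @api(input=DataframeInput(), batch=True)
    def predict(self, df: pd.DataFrame):
        return self.artifacts.model.predict(df)
```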
Input data validation while handling batch input
When the API function receives a list of inputs, it is now possible to reject a subset of the input data and return an error code to the client if the input data is invalid or malformed. Users can do this via the `InferenceTask.discard` API. Here's an example:
```python
from typing import List

from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

@env(infer_pip_packages=True)
@artifacts([SklearnModelArtifact('classifier')])
class MyPredictionService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict_batch(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]):
        model_input = []
        for json, task in zip(parsed_json_list, tasks):
            if "text" in json:
                model_input.append(json['text'])
            else:
                task.discard(http_status=400, err_msg="input json must contain `text` field")

        results = self.artifacts.classifier(model_input)
        return results
```
The number of discarded tasks plus the length of the returned results list should equal the length of the input list; this allows BentoML to match the results back to the tasks that have not been discarded.
The new design also allows fine-grained control of the HTTP response, CLI inference job output, etc. E.g.:
```python
import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable, InferenceTask, InferenceResult, InferenceError  # type annotations are optional

@bentoml.artifacts([SklearnModelArtifact('model')])
class MyService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=False)
    def predict(self, parsed_json: JsonSerializable, task: InferenceTask) -> InferenceResult:
        if task.http_headers['Accept'] == "application/json":
            predictions = self.artifacts.model.predict([parsed_json])
            return InferenceResult(
                data=predictions[0],
                http_status=200,
                http_headers={"Content-Type": "application/json"},
            )
        else:
            return InferenceError(err_msg="application/json output only", http_status=400)
```
Or, when `batch=True`:
```python
from typing import List

import bentoml
from bentoml.adapters import JsonInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.types import JsonSerializable, InferenceTask, InferenceResult, InferenceError  # type annotations are optional

@bentoml.artifacts([SklearnModelArtifact('model')])
class MyService(bentoml.BentoService):

    @bentoml.api(input=JsonInput(), batch=True)
    def predict(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]) -> List[InferenceResult]:
        rv = []
        predictions = self.artifacts.model.predict(parsed_json_list)
        for task, prediction in zip(tasks, predictions):
            if task.http_headers['Accept'] == "application/json":
                rv.append(
                    InferenceResult(
                        data=prediction,
                        http_status=200,
                        http_headers={"Content-Type": "application/json"},
                    ))
            else:
                rv.append(InferenceError(err_msg="application/json output only", http_status=400))
                # or: task.discard(err_msg="application/json output only", http_status=400)
        return rv
```
Other adapter changes:
* Added 3 base adapters for implementing advanced adapters: `FileInput`, `StringInput`, `MultiFileInput` (see the sketch after this list)
* Implementing new adapters that support micro-batching is a lot easier now: https://github.com/bentoml/BentoML/blob/v0.9.0.pre/bentoml/adapters/base_input.py
* Per-inference-task prediction log (#1089)
* More adapters support launching batch inference job from BentoML CLI run command now, see API reference for detailed examples: https://docs.bentoml.org/en/latest/api/adapters.html
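As a rough illustration of the new file-based adapters, here is a minimal sketch using `FileInput` (the service name and the assumption that the adapter passes readable file-like objects are illustrative; see the adapters documentation linked above for the exact interface):

```python
from typing import List

from bentoml import api, BentoService
from bentoml.adapters import FileInput
from bentoml.types import InferenceTask

class FileSizeService(BentoService):

    # With batch=True, FileInput is assumed to pass one file-like object
    # per inference task to the API function
    @api(input=FileInput(), batch=True)
    def predict(self, files: List, tasks: List[InferenceTask]):
        results = []
        for f, task in zip(files, tasks):
            content = f.read()
            if not content:
                # discarded tasks plus returned results must add up to the input length
                task.discard(http_status=400, err_msg="empty file")
            else:
                results.append(len(content))
        return results
```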
Docker Build Improvements
* Optimize docker image build time (#1081), kudos to ZeyadYasser!
* Per-Python-minor-version base images to speed up image building (#1101, #1096), thanks gregd33!
* Add "latest" tag to all user-facing docker base images (#1046)
Improved pip package management
Setting pip install options in BentoService `env` specification
As suggested in https://github.com/bentoml/BentoML/issues/1036#issuecomment-682179282. Thanks danield137 for suggesting the `pip_extra_index_url` option!
```python
from bentoml import env, artifacts, BentoService
from bentoml.frameworks.sklearn import SklearnModelArtifact

@env(
    auto_pip_dependencies=True,
    pip_index_url='my_pypi_host_url',
    pip_trusted_host='my_pypi_host_url',
    pip_extra_index_url='extra_pypi_index_url'
)
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):
    ...
```
**BREAKING CHANGE:** Due to this change, the docker build args `PIP_INDEX_URL` and `PIP_TRUSTED_HOST` have been removed, since they may conflict with settings in the base image (#1036).
* Support passing a conda environment.yml file to `env`, as suggested in #725 (https://github.com/bentoml/BentoML/issues/725); see the sketch after this list
* When a version is not specified in the `pip_packages` list, BentoML pins it to the version found in the current Python session. The same now applies to packages added by adapter and artifact classes
* Support specifying package requirement ranges now, e.g.:
```python
@env(pip_packages=["abc==1.3", "foo>1.2,<=1.4"])
```
It can be any pip version requirement specifier: https://pip.pypa.io/en/stable/reference/pip_install/#requirement-specifiers
* Renamed `pip_dependencies` to `pip_packages` and `auto_pip_dependencies` to `infer_pip_packages`; the old APIs still work but will eventually be deprecated
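Here is a rough sketch of the conda environment.yml option mentioned above; the parameter name used here is an assumption, so please verify it against the `env` API reference:

```python
from bentoml import env, artifacts, BentoService
from bentoml.frameworks.sklearn import SklearnModelArtifact

# NOTE: the parameter name `conda_env_yml_file` is an assumption based on the
# feature described in issue 725; check the env API docs for the exact name
@env(conda_env_yml_file='./environment.yml')
@artifacts([SklearnModelArtifact('model')])
class IrisClassifier(BentoService):
    ...
```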
GCS support in YataiService
Adding Google Cloud Storage (GCS) support in YataiService as the storage backend. This is an alternative to AWS S3, MinIO, or a POSIX file system (#1017). Thank you Korusuke and PrabhanshuAttri for creating the GCS support!
YataiService Labels API for model management
Manage packaged models in YataiService with the labels API, implemented in #1064:
1. Add labels to `BentoService.save`
```python
svc = MyBentoService()
svc.save(labels={'my_key': 'my_value', 'test': 'passed'})
```
2. Add label query for CLI commands
* `bentoml get BENTO_NAME`, `bentoml list`, `bentoml deployment list`, `bentoml lambda list`, `bentoml sagemaker list`, `bentoml azure-functions list`
* label query supports `=`, `!=`, `In`, `NotIn`, `Exists`, `DoesNotExist` operators
- e.g. `key1=value1`, `key2!=value2`, `env In (prod, staging)`, `Key Exists`, `Another_key DoesNotExist`
*Simple key/value label selector*
<img width="1329" alt="Screen Shot 2020-09-03 at 5 38 21 PM" src="https://user-images.githubusercontent.com/670949/92186634-4867c580-ee0c-11ea-8dc8-55c28d6a5130.png">
*Use Exists operator*
<img width="1123" alt="Screen Shot 2020-09-03 at 5 40 57 PM" src="https://user-images.githubusercontent.com/670949/92186755-a3012180-ee0c-11ea-8f68-cf30e95ba482.png">
*Use DoesNotExist operator*
<img width="1327" alt="Screen Shot 2020-09-03 at 5 41 41 PM" src="https://user-images.githubusercontent.com/670949/92186785-bc09d280-ee0c-11ea-9465-a10a8411612a.png">
*Use In operator*
<img width="1348" alt="Screen Shot 2020-09-03 at 5 48 42 PM" src="https://user-images.githubusercontent.com/670949/92187108-b6f95300-ee0d-11ea-9744-45ed182d3ab1.png">
*Use multiple label query*
<img width="1356" alt="Screen Shot 2020-09-03 at 7 07 23 PM" src="https://user-images.githubusercontent.com/670949/92191498-caf68200-ee18-11ea-9679-9f4ea06a5484.png">
3. Roadmap - add web UI for filtering and searching with labels API
New framework support: Huggingface/Transformers
#1090, #1094 - thanks vedashree29296 for contributing this!
Usage & docs: https://docs.bentoml.org/en/stable/frameworks.html#transformers
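A rough sketch of what a Transformers-based service can look like; the artifact class is `TransformersModelArtifact` under the new `bentoml.frameworks.*` path, but the pack format and `.get(...)` accessors shown here are assumptions, so verify the details against the Transformers framework documentation linked above:

```python
from bentoml import env, artifacts, api, BentoService
from bentoml.adapters import JsonInput
from bentoml.frameworks.transformers import TransformersModelArtifact

@env(pip_packages=["transformers", "torch"])
@artifacts([TransformersModelArtifact("gptModel")])
class TransformerService(BentoService):

    @api(input=JsonInput(), batch=False)
    def predict(self, parsed_json):
        src_text = parsed_json.get("text")
        # the packed artifact is assumed to expose the model and tokenizer via .get()
        model = self.artifacts.gptModel.get("model")
        tokenizer = self.artifacts.gptModel.get("tokenizer")
        input_ids = tokenizer.encode(src_text, return_tensors="pt")
        output = model.generate(input_ids, max_length=50)
        return tokenizer.decode(output[0], skip_special_tokens=True)
```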
Bug Fixes:
* Fixed #1030 - `bentoml serve` fails when packaged on Windows and deployed on Linux (#1044)
* Handle missing region during SageMaker deployment updates (#1049)
Internal & Testing:
* Re-organize artifacts related modules (#1082, #1085)
* Refactoring & improvements around dependency management (#1084, #1086)
* [TEST/CI] Add tests covering XgboostModelArtifact (#1079)
* [TEST/CI] Fix AWS moto related unit tests (#1077)
* Lock SQLAlchemy-utils version (#1078)
Contributors of 0.9.0 release
Thank you all for contributing to this release!! danield137 ericmand ssakhavi aviaviavi dinakar29 umihui vedashree29296 joerg84 gregd33 mayurnewase narennadig akainth015 yubozhao bojiang