🍱 We are excited to announce the release of BentoML v1.0.17, which adds support for 🤗 Hugging Face Transformers pre-trained instances. Previously, only pipelines could be saved and loaded with the `bentoml.transformers` APIs. In response to community demand for working with pre-trained models, tokenizers, preprocessors, and other instances without pipelines, we have extended the `bentoml.transformers` APIs. With this release, all pre-trained instances can be saved and then loaded into either built-in Transformers framework runners or custom runners. We are thrilled to see what the community will build with this feature. To learn more, visit [BentoML Transformers framework documentation](https://docs.bentoml.org/en/latest/frameworks/transformers.html).
- Pre-trained models and instances, such as tokenizers, preprocessors, and feature extractors, can also be saved as standalone models using the `bentoml.transformers.save_model` API.
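```python
import bentoml
from transformers import SpeechT5ForTextToSpeech, SpeechT5HifiGan, SpeechT5Processor

# Load the pre-trained processor, acoustic model, and vocoder from the Hugging Face Hub
processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")

# Save each instance as a standalone model in the BentoML model store
bentoml.transformers.save_model("speecht5_tts_processor", processor)
bentoml.transformers.save_model(
    "speecht5_tts_model",
    model,
    # Expose `generate_speech` as a runner method; it is not batchable
    signatures={"generate_speech": {"batchable": False}},
)
bentoml.transformers.save_model("speecht5_tts_vocoder", vocoder)
```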
- Pre-trained models and instances can run either independently as Transformers framework runners or jointly in a custom runner. To use them as individual framework runners, get each model's reference and convert it to a runner with the `to_runner` method.
```python
import bentoml
import torch
from bentoml.io import NumpyNdarray, Text
from datasets import load_dataset

# Convert each saved model into its own Transformers framework runner
processor_runner = bentoml.transformers.get("speecht5_tts_processor").to_runner()
model_runner = bentoml.transformers.get("speecht5_tts_model").to_runner()
vocoder_runner = bentoml.transformers.get("speecht5_tts_vocoder").to_runner()

# Speaker x-vector that conditions the synthesized voice
embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)

svc = bentoml.Service("text2speech", runners=[processor_runner, model_runner, vocoder_runner])


@svc.api(input=Text(), output=NumpyNdarray())
def generate_speech(inp: str):
    inputs = processor_runner.run(text=inp, return_tensors="pt")
    # Call the `generate_speech` signature saved with the model,
    # passing the vocoder runner's `run` method as the vocoder
    speech = model_runner.generate_speech.run(
        input_ids=inputs["input_ids"],
        speaker_embeddings=speaker_embeddings,
        vocoder=vocoder_runner.run,
    )
    return speech.numpy()
```
- To use the pre-trained models and instances together in a custom runner, get the model references with `bentoml.models.get` (or `bentoml.transformers.get`) and load them inside the custom runner with `bentoml.transformers.load_model`. The pre-trained instances can then be used for inference in the custom runner's methods.
```python
import bentoml
import torch
from datasets import load_dataset

processor_ref = bentoml.models.get("speecht5_tts_processor:latest")
model_ref = bentoml.models.get("speecht5_tts_model:latest")
vocoder_ref = bentoml.models.get("speecht5_tts_vocoder:latest")


class SpeechT5Runnable(bentoml.Runnable):
    # Resources this runnable can be scheduled on
    SUPPORTED_RESOURCES = ("cpu",)
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # Load all three pre-trained instances into the same runner process
        self.processor = bentoml.transformers.load_model(processor_ref)
        self.model = bentoml.transformers.load_model(model_ref)
        self.vocoder = bentoml.transformers.load_model(vocoder_ref)
        self.embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
        self.speaker_embeddings = torch.tensor(self.embeddings_dataset[7306]["xvector"]).unsqueeze(0)

    @bentoml.Runnable.method(batchable=False)
    def generate_speech(self, inp: str):
        inputs = self.processor(text=inp, return_tensors="pt")
        speech = self.model.generate_speech(inputs["input_ids"], self.speaker_embeddings, vocoder=self.vocoder)
        return speech.numpy()


text2speech_runner = bentoml.Runner(
    SpeechT5Runnable,
    name="speecht5_runner",
    models=[processor_ref, model_ref, vocoder_ref],
)
svc = bentoml.Service("talk_gpt", runners=[text2speech_runner])


@svc.api(input=bentoml.io.Text(), output=bentoml.io.NumpyNdarray())
async def generate_speech(inp: str):
    return await text2speech_runner.generate_speech.async_run(inp)
```
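The custom-runner approach keeps the processor, model, and vocoder together: they are loaded once in `__init__` and reused for every request, and intermediate outputs stay in one process instead of being passed between separate runners (which, in a typical deployment, run in their own worker processes). The pure-Python sketch below illustrates only that load-once, single-process pattern. `StubRunnable` and its lambda "models" are hypothetical stand-ins, not BentoML or Transformers objects, so the sketch runs without downloading anything.

```python
class StubRunnable:
    """Hypothetical analogue of SpeechT5Runnable: heavy objects are built once in __init__."""

    load_count = 0  # counts how many "models" have been loaded

    def __init__(self):
        # Stand-ins for the three bentoml.transformers.load_model(...) calls
        self.processor = self._load(lambda text: [len(word) for word in text.split()])
        self.model = self._load(lambda ids, vocoder: vocoder([i * 2 for i in ids]))
        self.vocoder = self._load(lambda spec: [s + 1 for s in spec])

    @classmethod
    def _load(cls, obj):
        cls.load_count += 1
        return obj

    def generate_speech(self, text: str):
        ids = self.processor(text)
        # All three stages run in the same process; nothing crosses a runner boundary
        return self.model(ids, vocoder=self.vocoder)


runnable = StubRunnable()
for sentence in ["hello world", "good morning"]:
    print(runnable.generate_speech(sentence))  # [11, 11], then [9, 15]
print(StubRunnable.load_count)  # 3: loading happened once, not per request
```

The trade-off is that the three stages can no longer be scaled or batched independently, which is what the framework-runner approach offers.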
What's Changed
* feat(containerize): caching pip/conda installation layers by smidm in https://github.com/bentoml/BentoML/pull/3673
* docs(batching): update docs to 503 by sauyon in https://github.com/bentoml/BentoML/pull/3677
* chore(deps): bump ruff from 0.0.255 to 0.0.256 by dependabot in https://github.com/bentoml/BentoML/pull/3676
* fix(type): annotate PdSeries with pandas-stubs by aarnphm in https://github.com/bentoml/BentoML/pull/3466
* chore(dispatcher): refactor out training code by sauyon in https://github.com/bentoml/BentoML/pull/3663
* fix: makes containerize for triton examples to all amd64 by aarnphm in https://github.com/bentoml/BentoML/pull/3678
* chore(deps): bump coverage[toml] from 7.2.1 to 7.2.2 by dependabot in https://github.com/bentoml/BentoML/pull/3679
* revert: "chore(dispatcher): refactor out training code (3663)" by sauyon in https://github.com/bentoml/BentoML/pull/3680
* doc: add more links to Bentoml/examples by larme in https://github.com/bentoml/BentoML/pull/3631
* perf: serialization optimization by larme in https://github.com/bentoml/BentoML/pull/3606
* examples: Kubeflow by ssheng in https://github.com/bentoml/BentoML/pull/3656
* chore(deps): bump pytest-asyncio from 0.20.3 to 0.21.0 by dependabot in https://github.com/bentoml/BentoML/pull/3688
* chore(deps): bump ruff from 0.0.256 to 0.0.257 by dependabot in https://github.com/bentoml/BentoML/pull/3689
* chore(deps): bump imageio from 2.26.0 to 2.26.1 by dependabot in https://github.com/bentoml/BentoML/pull/3690
* chore(deps): bump yamllint from 1.29.0 to 1.30.0 by dependabot in https://github.com/bentoml/BentoML/pull/3694
* fix: remove duplicate dependabot check for pip by aarnphm in https://github.com/bentoml/BentoML/pull/3691
* chore(deps): bump ruff from 0.0.257 to 0.0.258 by dependabot in https://github.com/bentoml/BentoML/pull/3699
* docs: Update the Kubeflow example by ssheng in https://github.com/bentoml/BentoML/pull/3703
* chore(deps): bump ruff from 0.0.258 to 0.0.259 by dependabot in https://github.com/bentoml/BentoML/pull/3709
* docs: add link to pyfilesystem plugins by sauyon in https://github.com/bentoml/BentoML/pull/3716
* docs: Kubeflow integration documentation by ssheng in https://github.com/bentoml/BentoML/pull/3704
* docs: replace load_runner() to get().to_runner() by KimSoungRyoul in https://github.com/bentoml/BentoML/pull/3715
* chore(deps): bump imageio from 2.26.1 to 2.27.0 by dependabot in https://github.com/bentoml/BentoML/pull/3720
* fix(readme): format markdown table by aarnphm in https://github.com/bentoml/BentoML/pull/3722
* fix: copy files before running `setup_script` by aarnphm in https://github.com/bentoml/BentoML/pull/3713
* chore: remove experimental warning for `bentoml.metrics` by aarnphm in https://github.com/bentoml/BentoML/pull/3725
* ci: temporary disable coverage by aarnphm in https://github.com/bentoml/BentoML/pull/3726
* chore(deps): bump ruff from 0.0.259 to 0.0.260 by dependabot in https://github.com/bentoml/BentoML/pull/3734
* chore(deps): bump tritonclient[all] from 2.31.0 to 2.32.0 by dependabot in https://github.com/bentoml/BentoML/pull/3730
* fix(type): `bentoml.container.build` should accept multiple `image_tag` by pmayd in https://github.com/bentoml/BentoML/pull/3719
* chore(deps): bump bufbuild/buf-setup-action from 1.15.1 to 1.16.0 by dependabot in https://github.com/bentoml/BentoML/pull/3738
* feat: add query params to request context by sauyon in https://github.com/bentoml/BentoML/pull/3717
* chore(dispatcher): use attr class instead of a tuple by sauyon in https://github.com/bentoml/BentoML/pull/3731
* fix: Make it so the configured max_batch_size is respected when batching inference requests together by RShang97 in https://github.com/bentoml/BentoML/pull/3741
* feat(transformers): pretrained protocol support by aarnphm in https://github.com/bentoml/BentoML/pull/3684
* fix(tests): broken CI by aarnphm in https://github.com/bentoml/BentoML/pull/3742
* chore(deps): bump ruff from 0.0.260 to 0.0.261 by dependabot in https://github.com/bentoml/BentoML/pull/3744
* docs: Transformers documentation on pre-trained instances support by ssheng in https://github.com/bentoml/BentoML/pull/3745
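PR #3741 above makes batching respect the configured `max_batch_size` when inference requests are combined. As an illustration of that invariant only, the sketch below uses a hypothetical greedy helper, `split_into_batches`, that groups queued request sizes so that no batch exceeds the limit. It is not BentoML's dispatcher logic, and rejecting a single oversized request is this sketch's own choice.

```python
from typing import List


def split_into_batches(request_sizes: List[int], max_batch_size: int) -> List[List[int]]:
    """Greedily group queued requests so no batch exceeds max_batch_size items.

    Hypothetical helper for illustration; not BentoML's dispatcher.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be positive")
    batches: List[List[int]] = []
    current: List[int] = []
    current_total = 0
    for size in request_sizes:
        if size > max_batch_size:
            raise ValueError(f"single request of size {size} exceeds max_batch_size")
        # Start a new batch when adding this request would overflow the limit
        if current_total + size > max_batch_size:
            batches.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        batches.append(current)
    return batches


print(split_into_batches([3, 4, 2, 5, 1], max_batch_size=6))  # [[3], [4, 2], [5, 1]]
```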
New Contributors
* smidm made their first contribution in https://github.com/bentoml/BentoML/pull/3673
* pmayd made their first contribution in https://github.com/bentoml/BentoML/pull/3719
* RShang97 made their first contribution in https://github.com/bentoml/BentoML/pull/3741
**Full Changelog**: https://github.com/bentoml/BentoML/compare/v1.0.16...v1.0.17