Discuss the release in our [Community Tab](https://huggingface.co/spaces/Wauplin/huggingface_hub/discussions/5). Feedback is welcome! 🤗
## ✨ InferenceClient
Support for inference tools continues to improve in `huggingface_hub`. On the menu in this release? A new `chat_completion` API and fully typed inputs/outputs!
### Chat-completion API!
A long-awaited API has just landed in `huggingface_hub`! `InferenceClient.chat_completion` follows most of OpenAI's API, making it much easier to integrate with existing tools.
Technically speaking, it uses the same backend as the `text-generation` task but requires a preprocessing step to format the list of messages into a single text prompt. The chat template is rendered server-side when models are powered by [TGI](https://huggingface.co/docs/text-generation-inference/index), which is the case for most LLMs: Llama, Zephyr, Mistral, Gemma, etc. Otherwise, the templating happens client-side, which requires the `minijinja` package to be installed. We are actively working on bridging this gap, aiming to render all templates server-side in the future.
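To make that preprocessing step concrete, here is a minimal, hypothetical sketch of what client-side templating amounts to: flattening the message list into a single prompt. The Zephyr-style special tokens below are illustrative assumptions; real chat templates are model-specific Jinja templates rendered with `minijinja`.

```python
# Hypothetical sketch of client-side chat templating: flatten a list of
# messages into one text prompt. The Zephyr-style tokens are illustrative;
# real templates are model-specific Jinja templates.
def render_chat_prompt(messages):
    prompt = ""
    for message in messages:
        prompt += f"<|{message['role']}|>\n{message['content']}</s>\n"
    # Leave the assistant turn open so the model completes it
    prompt += "<|assistant|>\n"
    return prompt

messages = [{"role": "user", "content": "What is the capital of France?"}]
prompt = render_chat_prompt(messages)
print(prompt)
```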
```py
>>> from huggingface_hub import InferenceClient
>>> messages = [{"role": "user", "content": "What is the capital of France?"}]
>>> client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")

# Batch completion
>>> client.chat_completion(messages, max_tokens=100)
ChatCompletionOutput(
    choices=[
        ChatCompletionOutputChoice(
            finish_reason='eos_token',
            index=0,
            message=ChatCompletionOutputChoiceMessage(
                content='The capital of France is Paris. The official name of the city is "Ville de Paris" (City of Paris) and the name of the country\'s governing body, which is located in Paris, is "La République française" (The French Republic). \nI hope that helps! Let me know if you need any further information.'
            )
        )
    ],
    created=1710498360
)

# Stream new tokens one by one
>>> for token in client.chat_completion(messages, max_tokens=10, stream=True):
...     print(token)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content='The', role='assistant'), index=0, finish_reason=None)], created=1710498504)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' capital', role='assistant'), index=0, finish_reason=None)], created=1710498504)
(...)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=' may', role='assistant'), index=0, finish_reason=None)], created=1710498504)
ChatCompletionStreamOutput(choices=[ChatCompletionStreamOutputChoice(delta=ChatCompletionStreamOutputDelta(content=None, role=None), index=0, finish_reason='length')], created=1710498504)
```
* Implement `InferenceClient.chat_completion` + use new types for text-generation by Wauplin in [2094](https://github.com/huggingface/huggingface_hub/pull/2094)
* Fix InferenceClient.text_generation for non-tgi models by Wauplin in [2136](https://github.com/huggingface/huggingface_hub/pull/2136)
* [#2153](https://github.com/huggingface/huggingface_hub/pull/2153) by Wauplin
### Inference types
We are currently working towards more consistency in task definitions across the Hugging Face ecosystem. This is no easy job, but a major milestone has recently been achieved! All inputs and outputs of the main ML tasks are now fully specified as JSON schema objects. This is the first brick needed to have consistent expectations when running inference across our stack: transformers (Python), transformers.js (TypeScript), Inference API (Python), Inference Endpoints (Python), Text Generation Inference (Rust), Text Embeddings Inference (Rust), InferenceClient (Python), Inference.js (TypeScript), etc.
Integrating those definitions will require more work, but `huggingface_hub` is one of the first tools to adopt them. As a start, **all `InferenceClient` return values are now typed dataclasses.** Furthermore, typed dataclasses have been generated for all tasks' inputs and outputs. This means you can now integrate them in your own library to ensure consistency with the Hugging Face ecosystem. The specifications are open-source (see [here](https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks)), meaning anyone can access and contribute to them. The generated Python classes are documented [here](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_types).
Here is a short example showcasing the new output types:
```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.object_detection("people.jpg")
[
    ObjectDetectionOutputElement(
        score=0.9486683011054993,
        label='person',
        box=ObjectDetectionBoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)
    ),
    ...
]
```
Note that those dataclasses are backward-compatible with the dict-based interface previously in use. In the example above, both `ObjectDetectionBoundingBox(...).xmin` and `ObjectDetectionBoundingBox(...)["xmin"]` are valid, though the former is now the preferred approach.
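This dual access pattern can be sketched with a plain dataclass. The `__getitem__` fallback below is an illustrative assumption, not the library's actual implementation:

```python
from dataclasses import dataclass

# Illustrative sketch (not the library's actual implementation) of how a
# dataclass can stay backward-compatible with dict-style access.
@dataclass
class BoundingBox:
    xmin: int
    ymin: int
    xmax: int
    ymax: int

    def __getitem__(self, key):
        # Fall back to attribute lookup so box["xmin"] keeps working
        return getattr(self, key)

box = BoundingBox(xmin=59, ymin=39, xmax=420, ymax=510)
assert box.xmin == box["xmin"] == 59
```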
* Generate inference types + start using output types by Wauplin in [2036](https://github.com/huggingface/huggingface_hub/pull/2036)
* Add = None at optional parameters by LysandreJik in [2095](https://github.com/huggingface/huggingface_hub/pull/2095)
* Fix inference types shared between tasks by Wauplin in [2125](https://github.com/huggingface/huggingface_hub/pull/2125)
## 🧩 ModelHubMixin
[`ModelHubMixin`](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/mixins#huggingface_hub.ModelHubMixin) is an object that can be used as a parent class for the objects in your library in order to provide built-in serialization methods to upload and download pretrained models from the Hub. This mixin is adapted into a [`PyTorchModelHubMixin`](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) that can serialize and deserialize any PyTorch model. The 0.22 release brings its share of improvements to these classes:
1. Better support for init values. If you instantiate a model with custom arguments, the values are automatically stored in a `config.json` file and restored when reloading the model from pretrained weights. This should make integrations with external libraries much smoother.
2. Library authors integrating the hub mixin can now define custom metadata for their library: library name, tags, docs URL and repo URL. These need to be defined only once when integrating the library. Any model pushed to the Hub using the library will then be easily discoverable thanks to those tags.
3. A base model card is generated for each saved model. This model card includes default tags (e.g. `model_hub_mixin`) and custom tags from the library (see 2.). You can extend or modify it by overriding the `generate_model_card` method.
```python
>>> import torch
>>> import torch.nn as nn
>>> from huggingface_hub import PyTorchModelHubMixin

# Define your PyTorch model exactly the same way you are used to
>>> class MyModel(
...         nn.Module,
...         PyTorchModelHubMixin,  # multiple inheritance
...         library_name="keras-nlp",
...         tags=["keras"],
...         repo_url="https://github.com/keras-team/keras-nlp",
...         docs_url="https://keras.io/keras_nlp/",
...         # ^ optional metadata to generate model card
...     ):
...     def __init__(self, hidden_size: int = 512, vocab_size: int = 30000, output_size: int = 4):
...         super().__init__()
...         self.param = nn.Parameter(torch.rand(hidden_size, vocab_size))
...         self.linear = nn.Linear(vocab_size, output_size)
...
...     def forward(self, x):
...         return self.linear(x + self.param)

# 1. Create model
>>> model = MyModel(hidden_size=128)

# Config is automatically created based on input + default values
>>> model._hub_mixin_config
{"hidden_size": 128, "vocab_size": 30000, "output_size": 4}

# 2. (optional) Save model to local directory
>>> model.save_pretrained("path/to/my-awesome-model")

# 3. Push model weights to the Hub
>>> model.push_to_hub("my-awesome-model")

# 4. Initialize model from the Hub => config has been preserved
>>> model = MyModel.from_pretrained("username/my-awesome-model")
>>> model._hub_mixin_config
{"hidden_size": 128, "vocab_size": 30000, "output_size": 4}

# Model card has been correctly populated
>>> from huggingface_hub import ModelCard
>>> card = ModelCard.load("username/my-awesome-model")
>>> card.data.tags
["keras", "pytorch_model_hub_mixin", "model_hub_mixin"]
>>> card.data.library_name
"keras-nlp"
```
For more details on how to integrate these classes, check out the [integration guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/integrations#a-more-complex-approach-class-inheritance).
* Fix `ModelHubMixin`: pass config when `__init__` accepts **kwargs by Wauplin in [2058](https://github.com/huggingface/huggingface_hub/pull/2058)
* [PyTorchModelHubMixin] Fix saving model with shared tensors by NielsRogge in [2086](https://github.com/huggingface/huggingface_hub/pull/2086)
* Correctly inject config in `PytorchModelHubMixin` by Wauplin in [2079](https://github.com/huggingface/huggingface_hub/pull/2079)
* Fix passing kwargs in PytorchHubMixin by Wauplin in [2093](https://github.com/huggingface/huggingface_hub/pull/2093)
* Generate modelcard in `ModelHubMixin` by Wauplin in [2080](https://github.com/huggingface/huggingface_hub/pull/2080)
* Fix ModelHubMixin: save config only if doesn't exist by Wauplin in [2105](https://github.com/huggingface/huggingface_hub/pull/2105)
* Fix ModelHubMixin - kwargs should be passed correctly when reloading by Wauplin in [2099](https://github.com/huggingface/huggingface_hub/pull/2099)
* Fix ModelHubMixin when kwargs and config are both passed by Wauplin in [2138](https://github.com/huggingface/huggingface_hub/pull/2138)
* ModelHubMixin overwrite config if preexistant by Wauplin in [2142](https://github.com/huggingface/huggingface_hub/pull/2142)
## 🛠️ Misc improvements
`HfFileSystem` download speed was limited by some internal logic in `fsspec`. We've now updated the `get_file` and `read` implementations to improve their download speed to a level similar to `hf_hub_download`.
* Fast download in hf file system by Wauplin in [2143](https://github.com/huggingface/huggingface_hub/pull/2143)
We aim to move all errors raised by `huggingface_hub` into a single module, `huggingface_hub.errors`, to ease the developer experience. This work was started as a community contribution from Y4suyuki.
* Start defining custom errors in one place by Y4suyuki in [2122](https://github.com/huggingface/huggingface_hub/pull/2122)
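A centralized error module typically looks like the sketch below; the class names are illustrative assumptions, not necessarily the ones defined in `huggingface_hub.errors`:

```python
# Illustrative sketch of grouping a library's exceptions in one module
# behind a common base class, so users can catch them uniformly.
class HfHubError(Exception):
    """Base class for all library errors."""

class RepositoryNotFoundError(HfHubError):
    """Raised when a requested repo does not exist."""

class OfflineModeIsEnabled(HfHubError):
    """Raised when a network call is attempted in offline mode."""

try:
    raise RepositoryNotFoundError("repo 'foo/bar' not found")
except HfHubError as err:  # one base class catches them all
    caught = type(err).__name__
```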
The `HfApi` class now accepts a `headers` parameter that is then passed to every HTTP call made to the Hub.
* Allow passing custom headers to HfApi by Wauplin in [2098](https://github.com/huggingface/huggingface_hub/pull/2098)
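Conceptually, the per-client headers are merged into every request. Here is a minimal sketch; the merge helper and header values are hypothetical illustrations, not `HfApi` internals:

```python
# Hypothetical sketch: default headers merged with user-supplied ones,
# with custom headers taking precedence on key conflicts.
def merge_headers(defaults, custom):
    merged = dict(defaults)
    merged.update(custom or {})
    return merged

defaults = {"user-agent": "example-agent"}  # illustrative value
headers = merge_headers(defaults, {"X-Custom-Header": "my-value"})
```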
## 📚 More documentation in Korean!
* [i18n-KO] Translated `package_reference/overview.md` to Korean by jungnerd in [2113](https://github.com/huggingface/huggingface_hub/pull/2113)
## 💔 Breaking changes
- The new types returned by `InferenceClient` methods should be backward compatible; in particular, values can be accessed either as attributes (`.my_field`) or as items (`["my_field"]`). However, dataclasses and dicts do not always behave exactly the same, so you might notice some breaking changes. Those breaking changes should be very limited.
- `ModelHubMixin` internals changed quite a bit, breaking *some* use cases. We don't think those use cases were in use, and changing them should really benefit 99% of integrations. If you witness any inconsistency or error in your integration, please let us know and we will do our best to mitigate the problem. One of the biggest changes is that config values are no longer attached to the mixin instance as `instance.config` but as `instance._hub_mixin_config`. The `.config` attribute was mistakenly introduced in `0.20.x`, so we hope it has not been used much yet.
- `huggingface_hub.file_download.http_user_agent` has been removed in favor of the officially documented `huggingface_hub.utils.build_hf_headers`. It had been deprecated since `0.18.x`.
## Small fixes and maintenance
### ⚙️ CI optimization
The CI pipeline has been greatly improved, especially thanks to the efforts from bmuskalla. Most tests now pass in under 3 minutes, compared to 8 to 10 minutes previously. Some long-running tests have been greatly simplified, and all tests now run in parallel with `pytest-xdist`, thanks to a complete decoupling between them.
We are now also using the great [`uv`](https://github.com/astral-sh/uv) installer instead of `pip` in our CI, which saves around 30 to 40 seconds per pipeline.
* More optimized tests by Wauplin in [2054](https://github.com/huggingface/huggingface_hub/pull/2054)
* Enable `python-xdist` on all tests by bmuskalla in [2059](https://github.com/huggingface/huggingface_hub/pull/2059)
* do not list all models by Wauplin in [2061](https://github.com/huggingface/huggingface_hub/pull/2061)
* update ruff by Wauplin in [2071](https://github.com/huggingface/huggingface_hub/pull/2071)
* Use uv in CI to speed-up requirements install by Wauplin in [2072](https://github.com/huggingface/huggingface_hub/pull/2072)
### ⚙️ fixes
* Fix Space variable when updatedAt is missing by Wauplin in [2050](https://github.com/huggingface/huggingface_hub/pull/2050)
* Fix tests involving temp directory on macOS by bmuskalla in [2052](https://github.com/huggingface/huggingface_hub/pull/2052)
* fix glob no magic by lhoestq in [2056](https://github.com/huggingface/huggingface_hub/pull/2056)
* Point out that the token must have write scope by bmuskalla in [2053](https://github.com/huggingface/huggingface_hub/pull/2053)
* Fix commonpath in read-only filesystem by stevelaskaridis in [2073](https://github.com/huggingface/huggingface_hub/pull/2073)
* rm unnecessary early makedirs by poedator in [2092](https://github.com/huggingface/huggingface_hub/pull/2092)
* Fix unhandled filelock issue by Wauplin in [2108](https://github.com/huggingface/huggingface_hub/pull/2108)
* Handle .DS_Store files in _scan_cache_repos by sealad886 in [2112](https://github.com/huggingface/huggingface_hub/pull/2112)
* Fix REPO_API_REGEX by Wauplin in [2119](https://github.com/huggingface/huggingface_hub/pull/2119)
* Fix uploading to HF proxy by Wauplin in [2120](https://github.com/huggingface/huggingface_hub/pull/2120)
* Fix --delete in huggingface-cli upload command by Wauplin in [2129](https://github.com/huggingface/huggingface_hub/pull/2129)
* Explicitly fail on Keras3 by Wauplin in [2107](https://github.com/huggingface/huggingface_hub/pull/2107)
* Fix serverless naming by Wauplin in [2137](https://github.com/huggingface/huggingface_hub/pull/2137)
### ⚙️ internal
* tag as 0.22.0.dev + remove deprecated code by Wauplin in [2049](https://github.com/huggingface/huggingface_hub/pull/2049)
* Some cleaning by Wauplin in [2070](https://github.com/huggingface/huggingface_hub/pull/2070)
* Fix test test_delete_branch_on_missing_branch_fails by Wauplin in [2088](https://github.com/huggingface/huggingface_hub/pull/2088)
## Significant community contributions
The following contributors have made significant changes to the library over the last release:
* Y4suyuki
* Start defining custom errors in one place ([2122](https://github.com/huggingface/huggingface_hub/pull/2122))
* bmuskalla
    * Enable `python-xdist` on all tests ([2059](https://github.com/huggingface/huggingface_hub/pull/2059))