## ⚡️ OpenAI-compatible inference client!

The `InferenceClient`'s chat completion API is now fully compatible with the `OpenAI` client. This means it is a drop-in replacement in your script:
```diff
- from openai import OpenAI
+ from huggingface_hub import InferenceClient

- client = OpenAI(
+ client = InferenceClient(
    base_url=...,
    api_key=...,
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

for chunk in output:
    print(chunk.choices[0].delta.content)
```
Why switch to `InferenceClient` if you already use `OpenAI`? Because it is better integrated with HF services, such as the Serverless Inference API and Dedicated Endpoints. Check out the more detailed answer [in this HF Post](https://huggingface.co/posts/Wauplin/482171531718772#669635b62f966b95493d5aef).
For more details about OpenAI compatibility, check out this [guide's section](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#openai-compatibility).
* True OpenAI drop-in replacement by InferenceClient by Wauplin in 2384
* Promote chat_completion in inference guide by Wauplin in 2366
## (other) `InferenceClient` improvements
Some new parameters have been added to the `InferenceClient`, following the latest changes in our Inference API:
- `prompt_name`, `truncate` and `normalize` in `feature_extraction`
- `model_id` and `response_format` in `chat_completion`
- `adapter_id` in `text_generation`
- `hypothesis_template` and `multi_labels` in `zero_shot_classification`
Of course, all of those changes are also available in `AsyncInferenceClient`, the async equivalent 🤗
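For instance, here is a minimal sketch using the new `feature_extraction` parameters (the model id is illustrative, and `truncate`/`normalize` are assumed to be supported by the backend serving it, i.e. a TEI-powered one):

```python
from huggingface_hub import InferenceClient

client = InferenceClient()

# Embed a sentence, truncating long inputs and normalizing the output vector
embedding = client.feature_extraction(
    "Today is a sunny day.",
    model="sentence-transformers/all-MiniLM-L6-v2",  # illustrative model id
    truncate=True,   # truncate inputs longer than the model's maximum length
    normalize=True,  # return an L2-normalized embedding
)
```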
* Support truncate and normalize in InferenceClient by Wauplin in 2270
* Add `prompt_name` to feature-extraction + update types by Wauplin in 2363
* Send model_id in ChatCompletion request by Wauplin in 2302
* improve client.zero_shot_classification() by MoritzLaurer in 2340
* [InferenceClient] Add support for `adapter_id` (text-generation) and `response_format` (chat-completion) by Wauplin in 2383
Added helpers for TGI servers:
- `get_endpoint_info` to get information about an endpoint (running model, framework, etc.). Only available on TGI/TEI-powered models.
- `health_check` to check the health status of the server. Only available on TGI/TEI-powered models, and only for Inference Endpoints or local deployments. For the serverless Inference API, it's better to use `get_model_status`.
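As a quick sketch, assuming a TGI server running locally on its default port:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # illustrative local TGI server

print(client.health_check())       # True if the server is up and running
print(client.get_endpoint_info())  # e.g. running model id, framework, max input length
```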
* Support /info and /health routes by Wauplin in 2269
Other fixes:
- `image_to_text` output type has been fixed
- use the `wait-for-model` header to avoid being rate limited while the model is not loaded
- add `proxies` support
* Fix InferenceClient.image_to_text output value by Wauplin in 2285
* Fix always None in text_generation output by Wauplin in 2316
* Add wait-for-model header when sending request to Inference API by Wauplin in 2318
* Add proxy support on async client by noech373 in 2350
* Remove jinja tips + fix typo in chat completion docstring by Wauplin in 2368
## 💾 Serialization

The serialization module introduced in `v0.22.x` has been improved to become the preferred way to serialize a torch model to disk. It handles sharding and safe serialization (using `safetensors`) out of the box, with the subtleties required to handle shared layers. This logic was previously scattered across libraries like `transformers`, `diffusers`, `accelerate` and `safetensors`. The goal of centralizing it in `huggingface_hub` is to allow any external library to safely benefit from the same naming convention, making it easier for end users to manage.
```python
>>> from huggingface_hub import save_torch_model
>>> model = ...  # A PyTorch model

# Save the state dict to "path/to/folder". The model will be split into shards of 5GB each and saved as safetensors.
>>> save_torch_model(model, "path/to/folder")

# Or save the state dict manually
>>> from huggingface_hub import save_torch_state_dict
>>> save_torch_state_dict(model.state_dict(), "path/to/folder")
```
More details in the [serialization package reference](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/serialization).
* Serialization: support saving torch state dict to disk by Wauplin in 2314
* Handle shared layers in `save_torch_state_dict` + add `save_torch_model` by Wauplin in 2373
Some helpers related to serialization have been made public for reuse in external libraries:
- `get_torch_storage_id`
- `get_torch_storage_size`
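As a short sketch of how a library might use them (the tensors are illustrative): two tensors sharing the same underlying storage get the same storage id, so a serializer can save the data only once:

```python
import torch
from huggingface_hub import get_torch_storage_id, get_torch_storage_size

a = torch.zeros(10)
b = a.view(2, 5)  # a view: shares the same underlying storage as `a`

assert get_torch_storage_id(a) == get_torch_storage_id(b)  # shared storage detected
print(get_torch_storage_size(a))  # storage size in bytes
```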
* Support `max_shard_size` as string in `split_state_dict_into_shards_factory` by SunMarc in 2286
* Make get_torch_storage_id public by Wauplin in 2304
## 📁 HfFileSystem

The `HfFileSystem` has been improved to optimize calls, especially when listing files from a repo. This is especially useful for large datasets like [HuggingFaceFW/fineweb](https://huggingface.co/datasets/HuggingFaceFW/fineweb), as it speeds up processing and reduces the risk of being rate limited.
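As a quick illustration:

```python
from huggingface_hub import HfFileSystem

fs = HfFileSystem()

# Listing repo files now requires fewer HTTP calls
files = fs.ls("datasets/HuggingFaceFW/fineweb", detail=False)
```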
* [HfFileSystem] Less /paths-info calls by lhoestq in 2271
* Update token type definition and arg description in `hf_file_system.py` by lappemic in 2278
* [HfFileSystem] Faster `fs.walk()` by lhoestq in 2346
Thanks to lappemic, `HfFileSystem` methods are now properly documented. Check it out [here](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/hf_file_system)!
* Document more `HfFilesyStem` Methods by lappemic in 2380
## ✨ HfApi & CLI improvements

### Commit API

A new mechanism has been introduced to prevent empty commits when no changes have been detected. It is enabled by default in `upload_file`, `upload_folder`, `create_commit` and the `huggingface-cli upload` command. There is no way to force an empty commit.
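For example (the repo id is illustrative), uploading the same content twice now results in a single commit:

```python
from huggingface_hub import upload_file

# The first call creates a commit; the second detects that nothing changed
# and returns without creating an empty commit.
upload_file(path_or_fileobj=b"hello", path_in_repo="hello.txt", repo_id="username/my-model")
upload_file(path_or_fileobj=b"hello", path_in_repo="hello.txt", repo_id="username/my-model")
```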
* Prevent empty commits if files did not change by Wauplin in 2389
### Resource groups

Resource groups allow organization administrators to group related repositories together and manage access to those repos. It is now possible to specify a resource group ID when creating a repo:

```python
from huggingface_hub import create_repo

create_repo("my-secret-repo", private=True, resource_group_id="66670e5163145ca562cb1988")
```
* Support `resource_group_id` in `create_repo` by Wauplin in 2324
### Webhooks API

[Webhooks](https://huggingface.co/docs/hub/en/webhooks) allow you to listen for new changes on specific repos or on all repos belonging to a particular set of users/organizations (not just your repos, but any repo). With the Webhooks API you can create, enable, disable, delete, update, and list webhooks from a script!

```python
from huggingface_hub import create_webhook

# Example: Creating a webhook
webhook = create_webhook(
    url="https://webhook.site/your-custom-url",
    watched=[{"type": "user", "name": "your-username"}, {"type": "org", "name": "your-org-name"}],
    domains=["repo", "discussion"],
    secret="your-secret",
)
```
* [wip] Implement webhooks API by lappemic in 2209
### Search API

The search API has been slightly improved. It is now possible to:
- filter datasets by tags
- select which attributes should be returned by `model_info`/`list_models` (and similarly for datasets and Spaces). For example, you can ask the server to return `downloadsAllTime` for all models.

```python
>>> from huggingface_hub import list_models
>>> for model in list_models(library="transformers", expand="downloadsAllTime", sort="downloads", limit=5):
...     print(model.id, model.downloads_all_time)
MIT/ast-finetuned-audioset-10-10-0.4593 1676502301
sentence-transformers/all-MiniLM-L12-v2 115588145
sentence-transformers/all-MiniLM-L6-v2 250790748
google-bert/bert-base-uncased 1476913254
openai/clip-vit-large-patch14 590557280
```
* Support filtering datasets by tags by Wauplin in 2266
* Support `expand` parameter in `xxx_info` and `list_xxxs` (model/dataset/Space) by Wauplin in 2333
* Add InferenceStatus to ExpandModelProperty_T by Wauplin in 2388
* Do not mention gitalyUid in expand parameter by Wauplin in 2395
### CLI

It is now possible to delete files from a repo using the command line.

Delete a folder:
```bash
>>> huggingface-cli repo-files Wauplin/my-cool-model delete folder/
Files correctly deleted from repo. Commit: https://huggingface.co/Wauplin/my-cool-mo...
```

Use Unix-style wildcards to delete sets of files:
```bash
>>> huggingface-cli repo-files Wauplin/my-cool-model delete *.txt folder/*.bin
Files correctly deleted from repo. Commit: https://huggingface.co/Wauplin/my-cool-mo...
```
* fix/issue 2090 : Add a `repo_files` command, with recursive deletion. by OlivierKessler01 in 2280
### ModelHubMixin

The `ModelHubMixin`, which allows for quick integration of external libraries with the Hub, has been updated to fix some existing bugs and to ease its use. Learn how to integrate your library [from this guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/integrations#a-more-complex-approach-class-inheritance).
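As a reminder, a minimal integration looks roughly like this (using the `PyTorchModelHubMixin` flavor; class and directory names are illustrative):

```python
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class MyModel(nn.Module, PyTorchModelHubMixin):
    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.layer = nn.Linear(hidden_size, 1)

model = MyModel(hidden_size=256)
model.save_pretrained("path/to/dir")               # weights + config saved locally
reloaded = MyModel.from_pretrained("path/to/dir")  # init kwargs restored from the config
```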
* Don't override 'config' in model_kwargs by alexander-soare in 2274
* Support custom kwargs for model card in save_pretrained by qubvel in 2310
* ModelHubMixin: Fix attributes lost in inheritance by Wauplin in 2305
* Fix ModelHubMixin coders by gorold in 2291
* Hot-fix: do not share tags between `ModelHubMixin` siblings by Wauplin in 2394
* Fix: correctly encode/decode config in ModelHubMixin if custom coders by Wauplin in 2337
## 🌐 📚 Documentation

Efforts from the Korean-speaking community continued with the translation of guides and package references to Korean! Check out the result [here](https://huggingface.co/docs/huggingface_hub/v0.23.5/ko/index).
* 🌐 [i18n-KO] Translated `package_reference/cards.md` to Korean by usr-bin-ksh in 2204
* 🌐 [i18n-KO] Translated `package_reference/community.md` to Korean by seoulsky-field in 2183
* 🌐 [i18n-KO] Translated guides/collections.md to Korean by usr-bin-ksh in 2192
* 🌐 [i18n-KO] Translated `guides/integrations.md` to Korean by cjfghk5697 in 2256
* 🌐 [i18n-KO] Translated `package_reference/environment_variables.md` to Korean by jungnerd in 2311
* 🌐 [i18n-KO] Translated `package_reference/webhooks_server.md` to Korean by fabxoe in 2344
* 🌐 [i18n-KO] Translated `guides/manage-cache.md` to Korean by cjfghk5697 in 2347
French documentation is also being updated, thanks to JibrilEl!
* [i18n-FR] Translated "Integrations" to french (sub PR of 1900) by JibrilEl in 2329
A very nice illustration has been made by severo to explain how `hf://` URLs work with the `HfFileSystem` object. Check it out [here](https://huggingface.co/docs/huggingface_hub/main/en/guides/hf_file_system#integrations)!
* add a diagram about hf:// URLs by severo in 2358
## 💔 Breaking changes

A few breaking changes have been introduced:
- `ModelFilter` and `DatasetFilter` are completely removed. You can now pass arguments directly to `list_models` and `list_datasets`, which removes one level of complexity for the same result (see the sketch after this list).
- `organization` and `name` have been removed from `update_repo_visibility`. Please use a proper `repo_id` instead. This makes the method consistent with all other methods from `HfApi`.

These breaking changes have been announced with a regular deprecation cycle.
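For instance, a filter-based `list_models` call migrates like this (task and library values are illustrative):

```python
from huggingface_hub import list_models

# Before (removed): list_models(filter=ModelFilter(task="image-classification", library="pytorch"))
# Now: pass the arguments directly
models = list_models(task="image-classification", library="pytorch")
```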
* Bump to 0.24 + remove deprecated code by Wauplin in 2287
The `legacy_cache_layout` parameter (in `hf_hub_download`/`snapshot_download`) as well as the `cached_download`, `filename_to_url` and `url_to_filename` helpers are now deprecated and will be removed in `huggingface_hub==0.26.x`. The proper way to download files is to use the current cache system with `hf_hub_download`/`snapshot_download`, which has been in place for two years already.
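Concretely, downloads should go through the cache-aware helpers (the repo and file names are illustrative):

```python
from huggingface_hub import hf_hub_download

# Downloads the file to the local cache (or reuses it) and returns its path
path = hf_hub_download(repo_id="google-bert/bert-base-uncased", filename="config.json")
```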
* Deprecate `legacy_cache_layout` parameter in `hf_hub_download` by Wauplin in 2317
## Small fixes and maintenance

### ⚙️ fixes
* Add comment to _send_telemetry_in_thread explaining it should not be removed by freddyaboulton in 2264
* feat: endpoints rename instances doc by co42 in 2282
* Fix FileNotFoundError in gitignore creation by Wauplin in 2288
* Close aiohttp client on error by Wauplin in 2294
* fix create_inference_endpoint by nbroad1881 in 2292
* Support custom_image in update_inference_endpoint by Wauplin in 2306
* Fix Repository if whoami call doesn't return an email by Wauplin in 2320
* print actual error message when failing to load a submodule by kallewoof in 2342
* Do not raise on `.resume()` if Inference Endpoint is already running by Wauplin in 2335
* Fix permission issue when downloading on root dir by Wauplin in 2367
* docs: update port for local doc preview in `docs/README.md` by lappemic in 2382
* Fix token=False not respected in file download by Wauplin in 2386
* Use extended path on Windows when downloading to local dir by mlinke-ai in 2378
* Add a default timeout for filelock by edevil in 2391
* Fix list_accepted_access_requests if grant user manually by Wauplin in 2392
* fix: Handle single return value. by 28Smiles in 2396
### ⚙️ internal
* Fix test: rename to open-llm-leaderboard + some cleaning by Wauplin in 2295
* Fix windows tests (git security update) by Wauplin in 2296
* Print correct webhook url when running in Spaces by Wauplin in 2298
* changed from --local_dir to --local-dir by rao-pathangi in 2303
* Update download badges in README by Wauplin in 2309
* Fix progress bar not always closed in file_download.py by Wauplin in 2308
* Make raises sections consistent in docstrings by Wauplin in 2313
* feat(ci): add trufflehog secrets detection by McPatate in 2321
* fix(ci): remove unnecessary permissions by McPatate in 2322
* Update _errors.py by qgallouedec in 2354
* Update ruff in CI by Wauplin in 2365
* Removing shebangs from files which are not supposed to be executable by jpodivin in 2345
* `safetensors[torch]` by qgallouedec in 2371
## Significant community contributions
The following contributors have made significant changes to the library over the last release:
* usr-bin-ksh
* 🌐 [i18n-KO] Translated `package_reference/cards.md` to Korean (2204)
* 🌐 [i18n-KO] Translated guides/collections.md to Korean (2192)
* seoulsky-field
* 🌐 [i18n-KO] Translated `package_reference/community.md` to Korean (2183)
* lappemic
* Update token type definition and arg description in `hf_file_system.py` (2278)
* [wip] Implement webhooks API (2209)
* docs: update port for local doc preview in `docs/README.md` (2382)
* Document more `HfFilesyStem` Methods (2380)
* rao-pathangi
* changed from --local_dir to --local-dir (2303)
* OlivierKessler01
* fix/issue 2090 : Add a `repo_files` command, with recursive deletion. (2280)
* qubvel
* Support custom kwargs for model card in save_pretrained (2310)
* gorold
* Fix ModelHubMixin coders (2291)
* cjfghk5697
* 🌐 [i18n-KO] Translated `guides/integrations.md` to Korean (2256)
* 🌐 [i18n-KO] Translated `guides/manage-cache.md` to Korean (2347)
* kallewoof
* print actual error message when failing to load a submodule (2342)
* jungnerd
* 🌐 [i18n-KO] Translated `package_reference/environment_variables.md` to Korean (2311)
* fabxoe
* 🌐 [i18n-KO] Translated `package_reference/webhooks_server.md` to Korean (2344)
* JibrilEl
* [i18n-FR] Translated "Integrations" to french (sub PR of 1900) (2329)
* noech373
* Add proxy support on async client (2350)
* jpodivin
* Removing shebangs from files which are not supposed to be executable (2345)
* mlinke-ai
* Use extended path on Windows when downloading to local dir (2378)
* edevil
* Add a default timeout for filelock (2391)