huggingface_hub


0.16.4

**Full Changelog**: https://github.com/huggingface/huggingface_hub/compare/v0.16.3...v0.16.4

Hotfix to avoid sharing `requests.Session` between processes. More information in https://github.com/huggingface/huggingface_hub/pull/1545. Internally, we create one Session object per thread to benefit from the HTTPSConnectionPool (i.e. connections are not reopened between calls). Due to an implementation bug, the Session object from the main thread was shared with child processes whenever the main process was forked. The shared Session became corrupted in the forked process, leading to random ConnectionErrors on rare occasions.

Check out [these release notes](https://github.com/huggingface/huggingface_hub/releases/tag/v0.16.2) to learn more about the v0.16 release.

0.16.3

**Full Changelog**: https://github.com/huggingface/huggingface_hub/compare/v0.16.2...v0.16.3

Hotfix to print the request ID if any `RequestException` happens. This is useful to help the team debug users' problems. The request ID is a generated UUID, unique for each HTTP call made to the Hub.

Check out [these release notes](https://github.com/huggingface/huggingface_hub/releases/tag/v0.16.2) to learn more about the v0.16 release.

0.16.2

Inference

Introduced in the `v0.15` release, the `InferenceClient` received a major update in this one. The client is now reaching a stable point in terms of features. The next updates will focus on continuing to add support for new tasks.

Async client

Asyncio calls are supported thanks to `AsyncInferenceClient`. Based on `asyncio` and `aiohttp`, it allows you to make efficient concurrent calls to the Inference endpoint of your choice. Every task supported by `InferenceClient` is supported in its async version. Method inputs, outputs, and logic are strictly the same, except that you must await the coroutine.

```py
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()

>>> image = await client.text_to_image("An astronaut riding a horse on the moon.")
```

* Support asyncio with AsyncInferenceClient by Wauplin in 1524

Text-generation

Support for the text-generation task has been added. It is focused on fully supporting endpoints running on the [text-generation-inference](https://github.com/huggingface/text-generation-inference) framework. In fact, the code is heavily inspired by TGI's [Python client](https://github.com/huggingface/text-generation-inference/tree/main/clients/python) initially implemented by OlivierDehaene.

Text generation has 4 modes depending on the `details` (bool) and `stream` (bool) values. By default, a raw string is returned. If `details=True`, more information about the generated tokens is returned. If `stream=True`, generated tokens are returned one by one as soon as the server generates them. For more information, [check out the documentation](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).

```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

# stream=False, details=False
>>> client.text_generation("The huggingface_hub library is ", max_new_tokens=12)
'100% open source and built to be easy to use.'

# stream=True, details=True
>>> for details in client.text_generation("The huggingface_hub library is ", max_new_tokens=12, details=True, stream=True):
...     print(details)
TextGenerationStreamResponse(token=Token(id=1425, text='100', logprob=-1.0175781, special=False), generated_text=None, details=None)
...
TextGenerationStreamResponse(token=Token(
    id=25,
    text='.',
    logprob=-0.5703125,
    special=False),
    generated_text='100% open source and built to be easy to use.',
    details=StreamDetails(finish_reason=<FinishReason.Length: 'length'>, generated_tokens=12, seed=None)
)
```


Of course, the async client also supports text-generation (see [docs](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.AsyncInferenceClient.text_generation)):

```py
>>> from huggingface_hub import AsyncInferenceClient
>>> client = AsyncInferenceClient()
>>> await client.text_generation("The huggingface_hub library is ", max_new_tokens=12)
'100% open source and built to be easy to use.'
```


* prepare for tgi by Wauplin in 1511
* Support text-generation in InferenceClient by Wauplin in 1513

Zero-shot-image-classification

`InferenceClient` now supports zero-shot-image-classification (see [docs](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.InferenceClient.zero_shot_image_classification)). Both the sync and async clients support it. It allows you to classify an image based on a list of labels passed as input.


```py
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()
>>> client.zero_shot_image_classification(
...     "https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg",
...     labels=["dog", "cat", "horse"],
... )
[{"label": "dog", "score": 0.956}, ...]
```


Thanks to dulayjm for your contribution on this task!

* added zero shot image classification by dulayjm in 1528

Other

When using `InferenceClient`'s task methods (`text_to_image`, `text_generation`, `image_classification`, ...) you don't have to pass a model id. By default, the client selects a model recommended for the requested task and runs it on the free public Inference API. This is useful to quickly prototype and test models. In a production-ready setup, we strongly recommend setting the model id/URL manually, as the recommended model may change at any time without prior notice, potentially leading to different and unexpected results in your workflow. Recommended models are the ones used by default on https://hf.co/tasks.
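
For instance, a minimal sketch of the two setups (`facebook/bart-large-cnn` is just an illustrative model id):

```py
>>> from huggingface_hub import InferenceClient

>>> # Quick prototyping: the client picks a recommended model for the task
>>> client = InferenceClient()
>>> client.summarization("The Eiffel Tower is a wrought-iron lattice tower in Paris...")

>>> # Production: pin the model explicitly so results stay reproducible
>>> client = InferenceClient(model="facebook/bart-large-cnn")
>>> client.summarization("The Eiffel Tower is a wrought-iron lattice tower in Paris...")
```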

* Fetch inference model for task from API by Wauplin in 1510

It is now possible to configure headers and cookies to be sent when initializing the client: `InferenceClient(headers=..., cookies=...)`. All calls made with this client will then use these headers/cookies.
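
For example (the header and cookie names below are placeholders):

```py
>>> from huggingface_hub import InferenceClient

>>> # All calls made with this client will send these headers/cookies
>>> client = InferenceClient(headers={"X-My-Header": "value"}, cookies={"my-cookie": "value"})
```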

* Custom headers/cookies in InferenceClient by Wauplin in 1507

Commit API

CommitScheduler

The `CommitScheduler` is a new class that can be used to regularly push commits to the Hub. It watches changes in a folder and creates a commit every 5 minutes if it detects a file change. One intended use case is to allow regular backups from a Space to a Dataset repository on the Hub. The scheduler is designed to remove the hassle of handling background commits while avoiding empty commits.

```py
>>> from huggingface_hub import CommitScheduler

# Schedule regular uploads every 10 minutes. The remote repo and the local folder
# are created if they don't already exist.
>>> scheduler = CommitScheduler(
...     repo_id="report-translation-feedback",
...     repo_type="dataset",
...     folder_path=feedback_folder,
...     path_in_repo="data",
...     every=10,
... )
```


Check out [this guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#scheduled-uploads) to understand how to use the `CommitScheduler`. It comes with [a Space](https://huggingface.co/spaces/Wauplin/space_to_dataset_saver) to showcase how to use it in 4 practical examples.

* `CommitScheduler`: upload folder every 5 minutes by Wauplin in 1494
* Encourage to overwrite CommitScheduler.push_to_hub by Wauplin in 1506
* FIX Use token by default in CommitScheduler by Wauplin in 1509
* safer commit scheduler by Wauplin (direct commit on main)

HFSummaryWriter (tensorboard)

The Hugging Face Hub offers nice support for TensorBoard data. It automatically detects when TensorBoard traces (such as `tfevents` files) are pushed to the Hub and starts an instance to visualize them. This feature enables quick and transparent collaboration in your team when training models. In fact, more than [42k models](https://huggingface.co/models?library=tensorboard&sort=trending) are already using this feature!

With the `HFSummaryWriter` you can now take full advantage of the feature for your training, simply by updating a single line of code.

```py
>>> from huggingface_hub import HFSummaryWriter
>>> logger = HFSummaryWriter(repo_id="test_hf_logger", commit_every=15)
```


`HFSummaryWriter` inherits from `SummaryWriter` and acts as a drop-in replacement in your training scripts. The only addition is that every X minutes (e.g. every 15 minutes) it pushes the logs directory to the Hub. Commits happen in the background to avoid blocking the main thread. If an upload crashes, the logs are kept locally and the training continues.
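
Since the logging API is inherited from `SummaryWriter`, a typical loop looks like this (a sketch; `compute_loss` is a hypothetical training-loop helper):

```py
>>> from huggingface_hub import HFSummaryWriter

>>> logger = HFSummaryWriter(repo_id="test_hf_logger", commit_every=15)
>>> for step in range(100):
...     loss = compute_loss()  # hypothetical helper, stands in for your training step
...     logger.add_scalar("train/loss", loss, global_step=step)  # standard SummaryWriter API
```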

For more information on how to use it, check out this [documentation page](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/tensorboard). Please note that this is still an experimental feature so feedback is very welcome.

* Experimental hf logger by Wauplin in 1456

CommitOperationCopy

It is now possible to copy a file in a repo on the Hub. The copy can only happen within a repo and only with LFS files. Files can be copied between different revisions. More information [here](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/hf_api#huggingface_hub.CommitOperationCopy).
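
A minimal sketch using `create_commit` (repo id and file names are illustrative):

```py
>>> from huggingface_hub import CommitOperationCopy, HfApi

>>> api = HfApi()
>>> api.create_commit(
...     repo_id="username/my-repo",  # illustrative repo id
...     operations=[
...         CommitOperationCopy(
...             src_path_in_repo="weights.bin",
...             path_in_repo="weights-copy.bin",
...         )
...     ],
...     commit_message="Copy an LFS file within the repo",
... )
```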

* add CommitOperationCopy by lhoestq in 1495
* Use CommitOperationCopy in hffs by Wauplin in 1497
* Batch fetch_lfs_files_to_copy by lhoestq in 1504

Breaking changes

`ModelHubMixin` got updated (after a deprecation cycle):
- Arguments must now be passed as keyword arguments instead of positional arguments.
- It is no longer possible to pass `model_id` as `username/repo_name@revision` in `ModelHubMixin`. The revision must be passed as a separate `revision` argument if needed, as shown in the sketch below.
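
A minimal sketch of the new calling convention, assuming a PyTorch model using `PyTorchModelHubMixin` (repo id and revision are illustrative):

```py
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class MyModel(nn.Module, PyTorchModelHubMixin):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

# Before (no longer supported): revision embedded in the model id
# model = MyModel.from_pretrained("username/repo_name@my-revision")

# After: pass the revision as a separate keyword argument
model = MyModel.from_pretrained("username/repo_name", revision="my-revision")
```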

* Remove deprecated code for v0.16.x by Wauplin in 1492

Bug fixes and small improvements

Doc fixes

* [doc build] Use secrets by mishig25 in 1501
* Migrate doc files to Markdown by Wauplin in 1522
* fix doc example by Wauplin (direct commit on main)
* Update readme and contributing guide by Wauplin in 1534

HTTP fixes

An `x-request-id` header is sent by default for every request made to the Hub. This should help with debugging user issues.

* Add x-request-id to every request by Wauplin in 1518


3 PRs and 3 commits, but in the end the default timeout did not change. The problem has been solved server-side instead.
* Set 30s timeout on downloads (instead of 10s) by Wauplin in 1514
* Set timeout to 60 instead of 30 when downloading files by Wauplin in 1523
* Set timeout to 10s by ydshieh in 1530

Misc

* Rename "configs" dataset card field to "config_names" by polinaeterna in 1491
* update stats by Wauplin (direct commit on main)
* Retry on both ConnectTimeout and ReadTimeout by Wauplin in 1529
* update tip by Wauplin (direct commit on main)
* make repo_info public by Wauplin (direct commit on main)

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* dulayjm
* added zero shot image classification (1528)

0.15.1

InferenceClient

We introduce [`InferenceClient`](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.InferenceClient), a new client to run inference on the Hub. The objective is to:
- support both [InferenceAPI](https://huggingface.co/docs/api-inference/index) and [Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index) services in a single client.
- offer a nice interface with:
  - 1 method per task (e.g. `summary = client.summarization("this is a long text")`)
  - 1 default model per task (i.e. easy to prototype)
  - explicit and documented parameters
  - convenient binary inputs (from URL, path, file-like object, ...)
- be flexible and support custom requests if needed

Check out the [Inference guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference) to get a complete overview.

```python
>>> from huggingface_hub import InferenceClient
>>> client = InferenceClient()

>>> image = client.text_to_image("An astronaut riding a horse on the moon.")
>>> image.save("astronaut.png")

>>> client.image_classification("https://upload.wikimedia.org/wikipedia/commons/thumb/4/43/Cute_dog.jpg/320px-Cute_dog.jpg")
[{'score': 0.9779096841812134, 'label': 'Blenheim spaniel'}, ...]
```


The short-term goal is to add support for more tasks (here is the [current list](https://huggingface.co/docs/huggingface_hub/main/en/guides/inference#supported-tasks)), especially text-generation, and to handle `asyncio` calls. The mid-term goal is to deprecate and replace `InferenceAPI`.

* Enhanced `InferenceClient` by Wauplin in 1474

Non-blocking uploads

It is now possible to run `HfApi` calls in the background! The goal is to make it easier to upload files periodically without blocking the main thread during training. This was previously possible when using `Repository` but is now available for HTTP-based methods like `upload_file`, `upload_folder` and `create_commit`. If `run_as_future=True` is passed:
- the job is queued in a background thread. Only 1 worker is spawned to ensure no race condition. The goal is NOT to speed up a process by parallelizing concurrent calls to the Hub.
- a [`Future`](https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Future) object is returned to check the job status
- the main thread is not interrupted, even if an exception occurs during the upload

In addition to this parameter, a [run_as_future(...)](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/hf_api#huggingface_hub.HfApi.run_as_future) method is available to queue any other calls to the Hub. More details [in this guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/upload#nonblocking-upload).

```py
>>> from huggingface_hub import HfApi

>>> api = HfApi()
>>> api.upload_file(...)  # takes Xs
# URL to upload file

>>> future = api.upload_file(..., run_as_future=True)  # instant
>>> future.result()  # wait until complete
# URL to upload file
```


* Run `HfApi` methods in the background (`run_as_future`) by Wauplin in 1458
* fix docs for run_as_future by Wauplin (direct commit on main)

Breaking changes

Some (announced) breaking changes have been introduced:
- `list_models`, `list_datasets` and `list_spaces` return an iterable instead of a list (lazy-loading of paginated results, see the sketch below)
- The parameter `cardData` in `list_datasets` has been removed in favor of the parameter `full`.

Both changes had a deprecation cycle for a few releases now.
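
A minimal sketch of the new iterable behavior (the `author` filter is just an example):

```py
>>> from huggingface_hub import list_models

>>> models = list_models(author="huggingface")  # lazy iterator over paginated results
>>> for model in models:
...     print(model.modelId)

>>> # Materialize explicitly if you really need a list
>>> all_models = list(list_models(author="huggingface"))
```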

* Remove deprecated code + adapt tests by Wauplin in 1450

Bugfixes and small improvements

Token permission

New parameters in `login()`:
- `new_session`: skip login if `new_session=False` and the user is already logged in
- `write_permission`: require write permission (login fails otherwise)

Also added a new `HfApi().get_token_permission()` method that returns `"read"` or `"write"` (or `None` if not logged in).
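
A short sketch of how these options combine (assuming a write token is already stored locally):

```py
>>> from huggingface_hub import login, HfApi

>>> # Skip the prompt if already logged in; fail if the stored token is read-only
>>> login(new_session=False, write_permission=True)

>>> HfApi().get_token_permission()
'write'
```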

* Add new_session, write_permission args by aliabid94 in 1476

List files with details

New parameter to get more details when listing files: `list_repo_files(..., expand=True)`.
The API call is slower, but the `lastCommit` and `security` fields are returned as well.
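
A minimal sketch following the release note above (the repo id is illustrative):

```py
>>> from huggingface_hub import HfApi

>>> api = HfApi()
>>> # Slower call, but file entries come with `lastCommit` and `security` metadata
>>> files = api.list_repo_files("username/my-model", expand=True)
```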

* Add expand parameter to list_repo_files by Wauplin in 1451

Docs fixes

* Resolve broken link to 'filesystem' by tomaarsen in 1461
* Fix broken link in docs to hf_file_system guide by albertvillanova in 1469
* Remove hffs from docs by albertvillanova in 1468

Misc

* Fix consistency check when downloading a file by Wauplin in 1449
* Fix discussion URL on datasets and spaces by Wauplin in 1465
* FIX user agent not passed in snapshot_download by Wauplin in 1478
* Avoid `ImportError` when importing `WebhooksServer` and Gradio is not installed by mariosasko in 1482
* add utf8 encoding when opening files for windows by abidlabs in 1484
* Fix incorrect syntax in `_deprecation.py` warning message for `_deprecate_list_output()` by x11kjm in 1485
* Update _hf_folder.py by SimonKitSangChu in 1487
* fix pause_and_restart test by Wauplin (direct commit on main)
* Support image-to-image task in InferenceApi by Wauplin in 1489

0.14.1

Fixed an issue [reported in `diffusers`](https://github.com/huggingface/diffusers/issues/3213) impacting users downloading files from outside of the Hub. Expected download size now takes into account potential compression in the HTTP requests.

* Fix consistency check when downloading a file by Wauplin in https://github.com/huggingface/huggingface_hub/pull/1449


**Full Changelog**: https://github.com/huggingface/huggingface_hub/compare/v0.14.0...v0.14.1

0.14.0

HfFileSystem: interact with the Hub through the Filesystem API

We introduce [HfFileSystem](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/hf_file_system#huggingface_hub.HfFileSystem), a pythonic filesystem interface compatible with [`fsspec`](https://filesystem-spec.readthedocs.io/en/latest/). Built on top of `HfApi`, it offers typical filesystem operations like `cp`, `mv`, `ls`, `du`, `glob`, `get_file` and `put_file`.

```py
>>> from huggingface_hub import HfFileSystem
>>> fs = HfFileSystem()

# List all files in a directory
>>> fs.ls("datasets/myself/my-dataset/data", detail=False)
['datasets/myself/my-dataset/data/train.csv', 'datasets/myself/my-dataset/data/test.csv']

>>> train_data = fs.read_text("datasets/myself/my-dataset/data/train.csv")
```


Its biggest advantage is to provide ready-to-use integrations with popular libraries like Pandas, DuckDB and Zarr.

```py
import pandas as pd

# Read a remote CSV file into a dataframe
df = pd.read_csv("hf://datasets/my-username/my-dataset-repo/train.csv")

# Write a dataframe to a remote CSV file
df.to_csv("hf://datasets/my-username/my-dataset-repo/test.csv")
```


For a more detailed overview, please have a look at [this guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/hf_file_system).


* Transfer the `hffs` code to `hfh` by mariosasko in 1420
* Hffs misc improvements by mariosasko in 1433

Webhook Server

`WebhooksServer` allows you to implement, debug and deploy webhook endpoints on the Hub without any overhead. Creating a new endpoint is as easy as decorating a Python function.

```python
# app.py
from huggingface_hub import webhook_endpoint, WebhookPayload

@webhook_endpoint
async def trigger_training(payload: WebhookPayload) -> None:
    if payload.repo.type == "dataset" and payload.event.action == "update":
        # Trigger a training job if a dataset is updated
        ...
```


For more details, check out this [twitter thread](https://twitter.com/Wauplin/status/1646893678500392960) or the [documentation guide](https://huggingface.co/docs/huggingface_hub/main/en/guides/webhooks_server).

Note that this feature is experimental which means the API/behavior might change without prior notice. A warning is displayed to the user when using it. As it is experimental, we would love to get feedback!

* [Feat] Webhook server by Wauplin in 1410

Some upload QOL improvements

Faster upload with `hf_transfer`

Integration with a Rust-based library to upload large files in chunks and concurrently. Expect a 3x speed-up if your bandwidth allows it!
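
A sketch of how to opt in, assuming `hf_transfer` is installed (`pip install hf_transfer`); the file and repo ids are illustrative, and the environment variable must be set before `huggingface_hub` is imported:

```py
import os

# Opt-in flag, read by huggingface_hub at import time
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"

from huggingface_hub import HfApi

HfApi().upload_file(
    path_or_fileobj="pytorch_model.bin",  # illustrative local file
    path_in_repo="pytorch_model.bin",
    repo_id="username/my-model",          # illustrative repo id
)
```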

* feat: add `hf_transfer` upload by McPatate in 1395

Upload in multiple commits

Uploading large folders at once can be annoying if an error happens while committing (e.g. a connection error). It is now possible to upload a folder in multiple (smaller) commits. If a commit fails, you can re-run the script and resume the upload. Commits are pushed to a dedicated PR. Once completed, the PR is merged into the `main` branch, resulting in a single commit in your git history.

```py
from huggingface_hub import upload_folder

upload_folder(
    folder_path="local/checkpoints",
    repo_id="username/my-dataset",
    repo_type="dataset",
    multi_commits=True,  # resumable multi-upload
    multi_commits_verbose=True,
)
```


Note that this feature is also experimental, meaning its behavior might be updated in the future.

* New endpoint: `create_commits_on_pr` by Wauplin in 1375

Upload validation

Some more pre-validation is now done before committing files to the Hub: the `.git` folder (if any) is ignored in `upload_folder`, and invalid paths fail early.

* Fix `path_in_repo` validation when committing files by Wauplin in 1382
* Raise issue if trying to upload `.git/` folder + ignore `.git/` folder in `upload_folder` by Wauplin in 1408

Keep-alive connections between requests

Internal update to reuse the same HTTP session across `huggingface_hub`. The goal is to keep the connection open when making multiple calls to the Hub, which ultimately saves a lot of time. For instance, updating metadata in a README became 40% faster, while listing all models from the Hub is 60% faster. This has no impact on atomic calls (e.g. a single standalone GET call).

* Keep-alive connection between requests by Wauplin in 1394
* Accept backend_factory to configure Sessions by Wauplin in 1442
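
For instance, a minimal sketch of configuring a custom session factory via `configure_http_backend` (the proxy address is illustrative):

```py
import requests
from huggingface_hub import configure_http_backend

def backend_factory() -> requests.Session:
    # Called to create each per-thread Session; customize proxies, SSL, headers, ...
    session = requests.Session()
    session.proxies = {"https": "http://localhost:3128"}  # illustrative proxy
    return session

configure_http_backend(backend_factory=backend_factory)
```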

Custom sleep time for Spaces

It is now possible to programmatically set a custom sleep time on your upgraded Space. After X seconds of inactivity, your Space will go to sleep to save you some $$$.

```py
from huggingface_hub import set_space_sleep_time

# Put your Space to sleep after 1h of inactivity
set_space_sleep_time(repo_id=repo_id, sleep_time=3600)
```


* [Feat] Add `sleep_time` for Spaces by Wauplin in 1438

Breaking change

- `fsspec` has been added as a main dependency. It's a lightweight Python library required for `HfFileSystem`.

No other breaking change expected in this release.

Bugfixes & small improvements

File-related

A lot of effort has been invested in making `huggingface_hub`'s cache system more robust, especially when working with symlinks on Windows. Hopefully everything's fixed by now.

* Fix relative symlinks in cache by Wauplin in 1390
* Hotfix - use relative symlinks whenever possible by Wauplin in 1399
* [hot-fix] Malicious repo can overwrite any file on disk by Wauplin in 1429
* Fix symlinks on different volumes on Windows by Wauplin in 1437
* [FIX] bug "Invalid cross-device link" error when using snapshot_download to local_dir with no symlink by thaiminhpv in 1439
* Raise after download if file size is not consistent by Wauplin in 1403

ETag-related

After a server-side configuration issue, we made `huggingface_hub` more robust when fetching the Hub's ETags so that it is more future-proof.

* Update file_download.py by Wauplin in 1406
* 🧹 Use `HUGGINGFACE_HEADER_X_LINKED_ETAG` const by julien-c in 1405
* Normalize both possible variants of the Etag to remove potentially invalid path elements by dwforbes in 1428

Documentation-related

* Docs about how to hide progress bars by Wauplin in 1416
* [docs] Update docstring for repo_id in push_to_hub by tomaarsen in 1436

Misc
* Prepare for 0.14 by Wauplin in 1381
* Add force_download to snapshot_download by Wauplin in 1391
* Model card template: Move model usage instructions out of Bias section by NimaBoscarino in 1400
* typo by Wauplin (direct commit on main)
* Log as warning when waiting for ongoing commands by Wauplin in 1415
* Fix: notebook_login() does not update UI on Databricks by fwetdb in 1414
* Passing the headers to hf_transfer download. by Narsil in 1444

Internal stuff
* Fix CI by Wauplin in 1392
* PR should not fail if codecov is bad by Wauplin (direct commit on main)
* remove cov check in PR by Wauplin (direct commit on main)
* Fix restart space test by Wauplin (direct commit on main)
* fix move repo test by Wauplin (direct commit on main)
