Argilla

Latest version: v2.5.0

Page 2 of 22

2.2.1

What's Changed

This is a patch release with certain fixes to the SDK:

- Fixed `from_hub` errors when column names contain uppercase letters. ([5523](https://github.com/argilla-io/argilla/pull/5523))
- Fixed `from_hub` errors when class feature values contain unlabelled values. ([5523](https://github.com/argilla-io/argilla/pull/5523))
- Fixed `from_hub` errors when loading cached datasets. ([5523](https://github.com/argilla-io/argilla/pull/5523))

**Full Changelog**: https://github.com/argilla-io/argilla/compare/v2.2.0...v2.2.1

2.2.0

🌟 Release highlights

> [!IMPORTANT]
> Argilla server `2.2.0` adds support for **background jobs**. Background jobs let the server defer work that would take too long to run at request time. For this reason, Argilla now relies on [Redis](https://redis.io) and [Python RQ](https://python-rq.org) workers.
>
> So to upgrade your Argilla instance to version `2.2.0` you need to have an available Redis server. See the [Redis get-started documentation](https://redis.io/docs/latest/get-started/) for more information or the [Argilla server configuration documentation](https://docs.argilla.io/latest/reference/argilla-server/configuration/).
>
> If you have deployed Argilla server using the docker-compose.yaml, you should download the [docker-compose.yaml](https://github.com/argilla-io/argilla/blob/main/examples/deployments/docker/docker-compose.yaml) file again to pick up the latest changes, which set up Redis and the Argilla workers.
>
> Workers are needed to process Argilla's background jobs. You can run Argilla workers with the following command:
> ```sh
> python -m argilla_server worker
> ```
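To see why a separate worker process is needed, here is a minimal in-process sketch of the background-job pattern. It uses Python's standard `queue` and `threading` modules as a stand-in for Redis and RQ, so it is only an illustration of the idea and not how Argilla implements its jobs. The names `enqueue`, `worker`, and `slow_count` are hypothetical.

```python
import queue
import threading

# Hypothetical in-process stand-in for a Redis-backed job queue.
jobs = queue.Queue()
results = {}


def enqueue(job_id, func, *args):
    """Called at request time: store the job and return immediately."""
    jobs.put((job_id, func, args))
    return job_id


def worker():
    """Worker loop: take jobs off the queue and run them one by one."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel value: stop the worker
            jobs.task_done()
            break
        job_id, func, args = item
        results[job_id] = func(*args)
        jobs.task_done()


def slow_count(records):
    """Placeholder for work that is too slow to run inside a request."""
    return len(records)


thread = threading.Thread(target=worker, daemon=True)
thread.start()

enqueue("job-1", slow_count, ["a", "b", "c"])
jobs.join()     # wait until the worker has processed everything queued
jobs.put(None)  # ask the worker to stop
thread.join()

print(results)
```

Without a running worker, queued jobs would simply stay in the queue, which is why the upgrade requires both Redis and at least one worker.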

ChatField: working with text conversations in Argilla

https://github.com/user-attachments/assets/563dd57e-6f99-4b04-9bfa-c930b2a1625c

You can now work with text conversations natively in Argilla using the new `ChatField`. It is especially designed to make it easier to build datasets for conversational Large Language Models (LLMs), displaying conversational data in the form of a chat.

Here's how you can create a dataset with a `ChatField`:
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

settings = rg.Settings(
    fields=[rg.ChatField(name="chat")],
    questions=[...],  # your questions go here
)

dataset = rg.Dataset(
    name="chat_dataset",
    settings=settings,
    workspace="my_workspace",
    client=client,
)

dataset.create()

record = rg.Record(
    fields={
        "chat": [
            {"role": "user", "content": "Hello World, how are you?"},
            {"role": "assistant", "content": "I'm doing great, thank you!"},
        ]
    }
)

dataset.records.log([record])
```

Read more about how to use this new field type [here](https://docs.argilla.io/latest/how_to_guides/dataset/#fields) and [here](https://docs.argilla.io/dev/how_to_guides/record/#add-records).
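Before logging chat records, it can help to check that each conversation has the shape used in the example above: a list of messages, each with a `role` and a `content` string. The helper below is a hypothetical pre-check written in plain Python. Its rules are an assumption based on that example, not Argilla's own validation.

```python
def validate_chat(messages):
    """Return a list of problems found in a chat value (empty if none)."""
    if not isinstance(messages, list):
        return ["chat must be a list of messages"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not a dict")
            continue
        for key in ("role", "content"):
            # Each key must be present and hold a non-empty string.
            if not isinstance(message.get(key), str) or not message[key]:
                problems.append(f"message {i} has no usable '{key}'")
    return problems


chat = [
    {"role": "user", "content": "Hello World, how are you?"},
    {"role": "assistant", "content": "I'm doing great, thank you!"},
]

print(validate_chat(chat))
print(validate_chat([{"role": "user"}]))
```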

Adjust task distribution settings
You can now modify task distribution settings at any time, and Argilla will automatically recalculate the completed and pending records. When you update this setting, records will be removed from or added to the pending queues of your team accordingly.

You can make this change in the dataset settings page or using the SDK:
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = client.datasets("my_dataset")
dataset.settings.distribution.min_submitted = 2
dataset.update()
```
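The recalculation described above can be pictured with a small plain-Python sketch. It assumes a record counts as completed once its number of submitted responses reaches `min_submitted`. The function names and the `submitted_counts` data are hypothetical, and the real server logic may differ.

```python
def record_status(submitted_count, min_submitted):
    """A record is completed once it has enough submitted responses."""
    return "completed" if submitted_count >= min_submitted else "pending"


def recalculate(submitted_counts, old_min, new_min):
    """Return the record ids whose status changes with the new threshold."""
    reopened, newly_completed = [], []
    for record_id, count in submitted_counts.items():
        before = record_status(count, old_min)
        after = record_status(count, new_min)
        if before == "completed" and after == "pending":
            reopened.append(record_id)  # goes back to the pending queues
        elif before == "pending" and after == "completed":
            newly_completed.append(record_id)  # leaves the pending queues
    return reopened, newly_completed


# Hypothetical number of submitted responses per record.
submitted_counts = {"rec-1": 0, "rec-2": 1, "rec-3": 2}

reopened, newly_completed = recalculate(submitted_counts, old_min=1, new_min=2)
print(reopened, newly_completed)
```

Raising the threshold from 1 to 2 reopens `rec-2`, which has only one submitted response. Lowering it again would mark the same record as completed.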
Track team progress from the SDK
The Argilla SDK now provides a way to retrieve data on annotation progress. This feature allows you to monitor the number of completed and pending records in a dataset and also the number of responses made by each user:
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = client.datasets("my_dataset")

progress = dataset.progress(with_users_distribution=True)
```
The expected output looks like this:
```json
{
    "total": 100,
    "completed": 50,
    "pending": 50,
    "users": {
        "user1": {
            "completed": { "submitted": 10, "draft": 5, "discarded": 5 },
            "pending": { "submitted": 5, "draft": 10, "discarded": 10 }
        },
        "user2": {
            "completed": { "submitted": 20, "draft": 10, "discarded": 5 },
            "pending": { "submitted": 2, "draft": 25, "discarded": 0 }
        },
        ...
    }
}
```
Read more about this feature [here](https://docs.argilla.io/latest/how_to_guides/distribution/#track-your-teams-progress).
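As a rough illustration of what you might do with this data, the sketch below computes a completion rate and the number of submitted responses per user. It works on a hand-written dictionary that mirrors the example output above. The helper names are hypothetical, and the exact structure returned by your Argilla version may differ.

```python
# Hand-written copy of the example output, not fetched from a server.
progress = {
    "total": 100,
    "completed": 50,
    "pending": 50,
    "users": {
        "user1": {
            "completed": {"submitted": 10, "draft": 5, "discarded": 5},
            "pending": {"submitted": 5, "draft": 10, "discarded": 10},
        },
        "user2": {
            "completed": {"submitted": 20, "draft": 10, "discarded": 5},
            "pending": {"submitted": 2, "draft": 25, "discarded": 0},
        },
    },
}


def completion_rate(progress):
    """Fraction of records that are completed (0.0 for an empty dataset)."""
    total = progress["total"]
    return progress["completed"] / total if total else 0.0


def submitted_per_user(progress):
    """Sum each user's submitted responses over completed and pending records."""
    return {
        user: sum(statuses["submitted"] for statuses in buckets.values())
        for user, buckets in progress.get("users", {}).items()
    }


print(completion_rate(progress))
print(submitted_per_user(progress))
```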

Automatic settings inference
When you import a dataset using the `from_hub` method, Argilla automatically infers the settings, such as the fields and questions, from the dataset's `Features`. This saves you time and effort when working with datasets from the Hub.

```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = rg.Dataset.from_hub("yahma/alpaca-cleaned")
```
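To give an intuition for what inferring settings from features could involve, here is a deliberately simplified sketch. The mapping rules (string columns become text fields, class-label columns become label questions, column names are lowercased) are assumptions made for illustration. `infer_settings` and the feature dictionaries are hypothetical and do not reproduce Argilla's actual inference logic.

```python
def infer_settings(features):
    """Map simplified feature descriptions to field and question specs."""
    fields, questions = [], []
    for column, feature in features.items():
        name = column.lower()  # assumed normalisation of column names
        if feature["type"] == "string":
            fields.append({"kind": "text", "name": name})
        elif feature["type"] == "class_label":
            questions.append(
                {"kind": "label", "name": name, "labels": list(feature["names"])}
            )
        # Other feature types are ignored in this sketch.
    return {"fields": fields, "questions": questions}


# Hypothetical, hand-written description of a dataset's columns.
features = {
    "Text": {"type": "string"},
    "label": {"type": "class_label", "names": ["negative", "positive"]},
}

inferred = infer_settings(features)
print(inferred)
```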

Task templates
We've added pre-built templates for common dataset types, including text classification, ranking, and rating tasks. These templates provide a starting point for your dataset creation, with pre-configured settings. You can use these templates to get started quickly, without having to configure everything from scratch.
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

settings = rg.Settings.for_classification(labels=["positive", "negative"])

dataset = rg.Dataset(
    name="my_dataset",
    settings=settings,
    client=client,
    workspace="my_workspace",
)

dataset.create()
```
Read more about templates [here](https://docs.argilla.io/latest/reference/argilla/settings/settings/#creating-settings-using-built-in-templates).

**Full Changelog**: https://github.com/argilla-io/argilla/compare/v2.1.0...v2.2.0

2.1.0

🌟 Release highlights

Image Field
![Screenshot showing Argilla's new Image Field and Dark Mode](https://github.com/user-attachments/assets/b55c029b-0902-4f1a-ab99-fd68765975cb)
Argilla now supports multimodal datasets with the introduction of a native `ImageField`. This new type of field allows you to work seamlessly with image data, making it easier to annotate and curate datasets that combine text and images.

Here's an example of a dataset with an image field:
```python
import argilla as rg

client = rg.Argilla(...)

settings = rg.Settings(
    fields=[
        rg.ImageField(name="image"),
        rg.TextField(name="caption"),
    ],
    questions=[
        rg.LabelQuestion(
            name="good_or_bad",
            title="Is the caption good or bad",
            labels=["good", "bad"],
        ),
        rg.TextQuestion(name="comments"),
    ],
)

dataset = rg.Dataset(name="image_captions", settings=settings)
dataset.create()

record = rg.Record(
    fields={
        "image": "https://docs.argilla.io/dev/assets/logo.svg",
        "caption": "This is the Argilla logo",
    }
)
dataset.records.log([record])
```


[Read more](https://docs.argilla.io/latest/how_to_guides/dataset/#fields)

Dark Mode
Does Argilla seem too bright for you? You can now try our new Dark Mode: a theme designed to reduce eye strain and give the app a new, modern look. You can enable Dark Mode under "My Settings".

Spanish Translation

<img width="1510" alt="Captura de pantalla 2024-09-05 a las 17 28 29" src="https://github.com/user-attachments/assets/0f82e3ce-3654-4e99-9055-db9173619f2f">

We're committed to making Argilla accessible to a broader audience. With the addition of Spanish translation, we're taking another step towards breaking language barriers and enabling more teams to collaborate on data curation projects.
There's nothing you need to do to enable it: Argilla will automatically switch to Spanish when your browser's main language is set to Spanish. ¡Disfrutadla!


Import any dataset from the Hugging Face Hub
The `from_hub` method just got a major boost! You can now input your own settings, allowing you to use this method with almost any dataset from the Hugging Face Hub, not just Argilla datasets.

Here's how easy it is to import a dataset from the Hub:
```python
import argilla as rg

client = rg.Argilla(...)

settings = rg.Settings(
    fields=[
        rg.TextField(name="input"),
    ],
    questions=[
        rg.TextQuestion(name="output"),
    ],
)

dataset = rg.Dataset.from_hub(
    repo_id="yahma/alpaca-cleaned",
    settings=settings,
)
```


[Read more](https://docs.argilla.io/latest/reference/argilla/datasets/datasets/?h=from_hub#src.argilla.datasets._export._hub.HubImportExportMixin.from_hub)

Other Notable Fixes and Improvements

* Adaptable text areas for `TextQuestion`s, providing a better user experience in the UI.
* Enhanced messaging for empty queues, keeping you informed when no records are available in the UI.

**Full Changelog**: https://github.com/argilla-io/argilla/compare/v2.0.1...v2.1.0

2.0.1

What's Changed

🧹 Patch release of bug fixes and minor documentation and messaging improvements. Enjoy your summer while we change the world in `v2.1.0`.

Fixed

- Fixed error when creating optional fields. ([5362](https://github.com/argilla-io/argilla/pull/5362))
- Fixed error creating integer and float metadata with `visible_for_annotators`. ([5364](https://github.com/argilla-io/argilla/pull/5364))
- Fixed error when logging records with `suggestions` or `responses` for non-existent questions. ([5396](https://github.com/argilla-io/argilla/pull/5396) by maxserras)
- Fixed error from conflicts in testing suite when running tests in parallel. ([5349](https://github.com/argilla-io/argilla/commit/1119b164d0623170d44561c6b75d439d2dc96bd0))
- Fixed error in response model when creating a response with a `None` value. ([5343](https://github.com/argilla-io/argilla/commit/9e3705061a2dd88a7852288d9f6fd1aaeaa9b062))

Changed

- Changed `from_hub` method to raise an error when a dataset with the same name exists. ([5358](https://github.com/argilla-io/argilla/pull/5358))
- Changed `log` method when ingesting records with no known keys to raise a descriptive error. ([5356](https://github.com/argilla-io/argilla/pull/5356))
- Changed `code snippets` to add new datasets. ([5395](https://github.com/argilla-io/argilla/pull/5395))

Added

- Added Google Analytics to the documentation site. ([5366](https://github.com/argilla-io/argilla/pull/5366))
- Added frontend skeletons to progress metrics to optimise load time and improve user experience. ([5391](https://github.com/argilla-io/argilla/pull/5391))
- Added documentation in methods in API references for the Python SDK. ([5400](https://github.com/argilla-io/argilla/commit/a6fc0117bc4923aec0be80df27eb79ddf3f007c7))


**Full Changelog**: https://github.com/argilla-io/argilla/compare/v2.0.0...v2.0.1

2.0

- An owner or an admin can set the minimum number of submitted responses expected for each record.
- When a record reaches that threshold, its status changes to `complete` and it's automatically removed from the pending queue of all team members.
- A dataset is 100% complete when all records have the status `complete`.

By default, the minimum number of submitted responses is 1, but you can create a dataset with a different value:
```python
import argilla as rg

settings = rg.Settings(
    guidelines="These are some guidelines.",
    fields=[
        rg.TextField(
            name="text",
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="label",
            labels=["label_1", "label_2", "label_3"],
        ),
    ],
    distribution=rg.TaskDistribution(min_submitted=3),
)
```


You can also change the value of an existing dataset as long as it has no responses. You can do this from the `General` tab inside the Dataset Settings page in the UI or from the SDK:
```python
import argilla as rg

client = rg.Argilla(...)

dataset = client.datasets("my_dataset")

dataset.settings.distribution.min_submitted = 4

dataset.update()
```


To learn more, check our guide on how to [distribute the annotation task](https://argilla-io.github.io/argilla/latest/how_to_guides/distribution/).
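The distribution rules in this section can be sketched in plain Python. The sketch assumes that a record is complete once it has at least `min_submitted` submitted responses, and that a user's pending queue holds the incomplete records they have not yet submitted. The helper names and the `records` data are hypothetical and only illustrate the described behaviour.

```python
def is_complete(record, min_submitted=1):
    """A record is complete once enough users have submitted a response."""
    return len(record["submitted_by"]) >= min_submitted


def pending_queue(records, user, min_submitted=1):
    """Ids of incomplete records that this user has not submitted yet."""
    return [
        record["id"]
        for record in records
        if not is_complete(record, min_submitted)
        and user not in record["submitted_by"]
    ]


def completion_percentage(records, min_submitted=1):
    """Share of complete records, as a percentage."""
    if not records:
        return 0.0
    done = sum(is_complete(record, min_submitted) for record in records)
    return 100.0 * done / len(records)


# Hypothetical records with the set of users who submitted a response.
records = [
    {"id": "rec-1", "submitted_by": {"ana", "ben"}},
    {"id": "rec-2", "submitted_by": {"ana"}},
    {"id": "rec-3", "submitted_by": set()},
]

print(pending_queue(records, "ana", min_submitted=2))
print(pending_queue(records, "ben", min_submitted=2))
print(round(completion_percentage(records, min_submitted=2), 1))
```

With `min_submitted=2`, only `rec-1` is complete, so it disappears from every queue, while `rec-2` stays pending for users other than the one who already submitted it.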

Easily deploy in Hugging Face Spaces
We've streamlined the deployment of an Argilla Space in the Hugging Face Hub. Now, there's no need to manage users and passwords. Follow these simple steps to create your Argilla Space:
- Select the Argilla template.
- Choose your hardware and persistent storage options (if you prefer options other than the recommended ones).
- If you are creating a space inside an organization, enter your Hugging Face Hub username under `username` to get the `owner` role.
- Leave `password` empty if you'd like to use Hugging Face OAuth to sign in to Argilla.
- Select whether the space will be public or private.
- Click `Create Space`! 🎉

Now you and your teammates can simply sign in to Argilla using Hugging Face OAuth!
Learn more about [deploying Argilla in Hugging Face Spaces](https://argilla-io.github.io/argilla/latest/getting_started/quickstart).


https://github.com/user-attachments/assets/a57a8712-ef4e-45f3-8c38-7bbc47adf02b


New Contributors

* bikash119 made their first contribution in https://github.com/argilla-io/argilla/pull/5294

**Full Changelog**: https://github.com/argilla-io/argilla/compare/v1.29.1...v2.0.0

2.0.0

🔆 Release highlights
One `Dataset` to rule them all
The main difference between Argilla 1.x and Argilla 2.x is that we've converted the previous dataset types tailored for specific NLP tasks into a single highly-configurable `Dataset` class.

With the new `Dataset` you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.

> [!IMPORTANT]
> If you want to continue using your legacy datasets in Argilla 2.x, you will need to convert them into v2 `Dataset`s as explained in this [migration guide](https://argilla-io.github.io/argilla/latest/how_to_guides/migrate_from_legacy_datasets/). This includes: `DatasetForTextClassification`, `DatasetForTokenClassification`, and `DatasetForText2Text`.
>
> `FeedbackDataset`s do not need to be converted as they are already compatible with the Argilla v2 format.

New SDK & documentation
We've redesigned our SDK to adapt it to the new single `Dataset` and `Record` classes and, most importantly, to improve the user and developer experience.

The main goal of the new design is to make the SDK easier to use and learn, making it simpler and faster to configure your dataset and get it up and running.

Here's an example of what creating a `Dataset` looks like:
```python
import argilla as rg
from datasets import load_dataset

# log to the Argilla client
client = rg.Argilla(
    api_url="<api_url>",
    api_key="<api_key>",
    # headers={"Authorization": f"Bearer {HF_TOKEN}"}
)

# configure dataset settings
settings = rg.Settings(
    guidelines="Classify the reviews as positive or negative.",
    fields=[
        rg.TextField(
            name="review",
            title="Text from the review",
            use_markdown=False,
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="my_label",
            title="In which category does this article fit?",
            labels=["positive", "negative"],
        )
    ],
)

# create the dataset in your Argilla instance
dataset = rg.Dataset(
    name="my_first_dataset",
    settings=settings,
    client=client,
)
dataset.create()

# get some data from the Hugging Face Hub and load the records
data = load_dataset("imdb", split="train[:100]").to_list()
dataset.records.log(records=data, mapping={"text": "review"})
```
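The `mapping` argument in the last line connects the source column `text` to the dataset field `review`. Conceptually, this resembles renaming keys before logging, as in the plain-Python sketch below. `apply_mapping` is a hypothetical helper, and keeping unmapped keys unchanged is an assumption of this sketch rather than documented Argilla behaviour.

```python
def apply_mapping(row, mapping):
    """Rename source keys to dataset names; unmapped keys stay unchanged."""
    return {mapping.get(key, key): value for key, value in row.items()}


# Hypothetical row shaped like a source dataset with a `text` column.
data = [{"text": "Great movie!", "label": 1}]

records = [apply_mapping(row, {"text": "review"}) for row in data]
print(records)
```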


To learn more about this SDK and how it works, check out our revamped documentation: https://argilla-io.github.io/argilla/latest

We built this new documentation site from scratch, applying [the Diátaxis framework](https://diataxis.fr/) and UX principles in the hope of making this version cleaner and the information easier to find.

New UI layout
We have also redesigned part of our UI for Argilla 2.0:
- We've redistributed the information in the Home page.
- Datasets now have Questions instead of Tasks.
- A clearer way to see your team's progress over each dataset.
- Annotation guidelines and your progress are now accessible at all times within the dataset page.
- Dataset pages also have a new flexible layout, so you can change the size of different panels and expand or collapse the guidelines and progress.
- `SpanQuestion`s are now supported in the bulk view.

https://github.com/user-attachments/assets/2d959c8a-b4ac-446b-8326-bd66daa28816

Automatic task distribution
