🔆 Release highlights
One `Dataset` to rule them all
The main difference between Argilla 1.x and Argilla 2.x is that we've merged the previous dataset types, each tailored to a specific NLP task, into a single, highly configurable `Dataset` class.
With the new `Dataset` you can combine multiple fields and question types, so you can adapt the UI for your specific project. This offers you more flexibility, while making Argilla easier to learn and maintain.
> [!IMPORTANT]
> If you want to continue using your legacy datasets in Argilla 2.x, you will need to convert them into v2 `Dataset`s as explained in this [migration guide](https://argilla-io.github.io/argilla/latest/how_to_guides/migrate_from_legacy_datasets/). This applies to `DatasetForTextClassification`, `DatasetForTokenClassification`, and `DatasetForText2Text`.
>
> `FeedbackDataset`s do not need to be converted, as they are already compatible with the Argilla v2 format.
New SDK & documentation
We've redesigned our SDK to adapt it to the new single `Dataset` and `Record` classes and, most importantly, to improve the user and developer experience.
The main goal of the new design is to make the SDK easier to learn and use, so that configuring a dataset and getting it up and running is simpler and faster.
Here's an example of what creating a `Dataset` looks like:
```python
import argilla as rg
from datasets import load_dataset

# log to the Argilla client
client = rg.Argilla(
    api_url="<api_url>",
    api_key="<api_key>",
    # uncomment if your instance needs an authorization header
    # (e.g. a private Hugging Face Space); define HF_TOKEN first
    # headers={"Authorization": f"Bearer {HF_TOKEN}"},
)

# configure dataset settings
settings = rg.Settings(
    guidelines="Classify the reviews as positive or negative.",
    fields=[
        rg.TextField(
            name="review",
            title="Text from the review",
            use_markdown=False,
        ),
    ],
    questions=[
        rg.LabelQuestion(
            name="my_label",
            title="In which category does this article fit?",
            labels=["positive", "negative"],
        )
    ],
)

# create the dataset in your Argilla instance
dataset = rg.Dataset(
    name="my_first_dataset",
    settings=settings,
    client=client,
)
dataset.create()

# get some data from the Hugging Face Hub and load the records
data = load_dataset("imdb", split="train[:100]").to_list()
dataset.records.log(records=data, mapping={"text": "review"})
```
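In the example above, `mapping={"text": "review"}` tells `log` that the `text` column of the source data feeds the `review` field. The sketch below illustrates only that renaming idea. It is plain Python, not Argilla's implementation, and `apply_mapping` is a hypothetical helper introduced for this example.

```python
def apply_mapping(row, mapping):
    """Rename source keys to dataset field names (illustrative only).

    Keys that are not listed in `mapping` are left untouched in this sketch;
    how Argilla itself treats unmapped columns is not modelled here.
    """
    return {mapping.get(key, key): value for key, value in row.items()}


# A made-up row shaped like the source data: review text under the key "text".
rows = [{"text": "A great film.", "label": 1}]

# After renaming, the text sits under the field name used in the settings.
records = [apply_mapping(row, {"text": "review"}) for row in rows]
```

Seen this way, the mapping lets the source data keep its own column names while the dataset keeps the field names defined in its settings.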
To learn more about this SDK and how it works, check out our revamped documentation: https://argilla-io.github.io/argilla/latest
We built this new documentation site from scratch, applying [the Diátaxis framework](https://diataxis.fr/) and UX principles in the hope of making this version cleaner and the information easier to find.
New UI layout
We have also redesigned part of our UI for Argilla 2.0:
- We've redistributed the information on the Home page.
- Datasets no longer have Tasks; they have Questions.
- There is a clearer way to see your team's progress on each dataset.
- Annotation guidelines and your progress are now accessible at all times within the dataset page.
- Dataset pages also have a new flexible layout, so you can change the size of different panels and expand or collapse the guidelines and progress.
- `SpanQuestion`s are now supported in the bulk view.
https://github.com/user-attachments/assets/2d959c8a-b4ac-446b-8326-bd66daa28816
Automatic task distribution