🌟 Release highlights
> [!IMPORTANT]
> Argilla server `2.2.0` adds support for **background jobs**, which move work that could take a long time out of the request cycle. To run them, the server now relies on [Redis](https://redis.io) and [Python RQ](https://python-rq.org) workers.
>
> To upgrade your Argilla instance to version `2.2.0`, you need an available Redis server. For more information, see the [Redis get-started documentation](https://redis.io/docs/latest/get-started/) or the [Argilla server configuration documentation](https://docs.argilla.io/latest/reference/argilla-server/configuration/).
>
> If you deployed Argilla server using `docker-compose.yaml`, download the [docker-compose.yaml](https://github.com/argilla-io/argilla/blob/main/examples/deployments/docker/docker-compose.yaml) file again to pick up the latest changes, which set up Redis and the Argilla workers.
>
> Workers are needed to process Argilla's background jobs. You can run Argilla workers with the following command:
> ```sh
> python -m argilla_server worker
> ```
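
Before starting the server and workers, it can help to confirm that the Redis instance is reachable. The following is a minimal sketch using only the Python standard library; the helper names `parse_redis_url` and `redis_is_reachable` are hypothetical, and the check only opens a TCP connection, so it does not prove that the endpoint is a working Redis server.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def parse_redis_url(url: str) -> Tuple[str, int]:
    """Extract (host, port) from a redis:// URL, defaulting to port 6379."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"not a Redis URL: {url!r}")
    return parsed.hostname or "localhost", parsed.port or 6379


def redis_is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Redis host and port succeeds."""
    host, port = parse_redis_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
```

Calling `redis_is_reachable("redis://localhost:6379/0")` returns `True` only when something is listening on that host and port. The URL used here is an assumption; use the value your deployment is configured with, as described in the server configuration documentation.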
ChatField: working with text conversations in Argilla
https://github.com/user-attachments/assets/563dd57e-6f99-4b04-9bfa-c930b2a1625c
You can now work with text conversations natively in Argilla using the new `ChatField`. It is designed to make it easier to build datasets for conversational Large Language Models (LLMs) by displaying conversational data as a chat.
Here's how you can create a dataset with a `ChatField`:
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

settings = rg.Settings(
    fields=[rg.ChatField(name="chat")],
    questions=[...]
)

dataset = rg.Dataset(
    name="chat_dataset",
    settings=settings,
    workspace="my_workspace",
    client=client
)

dataset.create()

record = rg.Record(
    fields={
        "chat": [
            {"role": "user", "content": "Hello World, how are you?"},
            {"role": "assistant", "content": "I'm doing great, thank you!"}
        ]
    }
)

dataset.records.log([record])
```
Read more about how to use this new field type in the guides on [dataset fields](https://docs.argilla.io/latest/how_to_guides/dataset/#fields) and [adding records](https://docs.argilla.io/dev/how_to_guides/record/#add-records).
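
If your conversations are stored as plain `(role, content)` pairs, a small helper can produce the list-of-dicts shape used in the record above. This is only an illustrative sketch: `to_chat_messages` is a hypothetical helper, not part of the Argilla SDK, and its checks are assumptions about what a sensible message looks like rather than the validation Argilla itself applies.

```python
from typing import Dict, List, Tuple


def to_chat_messages(turns: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Convert (role, content) pairs into a list of {"role", "content"} dicts."""
    messages = []
    for index, (role, content) in enumerate(turns):
        if not isinstance(role, str) or not role:
            raise ValueError(f"turn {index}: role must be a non-empty string")
        if not isinstance(content, str):
            raise ValueError(f"turn {index}: content must be a string")
        messages.append({"role": role, "content": content})
    return messages


chat = to_chat_messages([
    ("user", "Hello World, how are you?"),
    ("assistant", "I'm doing great, thank you!"),
])
```

The resulting `chat` list can then be passed as the value of the `"chat"` field when building a record.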
Adjust task distribution settings
You can now modify task distribution settings at any time, and Argilla will automatically recalculate the completed and pending records. When you update this setting, records will be removed from or added to the pending queues of your team accordingly.
You can make this change in the dataset settings page or using the SDK:
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = client.datasets("my_dataset")

dataset.settings.distribution.min_submitted = 2

dataset.update()
```
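
To make the recalculation concrete, here is a simplified sketch in plain Python. It assumes that a record counts as completed once it has at least `min_submitted` submitted responses; the function `split_by_status` and the record ids are hypothetical, and the real bookkeeping happens on the Argilla server.

```python
from typing import Dict, List


def split_by_status(submitted_counts: Dict[str, int], min_submitted: int) -> Dict[str, List[str]]:
    """Group record ids into completed and pending for a given threshold."""
    status: Dict[str, List[str]] = {"completed": [], "pending": []}
    for record_id, count in submitted_counts.items():
        key = "completed" if count >= min_submitted else "pending"
        status[key].append(record_id)
    return status


# Number of submitted responses per record (made-up data).
counts = {"rec-1": 0, "rec-2": 1, "rec-3": 2}

before = split_by_status(counts, min_submitted=1)
after = split_by_status(counts, min_submitted=2)
```

Under this assumption, raising the threshold from 1 to 2 moves `rec-2` from completed back to pending, which is the kind of change the pending queues reflect after an update.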
Track team progress from the SDK
The Argilla SDK now provides a way to retrieve data on annotation progress. This feature allows you to monitor the number of completed and pending records in a dataset and also the number of responses made by each user:
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = client.datasets("my_dataset")

progress = dataset.progress(with_users_distribution=True)
```
The expected output looks like this:
```json
{
    "total": 100,
    "completed": 50,
    "pending": 50,
    "users": {
        "user1": {
            "completed": { "submitted": 10, "draft": 5, "discarded": 5 },
            "pending": { "submitted": 5, "draft": 10, "discarded": 10 }
        },
        "user2": {
            "completed": { "submitted": 20, "draft": 10, "discarded": 5 },
            "pending": { "submitted": 2, "draft": 25, "discarded": 0 }
        },
        ...
    }
}
```
Read more about this feature [here](https://docs.argilla.io/latest/how_to_guides/distribution/#track-your-teams-progress).
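
As a rough illustration of how this output could be summarized, the sketch below computes an overall completion rate and the submitted responses per user. It assumes the result can be read like a dictionary with the shape shown above; `completion_rate` and `submitted_per_user` are hypothetical helpers, and the sample data is copied from the example output.

```python
from typing import Any, Dict

# Sample data copied from the example output above.
progress: Dict[str, Any] = {
    "total": 100,
    "completed": 50,
    "pending": 50,
    "users": {
        "user1": {
            "completed": {"submitted": 10, "draft": 5, "discarded": 5},
            "pending": {"submitted": 5, "draft": 10, "discarded": 10},
        },
        "user2": {
            "completed": {"submitted": 20, "draft": 10, "discarded": 5},
            "pending": {"submitted": 2, "draft": 25, "discarded": 0},
        },
    },
}


def completion_rate(data: Dict[str, Any]) -> float:
    """Fraction of records marked completed, or 0.0 for an empty dataset."""
    total = data.get("total", 0)
    return data.get("completed", 0) / total if total else 0.0


def submitted_per_user(data: Dict[str, Any]) -> Dict[str, int]:
    """Sum submitted responses across completed and pending records per user."""
    return {
        user: sum(bucket.get("submitted", 0) for bucket in stats.values())
        for user, stats in data.get("users", {}).items()
    }


rate = completion_rate(progress)
per_user = submitted_per_user(progress)
```

With the sample data, `rate` is `0.5` and `per_user` is `{"user1": 15, "user2": 22}`.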
Automatic settings inference
When you import a dataset using the `from_hub` method, Argilla automatically infers the settings, such as the fields and questions, from the dataset's `Features`. This saves you time and effort when working with datasets from the Hub.
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

dataset = rg.Dataset.from_hub("yahma/alpaca-cleaned")
```
Task templates
We've added pre-built templates for common dataset types, including text classification, ranking, and rating tasks. Each template comes with pre-configured settings, so you can get started quickly without configuring everything from scratch.
```python
import argilla as rg

client = rg.Argilla(api_url="<api_url>", api_key="<api_key>")

settings = rg.Settings.for_classification(labels=["positive", "negative"])

dataset = rg.Dataset(
    name="my_dataset",
    settings=settings,
    client=client,
    workspace="my_workspace",
)

dataset.create()
```
Read more about templates [here](https://docs.argilla.io/latest/reference/argilla/settings/settings/#creating-settings-using-built-in-templates).
**Full Changelog**: https://github.com/argilla-io/argilla/compare/v2.1.0...v2.2.0