☀️ Highlights
This release comes with a lot of new goodies and quality improvements. We added model card support for the `ArgillaTrainer`, worked on the `FeedbackDataset` task templates and added timestamps to responses. We also fixed a lot of bugs and improved the overall quality of the codebase. Enjoy!
🚨 Breaking change in updating existing Hugging Face Spaces deployments
The quickstart image startup script was changed from `/start_quickstart.sh` to `/home/argilla/start_quickstart.sh`, which might cause existing Hugging Face Spaces deployments to malfunction. A fix was added for the Argilla template space via [this PR](https://huggingface.co/spaces/argilla/argilla-template-space/discussions/19/files). Alternatively, you can just [create a new deployment](https://huggingface.co/new-space?template=argilla%2Fargilla-template-space).
⚠️ Breaking change using SQLite as backend in a docker deployment
From version 1.17.0, a new `argilla` OS user is configured for the provided Docker images. If you are using the Docker deployment and want to upgrade to this version, run the following command after updating your container and before working with Argilla:
```bash
docker exec --user root <argilla_server_container_id> /bin/bash -c 'chown -R argilla:argilla "$ARGILLA_HOME_PATH"'
```
This changes the ownership of the Argilla home path to the `argilla` user, which allows it to work with the new containers.
Note: You can find the Docker container ID by running:
```bash
docker ps | grep -i argilla-server
```
```bash
713973693fb7 argilla/argilla-server:v1.17.0 "/bin/bash start_arg…" 11 hours ago Up 7 minutes 0.0.0.0:6900->6900/tcp docker-argilla-1
```
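To confirm that the ownership change took effect, a small check along these lines could be run inside the container as the `argilla` user. This is only a sketch: the helper `find_foreign_owned` is a hypothetical name, it relies on POSIX-only calls (`os.getuid`, `st_uid`), and it assumes `ARGILLA_HOME_PATH` is set in the environment, as the `chown` command above implies.

```python
import os
from typing import List


def find_foreign_owned(root: str, uid: int) -> List[str]:
    """Return every directory and file under `root` (inclusive) not owned by `uid`."""
    mismatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then each file it contains.
        candidates = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in candidates:
            # lstat inspects a symlink itself rather than its target.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched


if __name__ == "__main__":
    # Assumption: the variable is set inside the container.
    home = os.environ.get("ARGILLA_HOME_PATH")
    if home:
        leftovers = find_foreign_owned(home, os.getuid())
        print(f"{len(leftovers)} path(s) not owned by the current user")
```

An empty result suggests the `chown` reached every path; any listed paths would still need their ownership fixed.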
💾 `ArgillaTrainer` Model Card Generation
The `ArgillaTrainer` now supports automatic model card generation. This means that you can generate a model card with all the info required by Hugging Face and share these models directly on the Hub, as you would expect within the Hugging Face ecosystem. See [the docs](https://docs.argilla.io/en/v1.17.0/practical_guides/fine_tune.html#model-card-generation) for more info.
```python
from argilla.feedback import ArgillaTrainer

# Metadata used to fill in the generated model card
model_card_kwargs = {
    "language": ["en", "es"],
    "license": "Apache-2.0",
    "model_id": "all-MiniLM-L6-v2",
    "dataset_name": "argilla/emotion",
    "tags": ["nlp", "few-shot-learning", "argilla", "setfit"],
    "model_summary": "Small summary of what the model does",
    "model_description": "An extended explanation of the model",
    "model_type": "A 1.3B parameter embedding model fine-tuned on an awesome dataset",
    "finetuned_from": "all-MiniLM-L6-v2",
    "repo": "https://github.com/...",
    "developers": "",
    "shared_by": "",
}

# `dataset` (a `FeedbackDataset`) and `task` (a training task) are assumed to be defined already
trainer = ArgillaTrainer(
    dataset=dataset,
    task=task,
    framework="setfit",
    framework_kwargs={"model_card_kwargs": model_card_kwargs},
)
trainer.train(output_dir="my_model")

# or get the card as `str` by calling the `generate_model_card` method
argilla_model_card = trainer.generate_model_card("my_model")
```
🦮 `FeedbackDataset` Task Templates
The Argilla `FeedbackDataset` now supports a number of task templates that can be used to quickly create a dataset for specific tasks out of the box. This should help new users get right into the action without having to worry about the dataset structure. We support basic tasks like Text Classification but also allow you to set up complex RAG pipelines. See [the docs](https://docs.argilla.io/en/v1.17.0/practical_guides/create_dataset.html#task-templates) for more info.
```python
import argilla as rg

ds = rg.FeedbackDataset.for_text_classification(
    labels=["positive", "negative"],
    multi_label=False,
    use_markdown=True,
    guidelines=None,
)
ds
# FeedbackDataset(
#     fields=[TextField(name="text", use_markdown=True)],
#     questions=[LabelQuestion(name="label", labels=["positive", "negative"])],
#     guidelines="<Guidelines for the task>",
# )
```
⏱️ `inserted_at` and `updated_at` are added to responses
What are responses without timestamps? The `RemoteResponseSchema` now supports `inserted_at` and `updated_at` fields. These help you keep track of when a response was created and when it was last updated, which is ideal for tracking annotator performance within your company.
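As a rough illustration of what the two timestamps make possible, the sketch below counts responses that were revised after submission and measures the delay of one revision. `ResponseTimestamps`, `revision_delay` and `count_revised` are hypothetical stand-ins written for this example; only the field names `inserted_at` and `updated_at` come from the release. With a real dataset you would read these values from the responses of your remote records instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class ResponseTimestamps:
    """Hypothetical stand-in holding only the two new timestamp fields."""
    inserted_at: datetime
    updated_at: datetime


def revision_delay(response: ResponseTimestamps) -> timedelta:
    """Time between the creation of a response and its last update."""
    return response.updated_at - response.inserted_at


def count_revised(responses: List[ResponseTimestamps]) -> int:
    """Number of responses that were updated after they were created."""
    return sum(1 for r in responses if r.updated_at > r.inserted_at)


submitted = datetime(2023, 10, 1, 9, 0, tzinfo=timezone.utc)
responses = [
    ResponseTimestamps(inserted_at=submitted, updated_at=submitted),
    ResponseTimestamps(inserted_at=submitted, updated_at=submitted + timedelta(minutes=5)),
]

print(count_revised(responses))      # 1
print(revision_delay(responses[1]))  # 0:05:00
```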
[1.17.0](https://github.com/argilla-io/argilla/compare/v1.16.0...v1.17.0)
Added
- Added fields `inserted_at` and `updated_at` in `RemoteResponseSchema` ([3822](https://github.com/argilla-io/argilla/pull/3822)).
- Added automatic model card generation through `ArgillaTrainer.save` ([3857](https://github.com/argilla-io/argilla/pull/3857)).
- Added task templates to the `FeedbackDataset` ([3973](https://github.com/argilla-io/argilla/pull/3973)).
Changed
- Updated `Dockerfile` to use a multi-stage build ([3221](https://github.com/argilla-io/argilla/pull/3221) and [3793](https://github.com/argilla-io/argilla/pull/3793)).
- Updated active learning for text classification notebooks to use the most recent small-text version ([3831](https://github.com/argilla-io/argilla/pull/3831)).
- Changed the Argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the Hugging Face Spaces ([3831](https://github.com/argilla-io/argilla/pull/3831)).
- Aligned the `FeedbackDataset` API methods so that they are accessible through its several implementations ([3937](https://github.com/argilla-io/argilla/pull/3937)).
- Added `unify_responses` support for remote datasets ([3937](https://github.com/argilla-io/argilla/pull/3937)).
Fixed
- Fixed fields not being shown in the order defined in the dataset settings. Closes [3959](https://github.com/argilla-io/argilla/issues/3959) ([3984](https://github.com/argilla-io/argilla/pull/3984)).
- Updated active learning for text classification notebooks to pass ids of type int to `TextClassificationRecord` ([3831](https://github.com/argilla-io/argilla/pull/3831)).
- Fixed record fields validation that was preventing records with optional fields (i.e. `required=False`) from being logged when the field value was `None` ([3846](https://github.com/argilla-io/argilla/pull/3846)).
- Always set `pretrained_model_name_or_path` attribute as string in `ArgillaTrainer` ([3914](https://github.com/argilla-io/argilla/pull/3914)).
- The `inserted_at` and `updated_at` attributes are now created using the `utcnow` factory to avoid unexpected race conditions on timestamp creation ([3945](https://github.com/argilla-io/argilla/pull/3945)).
- Fixed `configure_dataset_settings` when providing the workspace via the arg `workspace` ([3887](https://github.com/argilla-io/argilla/pull/3887)).
- Fixed saving of models trained with `ArgillaTrainer` with a `peft_config` parameter ([3795](https://github.com/argilla-io/argilla/pull/3795)).
- Fixed backwards compatibility on `from_huggingface` when loading a `FeedbackDataset` from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced ([3829](https://github.com/argilla-io/argilla/pull/3829)).
- Fixed `TrainingTaskForQuestionAnswering.__repr__` ([3969](https://github.com/argilla-io/argilla/pull/3969)).
- Fixed potential dictionary key-errors in `TrainingTask.prepare_for_training_with_*`-methods ([3969](https://github.com/argilla-io/argilla/pull/3969)).
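For readers unfamiliar with the "factory" pattern mentioned in the timestamp fix above, here is a generic sketch using standard-library dataclasses. It is not Argilla's code (see the linked PR for the actual change); `utcnow` and `StampedResponse` are illustrative names. The point is that a default factory is called once per instance, whereas a plain default value is evaluated a single time when the class is defined and then shared.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def utcnow() -> datetime:
    """Timezone-aware current time in UTC."""
    return datetime.now(timezone.utc)


@dataclass
class StampedResponse:
    # Pitfall: evaluated once at class definition, so every instance shares it.
    frozen_at: datetime = datetime.now(timezone.utc)
    # Factory: called anew for each instance that does not pass a value.
    inserted_at: datetime = field(default_factory=utcnow)
    updated_at: datetime = field(default_factory=utcnow)


first = StampedResponse()
second = StampedResponse()

print(first.frozen_at == second.frozen_at)      # True: the same shared value
print(second.inserted_at >= first.inserted_at)  # True: stamped at creation time
```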
Deprecated
- Function `rg.configure_dataset` is deprecated in favour of `rg.configure_dataset_settings`. The former will be removed in version 1.19.0.
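One way to find remaining calls to a deprecated function before it is removed is to record warnings while exercising your code. The sketch below uses the standard `warnings` module with two hypothetical stand-ins, `old_function` and `new_function`; it assumes the deprecated function emits a standard `DeprecationWarning`, which is not confirmed here for `rg.configure_dataset`.

```python
import warnings


def new_function(name: str) -> str:
    """Stand-in for the replacement function."""
    return f"configured {name}"


def old_function(name: str) -> str:
    """Stand-in for a deprecated function that forwards to its replacement."""
    warnings.warn(
        "old_function is deprecated; use new_function instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_function(name)


def calls_deprecated_api(func, *args, **kwargs) -> bool:
    """Return True if calling `func` emits a DeprecationWarning."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones the default filters would hide.
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return any(issubclass(w.category, DeprecationWarning) for w in caught)


print(calls_deprecated_api(old_function, "my_dataset"))  # True
print(calls_deprecated_api(new_function, "my_dataset"))  # False
```

In a test suite, the same effect can be had by running Python with `-W error::DeprecationWarning`, which turns such warnings into exceptions.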
New Contributors
* osintalex made their first contribution in https://github.com/argilla-io/argilla/pull/3221
* kursathalat made their first contribution in https://github.com/argilla-io/argilla/pull/3756
* splevine made their first contribution in https://github.com/argilla-io/argilla/pull/3832
**Full Changelog**: https://github.com/argilla-io/argilla/compare/v1.16.0...v1.17.0