Argilla

Latest version: v2.5.0


1.22.0

🔆 Release Highlights

Bulk actions in Feedback Task datasets
Our signature bulk actions are now available for Feedback datasets!

https://user-images.githubusercontent.com/126158523/297772506-97d83a54-ea3f-4700-acd6-ff9e349ade63.mp4


Switch between *Focus* and *Bulk* depending on your needs:

- In the *Focus* view, you can navigate and respond to records individually. This is ideal for closely examining and giving responses to each record.
- The *Bulk* view allows you to see multiple records on the same page. You can select all or some of them and perform actions in bulk, such as applying a label, saving responses, submitting, or discarding. You can use this feature along with filters and similarity search to process a list of records in bulk.

For now, this is only available in the *Pending* queue, but rest assured, bulk actions will be improved and extended to other queues in upcoming releases.

Read more about our *Focus* and *Bulk* views [here](https://docs.argilla.io/en/latest/practical_guides/annotate_dataset.html#focus-vs-bulk-view).

Sorting rating values

We now support sorting records in the Argilla UI based on the values of Rating questions (both suggestions and responses):
![Screenshot of the sorting by Rating question value options](https://user-images.githubusercontent.com/126158523/297764458-5204a09d-7060-4ff7-83f1-93b7acf5d74b.png)


Learn about this and other filters [in our docs](https://docs.argilla.io/en/latest/practical_guides/filter_dataset.html#feedback-dataset).

Out-of-the-box embedding support

It’s now easier than ever to add vector embeddings to your records with the new Sentence Transformers integration.

Just choose a model from the Hugging Face hub and use our `SentenceTransformersExtractor` to add vectors to your dataset:

```python
import argilla as rg
from argilla.client.feedback.integrations.sentencetransformers import SentenceTransformersExtractor

# Connect to Argilla
rg.init(
    api_url="http://localhost:6900",
    api_key="owner.apikey",
    workspace="my_workspace"
)

# Initialize the SentenceTransformersExtractor
ste = SentenceTransformersExtractor(
    model="TaylorAI/bge-micro-v2",  # Use a model from https://huggingface.co/models?library=sentence-transformers
    show_progress=False,
)

# Load a dataset from your Argilla instance
ds_remote = rg.FeedbackDataset.from_argilla("my_dataset")

# Update the dataset
ste.update_dataset(
    dataset=ds_remote,
    fields=["context"],  # Only update the context field
    update_records=True,  # Update the records in the dataset
    overwrite=False,  # Whether to overwrite existing fields
)
```


Learn more about this functionality in [this tutorial](https://docs.argilla.io/en/latest/tutorials_and_integrations/integrations/add_sentence_transformers_embeddings_as_vectors.html).

[Changelog 1.22.0](https://github.com/argilla-io/argilla/compare/v1.21.0...v1.22.0)

Added

- Added Bulk annotation support. ([4333](https://github.com/argilla-io/argilla/pull/4333))
- Restore filters from feedback dataset settings. ([4461](https://github.com/argilla-io/argilla/pull/4461))
- Warning on feedback dataset settings when leaving page with unsaved changes. ([4461](https://github.com/argilla-io/argilla/pull/4461))
- Added pydantic v2 support using the python SDK. ([4459](https://github.com/argilla-io/argilla/pull/4459))
- Added `vector_settings` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`. ([4454](https://github.com/argilla-io/argilla/pull/4454))
- Added integration for `sentence-transformers` using `SentenceTransformersExtractor` to configure `vector_settings` in `FeedbackDataset` and `FeedbackRecord`. ([4454](https://github.com/argilla-io/argilla/pull/4454))

Changed

- Module `argilla.cli.server` definitions have been moved to `argilla.server.cli` module. ([4472](https://github.com/argilla-io/argilla/pull/4472))
- [breaking] Changed `vector_settings_by_name` for generic `property_by_name` usage, which will return `None` instead of raising an error. ([4454](https://github.com/argilla-io/argilla/pull/4454))
- The constant definition `ES_INDEX_REGEX_PATTERN` in module `argilla._constants` is now private. ([4472](https://github.com/argilla-io/argilla/pull/4474))
- `nan` values in metadata properties will raise a 422 error when creating/updating records. ([4300](https://github.com/argilla-io/argilla/issues/4300))
- `None` values are now allowed in metadata properties. ([4300](https://github.com/argilla-io/argilla/issues/4300))
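
Because `nan` metadata values are now rejected with a 422 error while `None` is accepted, one option is to normalise metadata before creating or updating records. The snippet below is a minimal sketch of that idea and is not part of Argilla; `sanitize_metadata` is a hypothetical helper name, and it only handles top-level float values.

```python
import math


def sanitize_metadata(metadata: dict) -> dict:
    """Return a copy of `metadata` with float NaN values replaced by None."""
    cleaned = {}
    for key, value in metadata.items():
        # Only floats can be NaN; every other type is copied unchanged.
        if isinstance(value, float) and math.isnan(value):
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned


# Hypothetical metadata for a single record.
print(sanitize_metadata({"length": float("nan"), "source": "web", "score": 0.5}))
```

The cleaned dictionary can then be passed as the `metadata` of a record in the usual way.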

Fixed

- Paginating to a new record now automatically scrolls down to the selected form area. ([4333](https://github.com/argilla-io/argilla/pull/4333))

Deprecated

- The `missing` response status for filtering records is deprecated and will be removed in the release v1.24.0. Use `pending` instead. ([4433](https://github.com/argilla-io/argilla/pull/4433))

Removed

- The deprecated `python -m argilla database` command has been removed. ([4472](https://github.com/argilla-io/argilla/pull/4472))


New Contributors
* Piyush-Kumar-Ghosh made their first contribution in https://github.com/argilla-io/argilla/pull/4463

**Full Changelog**: https://github.com/argilla-io/argilla/compare/v1.21.0...v1.22.0

1.21.0

🔆 Release highlights

Draft queue

We’ve added a new queue in the Feedback Task UI so that you can save your drafts and have them all together in a separate view. This allows you to save your responses and come back to them before submission.

Note that responses are no longer autosaved. To save your changes, click “Save as draft” or use the shortcut `command ⌘` + `S` (macOS) or `Ctrl` + `S` (other).

Improved shortcuts

We’ve been working to improve the keyboard shortcuts within the Feedback Task UI to make them more productive and user-friendly.

You can now select labels in Label and Multi-label questions using the numerical keys on your keyboard. To see which number corresponds to each label, show or hide the helpers by pressing `command ⌘` (macOS) or `Ctrl` (other) for 2 seconds. The numbers then appear next to the corresponding labels.

We’ve also simplified shortcuts for navigation and actions, so that they use as few keys as possible.

Check all available shortcuts [here](https://docs.argilla.io/en/latest/practical_guides/annotate_dataset.html#shortcuts).

New `metrics` module
We've added a new module to analyze the annotations, both in terms of agreement between the annotators and in terms of data and model drift monitoring.

Agreement metrics
Easily measure the inter-annotator agreement to explore the quality of the annotation guidelines and consistency between annotators:

```python
import argilla as rg
from argilla.client.feedback.metrics import AgreementMetric

feedback_dataset = rg.FeedbackDataset.from_argilla("...", workspace="...")
metric = AgreementMetric(dataset=feedback_dataset, question_name="question_name")
agreement_metrics = metric.compute("alpha")
# >>> agreement_metrics
# [AgreementMetricResult(metric_name='alpha', count=1000, result=0.467889)]
```

Read more [here](https://docs.argilla.io/en/latest/practical_guides/collect_responses.html#agreement-metrics).
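
To give an intuition for the number reported above, here is a minimal, dependency-free sketch of Krippendorff's alpha for nominal labels, assuming that is what the `"alpha"` metric denotes. It is an illustration only, not Argilla's implementation; `nominal_alpha` and its input format are hypothetical.

```python
from collections import Counter
from itertools import permutations


def nominal_alpha(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds, for each record, the list of labels given by the
    annotators who answered it.
    """
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a single response cannot agree or disagree with anything
        # Each ordered pair of responses in a record is weighted by 1 / (m - 1).
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n < 2:
        raise ValueError("at least one record with two or more responses is required")

    observed = sum(w for (a, b), w in coincidences.items() if a != b) / n
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n * (n - 1))
    if expected == 0:
        return 1.0  # only one label was ever used, so there is no disagreement
    return 1.0 - observed / expected


# Two records, two annotators each, in full agreement.
print(nominal_alpha([["a", "a"], ["b", "b"]]))
```

A value of 1 means perfect agreement, 0 means agreement at chance level, and negative values mean systematic disagreement.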

Model metrics
You can use `ModelMetric` to monitor model performance for data and model drift:

```python
import argilla as rg
from argilla.client.feedback.metrics import ModelMetric

feedback_dataset = rg.FeedbackDataset.from_argilla("...", workspace="...")
metric = ModelMetric(dataset=feedback_dataset, question_name="question_name")
annotator_metrics = metric.compute("accuracy")
# >>> annotator_metrics
# {
#     '00000000-0000-0000-0000-000000000001': [ModelMetricResult(metric_name='accuracy', count=3, result=0.5)],
#     '00000000-0000-0000-0000-000000000002': [ModelMetricResult(metric_name='accuracy', count=3, result=0.25)],
#     '00000000-0000-0000-0000-000000000003': [ModelMetricResult(metric_name='accuracy', count=3, result=0.5)],
# }
```

Read more [here](https://docs.argilla.io/en/latest/practical_guides/collect_responses.html#model-metrics).
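
The result above is keyed by annotator id. As a rough illustration of what a per-annotator accuracy could mean, the sketch below compares each annotator's responses with the model's suggestions. This is an assumption about how the metric is computed, not Argilla's code; `accuracy_per_annotator` and the row format are hypothetical.

```python
from collections import defaultdict


def accuracy_per_annotator(rows):
    """Compute accuracy per annotator.

    `rows` is an iterable of (annotator_id, response_label, suggested_label).
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for annotator_id, response, suggestion in rows:
        counts[annotator_id] += 1
        # A hit is a response that matches the model's suggestion.
        hits[annotator_id] += int(response == suggestion)
    return {annotator: hits[annotator] / counts[annotator] for annotator in counts}


# Hypothetical responses and suggestions.
rows = [
    ("user_1", "positive", "positive"),
    ("user_1", "negative", "positive"),
    ("user_2", "negative", "negative"),
]
print(accuracy_per_annotator(rows))
```

A drop in these values over time, for annotators whose behaviour has not changed, is one possible sign of data or model drift.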

List aggregation support for `TermsMetadataProperty`

You can now pass a list of terms within a record’s metadata that will be aggregated and filterable as part of a `TermsMetadataProperty`.

Here is an example:

```python
import argilla as rg

dataset = rg.FeedbackDataset(
    fields=...,
    questions=...,
    metadata_properties=[rg.TermsMetadataProperty(name="annotators")]
)

record = rg.FeedbackRecord(
    fields=...,
    metadata={"annotators": ["user_1", "user_2"]}
)
```
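
To make "aggregated and filterable" concrete, the following sketch shows one way list-valued terms could be counted and filtered in plain Python. It only illustrates the idea and says nothing about how Argilla or its search engine implement it; `aggregate_terms` and `filter_by_term` are hypothetical names.

```python
from collections import Counter


def _as_terms(value):
    """Treat a single term and a list of terms the same way."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def aggregate_terms(records_metadata, name):
    """Count how many records mention each term of the property `name`."""
    counts = Counter()
    for metadata in records_metadata:
        counts.update(_as_terms(metadata.get(name)))
    return counts


def filter_by_term(records_metadata, name, term):
    """Keep the metadata dicts whose property `name` contains `term`."""
    return [m for m in records_metadata if term in _as_terms(m.get(name))]


# Hypothetical metadata for three records.
metadata = [
    {"annotators": ["user_1", "user_2"]},
    {"annotators": "user_1"},
    {},
]
print(aggregate_terms(metadata, "annotators"))
print(filter_by_term(metadata, "annotators", "user_2"))
```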


Reindex from CLI

Reindex all entities in your Argilla instance (datasets, records, responses, etc.) with a simple CLI command.

```bash
argilla server reindex
```


This is useful when you are working with existing feedback datasets and want to update the search engine info.

[Changelog 1.21.0](https://github.com/argilla-io/argilla/compare/v1.20.0...v1.21.0)

Added

- Added new draft queue for annotation view ([4334](https://github.com/argilla-io/argilla/pull/4334))
- Added annotation metrics module for the `FeedbackDataset` (`argilla.client.feedback.metrics`). ([4175](https://github.com/argilla-io/argilla/pull/4175)).
- Added strategy to handle and translate errors from the server for the `401` HTTP status code. ([4362](https://github.com/argilla-io/argilla/pull/4362))
- Added integration for `textdescriptives` using `TextDescriptivesExtractor` to configure `metadata_properties` in `FeedbackDataset` and `FeedbackRecord`. ([4400](https://github.com/argilla-io/argilla/pull/4400)). Contributed by m-newhauser
- Added `POST /api/v1/me/responses/bulk` endpoint to create responses in bulk for current user. ([4380](https://github.com/argilla-io/argilla/pull/4380))
- Added list support for term metadata properties. (Closes [4359](https://github.com/argilla-io/argilla/issues/4359))
- Added new CLI task to reindex datasets and records into the search engine. ([4404](https://github.com/argilla-io/argilla/pull/4404))
- Added `httpx_extra_kwargs` argument to `rg.init` and `Argilla` to allow passing extra arguments to `httpx.Client` used by `Argilla`. ([4440](https://github.com/argilla-io/argilla/pull/4441))

Changed

- More productive and simpler shortcuts system ([4215](https://github.com/argilla-io/argilla/pull/4215))
- Move `ArgillaSingleton`, `init` and `active_client` to a new module `singleton`. ([4347](https://github.com/argilla-io/argilla/pull/4347))
- Updated `argilla.load` functions to also work with `FeedbackDataset`s. ([4347](https://github.com/argilla-io/argilla/pull/4347))
- [breaking] Updated `argilla.delete` functions to also work with `FeedbackDataset`s. It now raises an error if the dataset does not exist. ([4347](https://github.com/argilla-io/argilla/pull/4347))
- Updated `argilla.list_datasets` functions to also work with `FeedbackDataset`s. ([4347](https://github.com/argilla-io/argilla/pull/4347))

Fixed

- Fixed error in `TextClassificationSettings.from_dict` method in which the `label_schema` created was a list of `dict` instead of a list of `str`. ([4347](https://github.com/argilla-io/argilla/pull/4347))
- Fixed total records on pagination component ([4424](https://github.com/argilla-io/argilla/pull/4424))

Removed

- Removed `draft` auto save for annotation view ([4334](https://github.com/argilla-io/argilla/pull/4334))

1.20.0

Added

- Added `GET /api/v1/datasets/:dataset_id/records/search/suggestions/options` endpoint to return suggestion available options for searching. ([4260](https://github.com/argilla-io/argilla/pull/4260))
- Added `metadata_properties` to the `__repr__` method of the `FeedbackDataset` and `RemoteFeedbackDataset`. ([4192](https://github.com/argilla-io/argilla/pull/4192)).
- Added `get_model_kwargs`, `get_trainer_kwargs`, `get_trainer_model`, `get_trainer_tokenizer` and `get_trainer` methods to the `ArgillaTrainer` to improve interoperability across frameworks. ([4214](https://github.com/argilla-io/argilla/pull/4214)).
- Added additional formatting checks to the `ArgillaTrainer` to allow for better interoperability of `defaults` and `formatting_func` usage. ([4214](https://github.com/argilla-io/argilla/pull/4214)).
- Added a warning to the `update_config` method of `ArgillaTrainer` to emphasize if the `kwargs` were updated correctly. ([4214](https://github.com/argilla-io/argilla/pull/4214)).
- Added `argilla.client.feedback.utils` module with `html_utils` and `assignments`. `html_utils` mainly includes `video/audio/image_to_html`, which convert media to data URLs so that they can be rendered in the Argilla UI, and `create_token_highlights`, which highlights tokens in a custom way; both work on `TextQuestion` and `TextField` with `use_markdown=True`. `assignments` mainly includes `assign_records`, which assigns records according to a number of annotators and records, an overlap and the shuffle option, and `assign_workspace`, which assigns a workspace (creating it if needed) according to the record assignment. ([4121](https://github.com/argilla-io/argilla/pull/4121))
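
As a rough picture of what assigning records with an overlap and a shuffle option involves, here is a self-contained sketch. It is not the `assign_records` implementation and does not reproduce its signature; `assign_with_overlap`, its parameters and the round-robin strategy are all assumptions made for illustration.

```python
import random


def assign_with_overlap(users, records, overlap, shuffle=True, seed=None):
    """Give every record to `overlap` distinct users, round-robin."""
    if not 1 <= overlap <= len(users):
        raise ValueError("overlap must be between 1 and the number of users")
    records = list(records)
    if shuffle:
        # A seeded generator keeps the shuffle reproducible.
        random.Random(seed).shuffle(records)

    assignments = {user: [] for user in users}
    cursor = 0
    for record in records:
        # Take `overlap` consecutive users, wrapping around the list.
        for offset in range(overlap):
            user = users[(cursor + offset) % len(users)]
            assignments[user].append(record)
        cursor = (cursor + overlap) % len(users)
    return assignments


# Hypothetical users and record ids.
print(assign_with_overlap(["a", "b", "c"], [1, 2, 3, 4, 5, 6], overlap=2, shuffle=False))
```

With three users, six records and an overlap of 2, every record is seen by two annotators and each annotator receives four records.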

Fixed

- Fixed error in `ArgillaTrainer`, with numerical labels, using `RatingQuestion` instead of `RankingQuestion` ([4171](https://github.com/argilla-io/argilla/pull/4171))
- Fixed error in `ArgillaTrainer`, now we can train for `extractive_question_answering` using a validation sample ([4204](https://github.com/argilla-io/argilla/pull/4204))
- Fixed error in `ArgillaTrainer`, when training for `sentence-similarity` it didn't work with a list of values per record ([4211](https://github.com/argilla-io/argilla/pull/4211))
- Fixed error in the unification strategy for `RankingQuestion` ([4295](https://github.com/argilla-io/argilla/pull/4295))
- Fixed `TextClassificationSettings.labels_schema` order was not being preserved. Closes [3828](https://github.com/argilla-io/argilla/issues/3828) ([#4332](https://github.com/argilla-io/argilla/pull/4332))
- Fixed error when requesting non-existing API endpoints. Closes [4073](https://github.com/argilla-io/argilla/issues/4073) ([#4325](https://github.com/argilla-io/argilla/pull/4325))
- Fixed error when passing `draft` responses to create records endpoint. ([4354](https://github.com/argilla-io/argilla/pull/4354))

Changed

- [breaking] Suggestions `agent` field now only accepts some specific characters and a limited length. ([4265](https://github.com/argilla-io/argilla/pull/4265))
- [breaking] Suggestions `score` field now only accepts float values in the range `0` to `1`. ([4266](https://github.com/argilla-io/argilla/pull/4266))
- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support optional `query` attribute. ([4327](https://github.com/argilla-io/argilla/pull/4327))
- Updated `POST /api/v1/dataset/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. ([4327](https://github.com/argilla-io/argilla/pull/4327))
- Updated `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to support optional `query` attribute. ([4270](https://github.com/argilla-io/argilla/pull/4270))
- Updated `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to support `filter` and `sort` attributes. ([4270](https://github.com/argilla-io/argilla/pull/4270))
- Changed the logging style while pulling and pushing `FeedbackDataset` to Argilla from `tqdm` style to `rich`. ([4267](https://github.com/argilla-io/argilla/pull/4267)). Contributed by zucchini-nlp.
- Updated `push_to_argilla` to print `repr` of the pushed `RemoteFeedbackDataset` after push and changed `show_progress` to True by default. ([4223](https://github.com/argilla-io/argilla/pull/4223))
- Changed `models` and `tokenizer` for the `ArgillaTrainer` to explicitly allow for changing them when needed. ([4214](https://github.com/argilla-io/argilla/pull/4214)).
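
Since suggestion scores outside the range `0` to `1` are now rejected, it can help to check them on the client side before sending suggestions. The helper below is a hypothetical sketch, not an Argilla function. The allowed characters and length for `agent` are not stated here, so they are deliberately left out.

```python
import math


def check_suggestion_score(score):
    """Return `score` as a float if it lies in [0, 1], otherwise raise."""
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise TypeError(f"score must be a number, got {type(score).__name__}")
    if math.isnan(score) or not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in the range 0 to 1, got {score}")
    return float(score)


print(check_suggestion_score(0.87))
```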

1.19.0

Added

- Added `POST /api/v1/datasets/:dataset_id/records/search` endpoint to search for records without user context, including responses by all users. ([4143](https://github.com/argilla-io/argilla/pull/4143))
- Added `POST /api/v1/datasets/:dataset_id/vectors-settings` endpoint for creating vector settings for a dataset. ([3776](https://github.com/argilla-io/argilla/pull/3776))
- Added `GET /api/v1/datasets/:dataset_id/vectors-settings` endpoint for listing the vectors settings for a dataset. ([3776](https://github.com/argilla-io/argilla/pull/3776))
- Added `DELETE /api/v1/vectors-settings/:vector_settings_id` endpoint for deleting a vector settings. ([3776](https://github.com/argilla-io/argilla/pull/3776))
- Added `PATCH /api/v1/vectors-settings/:vector_settings_id` endpoint for updating a vector settings. ([4092](https://github.com/argilla-io/argilla/pull/4092))
- Added `GET /api/v1/records/:record_id` endpoint to get a specific record. ([4039](https://github.com/argilla-io/argilla/pull/4039))
- Added support to include vectors for `GET /api/v1/datasets/:dataset_id/records` endpoint response using `include` query param. ([4063](https://github.com/argilla-io/argilla/pull/4063))
- Added support to include vectors for `GET /api/v1/me/datasets/:dataset_id/records` endpoint response using `include` query param. ([4063](https://github.com/argilla-io/argilla/pull/4063))
- Added support to include vectors for `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint response using `include` query param. ([4063](https://github.com/argilla-io/argilla/pull/4063))
- Added `show_progress` argument to `from_huggingface()` method to make the progress bar for the record parsing process optional. ([4132](https://github.com/argilla-io/argilla/pull/4132)).
- Added a progress bar for the record parsing process to `from_huggingface()` method with `trange` in `tqdm`. ([4132](https://github.com/argilla-io/argilla/pull/4132)).
- Added support for sorting by `inserted_at` or `updated_at` for datasets with no metadata. ([4147](https://github.com/argilla-io/argilla/pull/4147))
- Added `max_records` argument to `pull()` method for `RemoteFeedbackDataset`. ([4074](https://github.com/argilla-io/argilla/pull/4074))
- Added functionality to push your models to the Hugging Face hub with `ArgillaTrainer.push_to_huggingface` ([3976](https://github.com/argilla-io/argilla/pull/3976)). Contributed by Racso-3141.
- Added `filter_by` argument to `ArgillaTrainer` to filter by `response_status` ([4120](https://github.com/argilla-io/argilla/pull/4120)).
- Added `sort_by` argument to `ArgillaTrainer` to sort by `metadata` ([4120](https://github.com/argilla-io/argilla/pull/4120)).
- Added `max_records` argument to `ArgillaTrainer` to limit the number of records used for training ([4120](https://github.com/argilla-io/argilla/pull/4120)).
- Added `add_vector_settings` method to local and remote `FeedbackDataset`. ([4055](https://github.com/argilla-io/argilla/pull/4055))
- Added `update_vectors_settings` method to local and remote `FeedbackDataset`. ([4122](https://github.com/argilla-io/argilla/pull/4122))
- Added `delete_vectors_settings` method to local and remote `FeedbackDataset`. ([4130](https://github.com/argilla-io/argilla/pull/4130))
- Added `vector_settings_by_name` method to local and remote `FeedbackDataset`. ([4055](https://github.com/argilla-io/argilla/pull/4055))
- Added `find_similar_records` method to local and remote `FeedbackDataset`. ([4023](https://github.com/argilla-io/argilla/pull/4023))
- Added `ARGILLA_SEARCH_ENGINE` environment variable to configure the search engine to use. ([4019](https://github.com/argilla-io/argilla/pull/4019))

Changed

- [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. ([4173](https://github.com/argilla-io/argilla/pull/4173))
- [breaking] Users working with OpenSearch engines must use version >=2.4 and set `ARGILLA_SEARCH_ENGINE=opensearch`. ([4019](https://github.com/argilla-io/argilla/pull/4019) and [#4111](https://github.com/argilla-io/argilla/pull/4111))
- [breaking] Changed `FeedbackDataset.*_by_name()` methods to return `None` when no match is found ([4101](https://github.com/argilla-io/argilla/pull/3976)).
- [breaking] `limit` query parameter for `GET /api/v1/datasets/:dataset_id/records` endpoint now only accepts values greater than or equal to `1` and less than or equal to `1000`. ([4143](https://github.com/argilla-io/argilla/pull/4143))
- [breaking] `limit` query parameter for `GET /api/v1/me/datasets/:dataset_id/records` endpoint now only accepts values greater than or equal to `1` and less than or equal to `1000`. ([4143](https://github.com/argilla-io/argilla/pull/4143))
- Update `GET /api/v1/datasets/:dataset_id/records` endpoint to fetch record using the search engine. ([4142](https://github.com/argilla-io/argilla/pull/4142))
- Update `GET /api/v1/me/datasets/:dataset_id/records` endpoint to fetch record using the search engine. ([4142](https://github.com/argilla-io/argilla/pull/4142))
- Update `POST /api/v1/datasets/:dataset_id/records` endpoint to allow creating records with `vectors`. ([4022](https://github.com/argilla-io/argilla/pull/4022))
- Update `PATCH /api/v1/datasets/:dataset_id` endpoint to allow updating `allow_extra_metadata` attribute. ([4112](https://github.com/argilla-io/argilla/pull/4112))
- Update `PATCH /api/v1/datasets/:dataset_id/records` endpoint to allow updating records with `vectors`. ([4062](https://github.com/argilla-io/argilla/pull/4062))
- Update `PATCH /api/v1/records/:record_id` endpoint to allow updating a record with `vectors`. ([4062](https://github.com/argilla-io/argilla/pull/4062))
- Update `POST /api/v1/me/datasets/:dataset_id/records/search` endpoint to allow searching records with vectors. ([4019](https://github.com/argilla-io/argilla/pull/4019))
- Update `BaseElasticAndOpenSearchEngine.index_records` method to also index record vectors. ([4062](https://github.com/argilla-io/argilla/pull/4062))
- Update `FeedbackDataset.__init__` to allow passing a list of vector settings. ([4055](https://github.com/argilla-io/argilla/pull/4055))
- Update `FeedbackDataset.push_to_argilla` to also push vector settings. ([4055](https://github.com/argilla-io/argilla/pull/4055))
- Update `FeedbackDatasetRecord` to support the creation of records with vectors. ([4043](https://github.com/argilla-io/argilla/pull/4043))
- Using cosine similarity to compute similarity between vectors. ([4124](https://github.com/argilla-io/argilla/pull/4124))
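
For reference, cosine similarity between two vectors is their dot product divided by the product of their Euclidean norms. The sketch below shows the standard formula in plain Python. It illustrates the measure itself and is not the search engine's implementation; `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Orthogonal vectors have a similarity of 0, parallel vectors a similarity of 1.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))
```

Because the result depends only on direction, scaling a vector does not change its similarity to other vectors.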

Fixed

- Fixed SVG images overflowing the screen when the images are too large ([4047](https://github.com/argilla-io/argilla/pull/4047))
- Fixed creating records with responses from multiple users. Closes [3746](https://github.com/argilla-io/argilla/issues/3746) and [#3808](https://github.com/argilla-io/argilla/issues/3808) ([#4142](https://github.com/argilla-io/argilla/pull/4142))
- Fixed deleting or updating responses as an owner for annotators. (Commit [403a66d](https://github.com/argilla-io/argilla/commit/403a66d16d816fa8a62e3f76314ccc90e0073297))
- Fixed passing user_id when getting records by id. (Commit [98c7927](https://github.com/argilla-io/argilla/commit/98c792757a21da05bac89b7f625e7e5792ad59f9))
- Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes [4089](https://github.com/argilla-io/argilla/issues/4089) ([#4200](https://github.com/argilla-io/argilla/pull/4200))

Contributors

- Racso-3141 Added a progress bar for parsing records process to `from_huggingface()` method with `trange` in `tqdm`.([4132](https://github.com/argilla-io/argilla/pull/4132)).

1.18.0

Added

- New `GET /api/v1/datasets/:dataset_id/metadata-properties` endpoint for listing dataset metadata properties. ([3813](https://github.com/argilla-io/argilla/pull/3813))
- New `POST /api/v1/datasets/:dataset_id/metadata-properties` endpoint for creating dataset metadata properties. ([3813](https://github.com/argilla-io/argilla/pull/3813))
- New `PATCH /api/v1/metadata-properties/:metadata_property_id` endpoint allowing the update of a specific metadata property. ([3952](https://github.com/argilla-io/argilla/pull/3952))
- New `DELETE /api/v1/metadata-properties/:metadata_property_id` endpoint for deletion of a specific metadata property. ([3911](https://github.com/argilla-io/argilla/pull/3911))
- New `GET /api/v1/metadata-properties/:metadata_property_id/metrics` endpoint to compute metrics for a specific metadata property. ([3856](https://github.com/argilla-io/argilla/pull/3856))
- New `PATCH /api/v1/records/:record_id` endpoint to update a record. ([3920](https://github.com/argilla-io/argilla/pull/3920))
- New `PATCH /api/v1/dataset/:dataset_id/records` endpoint to bulk update the records of a dataset. ([3934](https://github.com/argilla-io/argilla/pull/3934))
- Added missing validations to `PATCH /api/v1/questions/:question_id`. Now `title` and `description` use the same validations used to create questions. ([3967](https://github.com/argilla-io/argilla/pull/3967))
- Added `TermsMetadataProperty`, `IntegerMetadataProperty` and `FloatMetadataProperty` classes allowing to define metadata properties for a `FeedbackDataset`. ([3818](https://github.com/argilla-io/argilla/pull/3818))
- Added `metadata_filters` to `filter_by` method in `RemoteFeedbackDataset` to filter based on metadata i.e. `TermsMetadataFilter`, `IntegerMetadataFilter`, and `FloatMetadataFilter`. ([3834](https://github.com/argilla-io/argilla/pull/3834))
- Added a validation layer for both `metadata_properties` and `metadata_filters` in their schemas and as part of the `add_records` and `filter_by` methods, respectively. ([3860](https://github.com/argilla-io/argilla/pull/3860))
- Added `sort_by` query parameter to listing records endpoints that allows sorting the records by `inserted_at`, `updated_at` or a metadata property. ([3843](https://github.com/argilla-io/argilla/pull/3843))
- Added `add_metadata_property` method to both `FeedbackDataset` and `RemoteFeedbackDataset` (i.e. `FeedbackDataset` in Argilla). ([3900](https://github.com/argilla-io/argilla/pull/3900))
- Added fields `inserted_at` and `updated_at` in `RemoteResponseSchema`. ([3822](https://github.com/argilla-io/argilla/pull/3822))
- Added support for `sort_by` for `RemoteFeedbackDataset` i.e. a `FeedbackDataset` uploaded to Argilla. ([3925](https://github.com/argilla-io/argilla/pull/3925))
- Added `metadata_properties` support for both `push_to_huggingface` and `from_huggingface`. ([3947](https://github.com/argilla-io/argilla/pull/3947))
- Added support for updating records (`metadata`) from the Python SDK. ([3946](https://github.com/argilla-io/argilla/pull/3946))
- Added `delete_metadata_properties` method to delete metadata properties. ([3932](https://github.com/argilla-io/argilla/pull/3932))
- Added `update_metadata_properties` method to update `metadata_properties`. ([3961](https://github.com/argilla-io/argilla/pull/3961))
- Added automatic model card generation through `ArgillaTrainer.save` ([3857](https://github.com/argilla-io/argilla/pull/3857))
- Added `FeedbackDataset` `TaskTemplateMixin` for pre-defined task templates. ([3969](https://github.com/argilla-io/argilla/pull/3969))
- A maximum limit of 50 on the number of options a ranking question can accept. ([3975](https://github.com/argilla-io/argilla/pull/3975))
- New `last_activity_at` field to `FeedbackDataset` exposing when the last activity for the associated dataset occurs. ([3992](https://github.com/argilla-io/argilla/pull/3992))

Changed

- `GET /api/v1/datasets/{dataset_id}/records`, `GET /api/v1/me/datasets/{dataset_id}/records` and `POST /api/v1/me/datasets/{dataset_id}/records/search` endpoints to return the `total` number of records. ([3848](https://github.com/argilla-io/argilla/pull/3848), [#3903](https://github.com/argilla-io/argilla/pull/3903))
- Implemented `__len__` method for filtered datasets to return the number of records matching the provided filters. ([3916](https://github.com/argilla-io/argilla/pull/3916))
- Increase the default max result window for Elasticsearch created for Feedback datasets. ([3929](https://github.com/argilla-io/argilla/pull/))
- Force elastic index refresh after records creation. ([3929](https://github.com/argilla-io/argilla/pull/))
- Validate metadata fields for filtering and sorting in the Python SDK. ([3993](https://github.com/argilla-io/argilla/pull/3993))
- Using metadata property name instead of id for indexing data in search engine index. ([3994](https://github.com/argilla-io/argilla/pull/3994))

Fixed

- Fixed response schemas to allow `values` to be `None` i.e. when a record is discarded the `response.values` are set to `None`. ([3926](https://github.com/argilla-io/argilla/pull/3926))

New Contributors
* splevine made their first contribution in https://github.com/argilla-io/argilla/pull/3832

**Full Changelog**: https://github.com/argilla-io/argilla/compare/v1.17.0...v1.18.0

1.17.0

☀️ Highlights

This release comes with a lot of new goodies and quality improvements. We added model card support for the `ArgillaTrainer`, worked on the `FeedbackDataset` task templates and added timestamps to responses. We also fixed a lot of bugs and improved the overall quality of the codebase. Enjoy!

🚨 Breaking change in updating existing Hugging Face Spaces deployments

The quickstart image startup script was changed from `/start_quickstart.sh` to `/home/argilla/start_quickstart.sh`, which might cause existing Hugging Face Spaces deployments to malfunction. A fix was added for the Argilla template space via [this PR](https://huggingface.co/spaces/argilla/argilla-template-space/discussions/19/files). Alternatively, you can just [create a new deployment](https://huggingface.co/new-space?template=argilla%2Fargilla-template-space).

⚠️ Breaking change using SQLite as backend in a docker deployment

From version 1.17.0, a new `argilla` OS user is configured for the provided Docker images. If you are using the Docker deployment and want to upgrade to this version, run the following command once after updating your container and before working with Argilla:

```bash
docker exec --user root <argilla_server_container_id> /bin/bash -c 'chown -R argilla:argilla "$ARGILLA_HOME_PATH"'
```


This will change the permissions on the argilla home path, which allows it to work with new containers.

Note: You can find the Docker container id by running:

```bash
docker ps | grep -i argilla-server
```

Example output:

```bash
713973693fb7 argilla/argilla-server:v1.17.0 "/bin/bash start_arg…" 11 hours ago Up 7 minutes 0.0.0.0:6900->6900/tcp docker-argilla-1
```



💾 `ArgillaTrainer` Model Card Generation

The `ArgillaTrainer` now supports automatic model card generation. This means that you can now generate a model card with all the required info for Hugging Face and directly share these models to the hub, as you would expect within the Hugging Face ecosystem. See [the docs](https://docs.argilla.io/en/v1.17.0/practical_guides/fine_tune.html#model-card-generation) for more info.

```python
from argilla.feedback import ArgillaTrainer

model_card_kwargs = {
    "language": ["en", "es"],
    "license": "Apache-2.0",
    "model_id": "all-MiniLM-L6-v2",
    "dataset_name": "argilla/emotion",
    "tags": ["nlp", "few-shot-learning", "argilla", "setfit"],
    "model_summary": "Small summary of what the model does",
    "model_description": "An extended explanation of the model",
    "model_type": "A 1.3B parameter embedding model fine-tuned on an awesome dataset",
    "finetuned_from": "all-MiniLM-L6-v2",
    "repo": "https://github.com/...",
    "developers": "",
    "shared_by": "",
}

# `dataset` and `task` are assumed to be defined beforehand
trainer = ArgillaTrainer(
    dataset=dataset,
    task=task,
    framework="setfit",
    framework_kwargs={"model_card_kwargs": model_card_kwargs}
)
trainer.train(output_dir="my_model")
# or get the card as `str` by calling the `generate_model_card` method
argilla_model_card = trainer.generate_model_card("my_model")
```


🦮 `FeedbackDataset` Task Templates

The Argilla `FeedbackDataset` now supports a number of task templates that can be used to quickly create a dataset for specific tasks out of the box. This should help new users get right into the action without having to worry about the dataset structure. We support basic tasks like Text Classification but also allow you to set up complex RAG pipelines. See [the docs](https://docs.argilla.io/en/v1.17.0/practical_guides/create_dataset.html#task-templates) for more info.

```python
import argilla as rg

ds = rg.FeedbackDataset.for_text_classification(
    labels=["positive", "negative"],
    multi_label=False,
    use_markdown=True,
    guidelines=None,
)
ds
# FeedbackDataset(
#     fields=[TextField(name="text", use_markdown=True)],
#     questions=[LabelQuestion(name="label", labels=["positive", "negative"])],
#     guidelines="<Guidelines for the task>",
# )
```


⏱️ `inserted_at` and `updated_at` are added to responses

What are responses without timestamps? The `RemoteResponseSchema` now supports `inserted_at` and `updated_at` fields, so you can tell when a response was created and when it was last updated. This is ideal for keeping track of annotator performance within your company.
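
As a rough illustration of what the two timestamps make possible, the sketch below counts how many responses each annotator submitted and how many they edited afterwards. It does not call the Argilla API: the dictionaries are made-up stand-ins for responses, the `user_id` key and the `summarize_by_annotator` helper are assumptions for this example, and only `inserted_at` and `updated_at` come from the release notes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical responses; in practice these values would be read from the
# responses of a remote dataset.
responses = [
    {"user_id": "annotator-a",
     "inserted_at": datetime(2023, 10, 2, 9, 0, 0),
     "updated_at": datetime(2023, 10, 2, 9, 0, 0)},
    {"user_id": "annotator-a",
     "inserted_at": datetime(2023, 10, 2, 9, 5, 0),
     "updated_at": datetime(2023, 10, 2, 9, 7, 30)},
    {"user_id": "annotator-b",
     "inserted_at": datetime(2023, 10, 2, 10, 0, 0),
     "updated_at": datetime(2023, 10, 2, 10, 0, 0)},
]


def summarize_by_annotator(responses):
    """Count responses, edited responses, and seconds until the last edit."""
    summary = defaultdict(
        lambda: {"responses": 0, "edited": 0, "seconds_to_last_edit": 0.0}
    )
    for response in responses:
        stats = summary[response["user_id"]]
        stats["responses"] += 1
        delta = (response["updated_at"] - response["inserted_at"]).total_seconds()
        if delta > 0:  # the response was modified after it was created
            stats["edited"] += 1
            stats["seconds_to_last_edit"] += delta
    return dict(summary)


summary = summarize_by_annotator(responses)
print(summary)
```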

[1.17.0](https://github.com/argilla-io/argilla/compare/v1.16.0...v1.17.0)

Added

- Added fields `inserted_at` and `updated_at` in `RemoteResponseSchema` ([3822](https://github.com/argilla-io/argilla/pull/3822)).
- Added automatic model card generation through `ArgillaTrainer.save` ([3857](https://github.com/argilla-io/argilla/pull/3857)).
- Added task templates to the `FeedbackDataset` ([3973](https://github.com/argilla-io/argilla/pull/3973)).

Changed

- Updated `Dockerfile` to use a multi-stage build ([3221](https://github.com/argilla-io/argilla/pull/3221) and [3793](https://github.com/argilla-io/argilla/pull/3793)).
- Updated active learning for text classification notebooks to use the most recent small-text version ([3831](https://github.com/argilla-io/argilla/pull/3831)).
- Changed the Argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the Hugging Face Spaces ([3831](https://github.com/argilla-io/argilla/pull/3831)).
- Aligned the `FeedbackDataset` API methods so that they are accessible across the different implementations ([3937](https://github.com/argilla-io/argilla/pull/3937)).
- Added `unify_responses` support for remote datasets ([3937](https://github.com/argilla-io/argilla/pull/3937)).

Fixed

- Fixed fields not being shown in the order defined in the dataset settings. Closes [3959](https://github.com/argilla-io/argilla/issues/3959) ([3984](https://github.com/argilla-io/argilla/pull/3984)).
- Updated active learning for text classification notebooks to pass ids of type int to `TextClassificationRecord` ([3831](https://github.com/argilla-io/argilla/pull/3831)).
- Fixed record fields validation that was preventing records with optional fields (i.e. `required=False`) from being logged when the field value was `None` ([3846](https://github.com/argilla-io/argilla/pull/3846)).
- Always set `pretrained_model_name_or_path` attribute as string in `ArgillaTrainer` ([3914](https://github.com/argilla-io/argilla/pull/3914)).
- The `inserted_at` and `updated_at` attributes are now created using the `utcnow` factory to avoid unexpected race conditions on timestamp creation ([3945](https://github.com/argilla-io/argilla/pull/3945)).
- Fixed `configure_dataset_settings` when providing the workspace via the arg `workspace` ([3887](https://github.com/argilla-io/argilla/pull/3887)).
- Fixed saving of models trained with `ArgillaTrainer` with a `peft_config` parameter ([3795](https://github.com/argilla-io/argilla/pull/3795)).
- Fixed backwards compatibility on `from_huggingface` when loading a `FeedbackDataset` from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced ([3829](https://github.com/argilla-io/argilla/pull/3829)).
- Fixed `TrainingTaskForQuestionAnswering.__repr__` ([3969](https://github.com/argilla-io/argilla/pull/3969))
- Fixed potential dictionary key-errors in `TrainingTask.prepare_for_training_with_*`-methods ([3969](https://github.com/argilla-io/argilla/pull/3969))
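
For readers unfamiliar with the "factory" wording in the `inserted_at` / `updated_at` fix above, here is a minimal sketch of the general pattern using a standard-library dataclass. It is not Argilla's actual code: `ResponseSketch` and the local `utcnow` helper are hypothetical, and the helper uses `datetime.now(timezone.utc)` because `datetime.utcnow` is deprecated since Python 3.12.

```python
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone


def utcnow() -> datetime:
    """Return the current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


@dataclass
class ResponseSketch:
    value: str
    # default_factory is called every time an instance is created, so each
    # object gets its own timestamp instead of sharing one computed earlier.
    inserted_at: datetime = field(default_factory=utcnow)
    updated_at: datetime = field(default_factory=utcnow)


first = ResponseSketch("positive")
time.sleep(0.05)  # make the gap between the two instances visible
second = ResponseSketch("negative")

print(first.inserted_at, second.inserted_at)
```

The design point is that the timestamp is produced at instantiation time by a single, well-defined function, rather than being evaluated once when the class is defined or supplied ad hoc by different callers.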

Deprecated

- Function `rg.configure_dataset` is deprecated in favour of `rg.configure_dataset_settings`. The former will be removed in version 1.19.0

New Contributors
* osintalex made their first contribution in https://github.com/argilla-io/argilla/pull/3221
* kursathalat made their first contribution in https://github.com/argilla-io/argilla/pull/3756
* splevine made their first contribution in https://github.com/argilla-io/argilla/pull/3832

**Full Changelog**: https://github.com/argilla-io/argilla/compare/v1.16.0...v1.17.0
