🔆 Highlights
New Feedback Task 🎉
<img width="1508" alt="snapshot-feedback-demo" src="https://github.com/argilla-io/argilla/assets/126158523/c07d0918-146f-4d1f-97b3-1db06a6d741c">
Big welcome to our new `FeedbackDataset`! This new type of dataset is designed to cover the specific needs of working with LLMs. Use it to collect demonstration examples and human feedback, or to curate other datasets. Questions of different types can be combined, so you can adapt the dataset to the specific needs of your project. Currently, it supports `RatingQuestion` and `TextQuestion`; more question types will be added in upcoming releases.
In addition, these datasets support multiple annotations: every user with access to the dataset can submit their own responses.
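Because every user with access can respond, a single record can end up with several responses. The following is a library-free illustration of that idea, **not** the Argilla client API. `Response`, `Record`, `add_response`, and `mean_rating` are hypothetical names. The sketch also assumes two things: each user keeps at most one response per record, and the rating question uses integer values.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


# Hypothetical data model for illustration only; this is NOT the Argilla client API.
@dataclass
class Response:
    user: str
    rating: int                 # answer to a rating-style question (integer scale assumed)
    text: Optional[str] = None  # optional answer to a free-text question


@dataclass
class Record:
    fields: Dict[str, str]
    responses: List[Response] = field(default_factory=list)

    def add_response(self, response: Response) -> None:
        # Assumption: one response per user per record, so a newer
        # response from the same user replaces the older one.
        self.responses = [r for r in self.responses if r.user != response.user]
        self.responses.append(response)

    def mean_rating(self) -> Optional[float]:
        # Aggregate the ratings given by all users; None if nobody has answered yet.
        if not self.responses:
            return None
        return mean(r.rating for r in self.responses)


record = Record(fields={"prompt": "Summarize the text.", "response": "A short summary."})
record.add_response(Response(user="annotator-1", rating=4))
record.add_response(Response(user="annotator-2", rating=5, text="Mention the main topic."))
print(record.mean_rating())  # 4.5
```

Keeping every user's answer, rather than one "gold" label per record, is what lets you compute agreement or averages later.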
The `FeedbackDataset` also has an enhanced integration with the Hugging Face Hub, which makes it seamless to save a dataset to the Hub, or to load a `FeedbackDataset` from the Hub and push it directly to Argilla.
Check out everything you can do with the Feedback Task [in our docs](https://docs.argilla.io/en/latest/guides/llms/conceptual_guides/conceptual_guides.html).
New LLM section in our docs
We've added a new section in our docs that covers:
- Useful concepts around working with [LLMs](https://docs.argilla.io/en/latest/guides/llms/conceptual_guides/conceptual_guides.html)
- How-to guides that cover all the functionalities of the new [Feedback Task](https://docs.argilla.io/en/latest/guides/llms/practical_guides/practical_guides.html)
- [End-to-end examples](https://docs.argilla.io/en/latest/guides/llms/examples/examples.html)
More training integrations
We've added new frameworks for the `ArgillaTrainer`: `ArgillaPeftTrainer` for Text and Token Classification and `ArgillaAutoTrainTrainer` for Text Classification.
[Changelog 1.8.0](https://github.com/argilla-io/argilla/compare/v1.7.0...v1.8.0)
Added
- `/api/v1/datasets` new endpoint to list and create datasets ([2615])
- `/api/v1/datasets/{dataset_id}` new endpoint to get and delete datasets ([2615])
- `/api/v1/datasets/{dataset_id}/publish` new endpoint to publish a dataset ([2615])
- `/api/v1/datasets/{dataset_id}/questions` new endpoint to list and create dataset questions ([2615])
- `/api/v1/datasets/{dataset_id}/fields` new endpoint to list and create dataset fields ([2615])
- `/api/v1/datasets/{dataset_id}/questions/{question_id}` new endpoint to delete a dataset question ([2615])
- `/api/v1/datasets/{dataset_id}/fields/{field_id}` new endpoint to delete a dataset field ([2615])
- `/api/v1/workspaces/{workspace_id}` new endpoint to get a workspace by ID ([2615])
- `/api/v1/responses/{response_id}` new endpoint to update and delete a response ([2615])
- `/api/v1/datasets/{dataset_id}/records` new endpoint to create and list dataset records ([2615])
- `/api/v1/me/datasets` new endpoint to list the datasets visible to the user ([2615])
- `/api/v1/me/dataset/{dataset_id}/records` new endpoint to list dataset records with user responses ([2615])
- `/api/v1/me/datasets/{dataset_id}/metrics` new endpoint to get the dataset user metrics ([2615])
- `/api/v1/me/records/{record_id}/responses` new endpoint to create record user responses ([2615])
- Show new Feedback Task datasets in the datasets list ([2719])
- New page for the Feedback Task ([2680])
- Show Feedback Task metrics ([2822])
- Users can delete a dataset from the dataset settings page ([2792])
- Support for `FeedbackDataset` in Python client (parent PR [2615], and nested PRs: [2949], [2827], [2943], [2945], [2962], and [3003])
- Integration with the Hugging Face Hub ([2949])
- Added `ArgillaPeftTrainer` for text and token classification ([2854](https://github.com/argilla-io/argilla/issues/2854))
- Added `predict_proba()` method to `ArgillaSetFitTrainer`
- Added `ArgillaAutoTrainTrainer` for Text Classification ([2664](https://github.com/argilla-io/argilla/issues/2664))
- New `database revisions` command showing database revisions info

[2615]: https://github.com/argilla-io/argilla/issues/2615
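For readers who want to call the new `/api/v1` endpoints directly, here is a minimal stdlib sketch that builds requests without sending them. Several details are assumptions that these notes do not confirm: the server runs locally on Argilla's default port 6900, the API key travels in an `X-Argilla-Api-Key` header, and the HTTP verbs follow the usual REST mapping (GET to list, POST to create, DELETE to delete). The `build_request` helper, the placeholder key, and the placeholder IDs are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumptions (not confirmed by these release notes): local server on the
# default port 6900 and API-key auth through an "X-Argilla-Api-Key" header.
BASE_URL = "http://localhost:6900"
API_KEY = "argilla.apikey"  # placeholder key; replace with your own


def build_request(method: str, path: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against one of the /api/v1 endpoints."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    request.add_header("X-Argilla-Api-Key", API_KEY)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


# List the datasets visible to the current user.
list_mine = build_request("GET", "/api/v1/me/datasets")

# Delete a single response; the ID below is a made-up placeholder.
response_id = "00000000-0000-0000-0000-000000000000"
delete_response = build_request("DELETE", f"/api/v1/responses/{response_id}")

# To actually send a request, a running server is required:
# with urllib.request.urlopen(list_mine) as resp:
#     print(json.load(resp))
```

The request bodies for the create endpoints (for example, which keys `POST /api/v1/datasets` expects) are not described in these notes, so check the API reference before relying on any particular payload.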
Fixes
- Avoid rendering HTML for invalid HTML strings in Text2text ([2911](https://github.com/argilla-io/argilla/issues/2911))
Changed
- The `database migrate` command accepts a `--revision` param to specify a revision ID
- `tokens_length` metrics function returns empty data ([3045])
- `token_length` metrics function returns empty data ([3045])
- `mention_length` metrics function returns empty data ([3045])
- `entity_density` metrics function returns empty data ([3045])
Deprecated
- Using Argilla with a Python 3.7 runtime is deprecated; support will be removed in version 1.9.0 ([2902](https://github.com/argilla-io/argilla/issues/2902))
- `tokens_length` metrics function has been deprecated and will be removed in 1.10.0 ([3045])
- `token_length` metrics function has been deprecated and will be removed in 1.10.0 ([3045])
- `mention_length` metrics function has been deprecated and will be removed in 1.10.0 ([3045])
- `entity_density` metrics function has been deprecated and will be removed in 1.10.0 ([3045])
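If some of your environments still run Python 3.7, a startup check can surface the deprecation before support is removed in 1.9.0. Below is a minimal stdlib sketch. `check_python_runtime` is a hypothetical helper, not part of Argilla. The 3.8 threshold simply marks the first version after the deprecated 3.7 runtime.

```python
import sys
import warnings


def check_python_runtime(version_info=sys.version_info) -> bool:
    """Return True on Python 3.8+; otherwise emit a DeprecationWarning and return False."""
    # Compare only (major, minor) so patch releases do not matter.
    if tuple(version_info[:2]) < (3, 8):
        warnings.warn(
            "Python 3.7 support in Argilla is deprecated and scheduled for removal in 1.9.0",
            DeprecationWarning,
            stacklevel=2,
        )
        return False
    return True


if __name__ == "__main__":
    check_python_runtime()
```

Passing `version_info` explicitly (for example `(3, 7, 16)`) makes the check easy to exercise in tests without switching interpreters.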
Removed
- Removed mention `density`, `tokens_length` and `chars_length` metrics from token classification metrics storage ([3045])
- Removed token `char_start`, `char_end`, `tag`, and `score` metrics from token classification metrics storage ([3045])
- Removed tags-related metrics from token classification metrics storage ([3045])
[3045]: https://github.com/argilla-io/argilla/pull/3045
As always, thanks to our amazing contributors!
* Fix image alignment on token classification by cceyda in https://github.com/argilla-io/argilla/pull/2779
* Update cloud_providers.md by chainyo in https://github.com/argilla-io/argilla/pull/2866