🔆 Release highlights
Spans question
We've added a new type of question to Feedback Datasets: the `SpanQuestion`. This type of question allows you to highlight portions of text in a specific field and apply a label. It is especially useful for token classification (like NER or POS tagging) and information extraction tasks.
https://github.com/argilla-io/argilla/assets/126158523/d3821d49-6da0-4488-99e2-068d7411268a
With this type of question you can:
✨ Provide suggested spans with a confidence score, so your team doesn't need to start from scratch.
⌨️ Choose a label using your mouse or with the keyboard shortcut provided next to the label.
🖱️ Draw a span by dragging your mouse over the parts of the text you want to select or, if it's a single token, just double-click on it.
🪄 Forget about mistakes with token boundaries. The UI will snap your spans to token boundaries for you.
🔎 Annotate at character level when you need more fine-grained spans. Hold the `Shift` key while drawing the span and the resulting span will start and end at the exact boundaries of your selection.
✔️ Quickly change the label of a span by clicking on the label name and selecting the correct one from the dropdown.
🖍️ Correct a span at the speed of light by simply drawing the correct span over it. The new span will overwrite the old one.
🧼 Remove labels by hovering over the label name in the span and then clicking on the 𐢫 on the left-hand side.
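To make the snapping behaviour more concrete, here is a rough sketch of how a selection could be expanded to token boundaries. This is only an illustration, not the UI's actual implementation: the `snap_to_tokens` helper is hypothetical, and it assumes tokens are simply runs of non-whitespace characters.

```python
import re


def snap_to_tokens(text, start, end):
    """Expand the selection [start, end) to cover every token it overlaps.

    Assumption: a token is any run of non-whitespace characters.
    Returns None when the selection touches no token at all.
    """
    snapped_start, snapped_end = None, None
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        # keep tokens that overlap the selected character range
        if tok_end > start and tok_start < end:
            if snapped_start is None:
                snapped_start = tok_start
            snapped_end = tok_end
    if snapped_start is None:
        return None
    return snapped_start, snapped_end


text = "Argilla released span questions"
# a sloppy selection from the middle of "Argilla" to the middle of "released"
snapped = snap_to_tokens(text, 2, 12)
print(snapped, repr(text[snapped[0]:snapped[1]]))
```

Character-level annotation with `Shift` would correspond to skipping this step and keeping the raw selection offsets.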
Here's an example of what your dataset would look like from the SDK:
```python
import argilla as rg
from argilla.client.feedback.schemas import SpanValueSchema

# connect to your Argilla instance
rg.init(...)

# create a dataset with a span question
dataset = rg.FeedbackDataset(
    fields=[rg.TextField(name="text")],
    questions=[
        rg.SpanQuestion(
            name="entities",
            title="Highlight the entities in the text:",
            labels={"PER": "Person", "ORG": "Organization", "EVE": "Event"},  # or ["PER", "ORG", "EVE"]
            field="text",  # the field where you want to do the span annotation
            required=True
        )
    ]
)

# create a record with suggested spans
record = rg.FeedbackRecord(
    fields={"text": "This is the text of the record"},
    suggestions=[
        {
            "question_name": "entities",
            "value": [
                SpanValueSchema(
                    start=0,  # position of the first character of the span
                    end=10,  # position of the character right after the end of the span
                    label="ORG",
                    score=1.0
                )
            ],
            "agent": "my_model",
        }
    ]
)

# add records to the dataset and push to Argilla
dataset.add_records([record])
dataset.push_to_argilla(...)
```
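If your suggestions come from a simple lookup rather than a model, the `start` and `end` offsets can be computed with plain string search. The sketch below is one possible way to do it. The `find_spans` helper and the example phrases are hypothetical, and it only builds plain dictionaries with the same `start`, `end` and `label` keys used by `SpanValueSchema` in the example above.

```python
def find_spans(text, phrases):
    """Return one {"start", "end", "label"} dict per occurrence of each phrase.

    `end` is exclusive: it is the position of the character right after the span,
    so text[start:end] gives back the matched phrase.
    """
    spans = []
    for phrase, label in phrases.items():
        start = text.find(phrase)
        while start != -1:
            spans.append({"start": start, "end": start + len(phrase), "label": label})
            # continue searching after the current match
            start = text.find(phrase, start + len(phrase))
    return sorted(spans, key=lambda span: span["start"])


text = "Argilla was presented at PyCon by Argilla engineers."
spans = find_spans(text, {"Argilla": "ORG", "PyCon": "EVE"})
for span in spans:
    print(span, repr(text[span["start"]:span["end"]]))
```

Each resulting dictionary could then be turned into a suggested span, for instance with `SpanValueSchema(**span)`, assuming the keyword arguments shown in the example above.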
To learn more about this and all the other questions available in Feedback Datasets, check out our documentation on:
- [Defining questions](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/create_dataset.html#define-questions)
- [Working with suggestions and responses](https://docs.argilla.io/en/latest/practical_guides/create_update_dataset/suggestions_and_responses.html)
- [Annotating Feedback Datasets](https://docs.argilla.io/en/latest/practical_guides/annotate_dataset.html#feedback-dataset)
[Changelog 1.26.0](https://github.com/argilla-io/argilla/compare/v1.25.0...v1.26.0)
Added
- If you expand the labels of a `single or multi` label Question, the state is maintained during the entire annotation process. ([4630](https://github.com/argilla-io/argilla/pull/4630))
- Added support for span questions in the Python SDK. ([4617](https://github.com/argilla-io/argilla/pull/4617))
- Added support for span values in suggestions and responses. ([4623](https://github.com/argilla-io/argilla/pull/4623))
- Added `span` questions for `FeedbackDataset`. ([4622](https://github.com/argilla-io/argilla/pull/4622))
- Added `ARGILLA_CACHE_DIR` environment variable to configure the client cache directory. ([4509](https://github.com/argilla-io/argilla/pull/4509))
Fixed
- Fixed contextualized workspaces. ([4665](https://github.com/argilla-io/argilla/pull/4665))
- Fixed prepare for training when passing `RankingValueSchema` instances to suggestions. ([4628](https://github.com/argilla-io/argilla/pull/4628))
- Fixed parsing ranking values in suggestions from HF datasets. ([4629](https://github.com/argilla-io/argilla/pull/4629))
- Fixed reading description from API response payload. ([4632](https://github.com/argilla-io/argilla/pull/4632))
- Fixed pulling (n\*chunk_size)+1 records when using `ds.pull` or iterating over the dataset. ([4662](https://github.com/argilla-io/argilla/pull/4662))
- Fixed client's resolution of enum values when calling the Search and Metrics API, to support Python >=3.11 enum handling. ([4672](https://github.com/argilla-io/argilla/pull/4672))
New Contributors
* davidefiocco made their first contribution in https://github.com/argilla-io/argilla/pull/4639
**Full Changelog**: https://github.com/argilla-io/argilla/compare/v1.25.0...v1.26.0