[0.18.0](https://github.com/recognai/rubrix/compare/v0.17.0...v0.18.0) (2022-10-05)
⚡ Highlights
Better validation of token classification records
Token Classification records often suffer from misalignment between the entity spans and the provided tokens.
Before this release, these errors were difficult to understand and fix because validation happened on the server side.
With this release, records are validated during instantiation, giving you a clear error message that helps you fix or ignore problematic records.
For example, the following record:
```python
import rubrix as rb

rb.TokenClassificationRecord(
    tokens=["I", "love", "Paris"],
    text="I love Paris!",
    prediction=[("LOC", 7, 13)],
)
```
Will give you the following error message:
```
ValueError: Following entity spans are not aligned with provided tokenization
Spans:
- [Paris!] defined in ...love Paris!
Tokens:
['I', 'love', 'Paris']
```
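To make the check concrete, here is a minimal sketch of what "aligned" means: a span is aligned when it starts where some token starts and ends where some token ends. This is an illustration only, not Rubrix's actual implementation. `token_boundaries` and `misaligned_spans` are hypothetical helper names, and the sketch assumes the tokens appear in the text in order.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (label, start_char, end_char), end exclusive


def token_boundaries(text: str, tokens: List[str]) -> Tuple[Set[int], Set[int]]:
    """Return the character offsets at which tokens start and end in `text`."""
    starts, ends = set(), set()
    cursor = 0
    for token in tokens:
        idx = text.find(token, cursor)
        if idx == -1:
            raise ValueError(f"Token {token!r} not found in text")
        starts.add(idx)
        ends.add(idx + len(token))
        cursor = idx + len(token)
    return starts, ends


def misaligned_spans(text: str, tokens: List[str], spans: List[Span]) -> List[Span]:
    """Return the spans whose start or end does not fall on a token boundary."""
    starts, ends = token_boundaries(text, tokens)
    return [span for span in spans if span[1] not in starts or span[2] not in ends]


text = "I love Paris!"
tokens = ["I", "love", "Paris"]

# The span from the example above covers "Paris!", but the token is "Paris".
bad = misaligned_spans(text, tokens, [("LOC", 7, 13)])
# Ending the span at offset 12 makes it match the "Paris" token exactly.
good = misaligned_spans(text, tokens, [("LOC", 7, 12)])
```

Running a check like this over raw data before building records can help locate offsets that need trimming, such as the trailing punctuation here.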
Delete records by query
You can now delete specific records, either by ID or with a query written in Lucene syntax. This is useful for cleanup and dataset maintenance:
```python
import rubrix as rb

# Delete by id
rb.delete_records(name="example-dataset", ids=[1, 3, 5])

# Discard records by query (Lucene syntax uses `field:value`)
rb.delete_records(name="example-dataset", query="metadata.code:33", discard_only=True)
```
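When the filter values come from code rather than being typed by hand, it can help to assemble the query string programmatically. The sketch below assumes plain `field:value` clauses joined with `AND`, as in Lucene query-string syntax. `lucene_query` is a hypothetical helper, not part of the Rubrix client, and it only escapes quotes and backslashes inside string values.

```python
from typing import Any, Dict


def lucene_query(filters: Dict[str, Any]) -> str:
    """Build a Lucene-style query string such as 'a:1 AND b:"x"' from a dict."""
    clauses = []
    for field, value in filters.items():
        if isinstance(value, bool):
            rendered = "true" if value else "false"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Quote string values and escape backslashes and double quotes.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        clauses.append(f"{field}:{rendered}")
    return " AND ".join(clauses)


single = lucene_query({"metadata.code": 33})
combined = lucene_query({"metadata.code": 33, "metadata.source": "news"})
```

The resulting string could then be passed as the `query` argument of `rb.delete_records`.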
New tutorials
We have two new tutorials!
* [Few-shot classification with SetFit and a custom dataset](https://rubrix.readthedocs.io/en/stable/tutorials/few-shot-classification-with-setfit.html)
* [Analyzing predictions with model explainability methods](https://rubrix.readthedocs.io/en/stable/tutorials/nlp_model_explainability.html)
Features
* **API:** provide a dict for record annotations/predictions ([1658](https://github.com/recognai/rubrix/issues/1658)) ([12b0f83](https://github.com/recognai/rubrix/commit/12b0f83ba584231de6bd2c2f775b8dcaf7b88dcc))
* **Client:** expose client extra headers in init function ([1715](https://github.com/recognai/rubrix/issues/1715)) ([79f0529](https://github.com/recognai/rubrix/commit/79f05298c408e6f0861408e2631423bff0860f01)), closes [#1706](https://github.com/recognai/rubrix/issues/1706)
* **Client:** improve httpx errors handling ([1662](https://github.com/recognai/rubrix/issues/1662)) ([85da336](https://github.com/recognai/rubrix/commit/85da336925f39a84df9577c72ce0ae7d508ca50f))
* **Client:** validate token classification annotations in client ([1709](https://github.com/recognai/rubrix/issues/1709)) ([936d1ca](https://github.com/recognai/rubrix/commit/936d1ca3e39d7df5516fc805f9501526e8a5f999)), closes [#1579](https://github.com/recognai/rubrix/issues/1579)
* **Datasets:** delete records by query ([1721](https://github.com/recognai/rubrix/issues/1721)) ([bc9685d](https://github.com/recognai/rubrix/commit/bc9685dbba092c8f29960fc79c2099febbe6b782)), closes [#1714](https://github.com/recognai/rubrix/issues/1714) [#1737](https://github.com/recognai/rubrix/issues/1737)
* **Datasets:** restrict dataset deletion only to creators and super-users ([1713](https://github.com/recognai/rubrix/issues/1713)) ([c1bef9d](https://github.com/recognai/rubrix/commit/c1bef9d0726790e9c981aa906a7c5ba8a24d6521)), closes [#1740](https://github.com/recognai/rubrix/issues/1740)
* **Server:** Add server telemetry ([1687](https://github.com/recognai/rubrix/issues/1687)) ([d7cc006](https://github.com/recognai/rubrix/commit/d7cc0064a6896c8bb4accc3a51d51fbab0cb3c77))
Bug Fixes
* 'MajorityVoter.score' when using multi-labels ([1678](https://github.com/recognai/rubrix/issues/1678)) ([0b94c86](https://github.com/recognai/rubrix/commit/0b94c868131f0fc0c366a28150f06766f8686fcd)), closes [#1628](https://github.com/recognai/rubrix/issues/1628)
* **Metadata limits:** exclude subfields from mappings ([1700](https://github.com/recognai/rubrix/issues/1700)) ([9f9650e](https://github.com/recognai/rubrix/commit/9f9650eb80a11281c0ea73606c39d6b066697f22)), closes [#1699](https://github.com/recognai/rubrix/issues/1699)
* Normalizes the UnauthorizationError for the API response ([1748](https://github.com/recognai/rubrix/issues/1748)) ([6a68048](https://github.com/recognai/rubrix/commit/6a68048b3742c17ef2f02c9b447f9266f9ef1428))
* Search tag reset prior annotation ([1736](https://github.com/recognai/rubrix/issues/1736)) ([dc0a17f](https://github.com/recognai/rubrix/commit/dc0a17fa9da80c5342cc283e841ba997aa36c9a9)), closes [#1711](https://github.com/recognai/rubrix/issues/1711)
Visual enhancements
* Align App UI with the design system ([1672](https://github.com/recognai/rubrix/issues/1672)) ([67d6de8](https://github.com/recognai/rubrix/commit/67d6de818ba1b092ec632fef74c33730df11e597)), closes [#1670](https://github.com/recognai/rubrix/issues/1670)
Documentation
* Add interpret tutorial with Transformers ([1728](https://github.com/recognai/rubrix/issues/1728)) ([c3fa079](https://github.com/recognai/rubrix/commit/c3fa079a94fe1b1515dd609e38facbc185cff38c)), closes [#1729](https://github.com/recognai/rubrix/issues/1729)
* Adds tutorial about custom few-shot classification with SetFit ([1739](https://github.com/recognai/rubrix/issues/1739)) ([4f15ee6](https://github.com/recognai/rubrix/commit/4f15ee656e199bf8ad3093939617765df23f6fdc)), closes [#1741](https://github.com/recognai/rubrix/issues/1741)
* fixing the active learning tutorial with `small-text` ([1726](https://github.com/recognai/rubrix/issues/1726)) ([909efdf](https://github.com/recognai/rubrix/commit/909efdfd07086dc1387d9957fcbb4489dbd4ae51)), closes [#1693](https://github.com/recognai/rubrix/issues/1693)
* raise small-text version to 1.1.0 and adapt tutorial ([1744](https://github.com/recognai/rubrix/issues/1744)) ([16f19b7](https://github.com/recognai/rubrix/commit/16f19b7dbb104eba492aea3cf17b15a3abefbeaa)), closes [#1693](https://github.com/recognai/rubrix/issues/1693)
* Resolve many typos in documentation, comments and tutorials ([1701](https://github.com/recognai/rubrix/issues/1701)) ([f05e1c1](https://github.com/recognai/rubrix/commit/f05e1c1059bee8bee7fc9abf03c4b196e890d85f))
* using official token class. mapper since is compatible now ([1738](https://github.com/recognai/rubrix/issues/1738)) ([e82fd13](https://github.com/recognai/rubrix/commit/e82fd139348f8a6b94a02cf195eeb33e9987281c)), closes [#482](https://github.com/recognai/rubrix/issues/482)
As always, thanks to our amazing contributors!
- refactor: accept flat text as input for token classification mapper (1686) by Ankush-Chander
- feat(Client): improve httpx errors handling (1662) by Ankush-Chander
- fix: 'MajorityVoter.score' when using multi-labels (1678) by dcfidalgo
- docs: raise small-text version to 1.1.0 and adapt tutorial (1744) by chschroeder
- refactor: Incompatible attribute type fixed (1675) by luca-digrazia
- docs: Resolve many typos in documentation, comments and tutorials (1701) by tomaarsen
- refactor: Collection of changes, primarily regarding test suite and its coverage (1702) by tomaarsen