# [0.16.0](https://github.com/recognai/rubrix/compare/v0.15.0...v0.16.0) (2022-07-08)
## Highlights
### 👂 Listeners: enable more interactive workflows between client and server
Listeners enable you to define functions that get executed under certain conditions when something changes in a dataset. There are many use cases for this: monitoring annotation jobs, monitoring model predictions, enabling active learning workflows, and many more.
You can find the Python API reference docs here: https://rubrix.readthedocs.io/en/stable/reference/python/python_listeners.html#python-listeners
We will be documenting these use cases with practical examples, but for this release we've included a new tutorial on using listeners for active learning: https://rubrix.readthedocs.io/en/stable/tutorials/active_learning_with_small_text.html. This tutorial includes the following listener function, which implements the active learning loop:
```python
import numpy as np
import rubrix as rb
from rubrix.listeners import listener
from sklearn.metrics import accuracy_score

# `trec`, `DATASET_NAME`, `NUM_SAMPLES`, `active_learner` and `dataset_test`
# are defined in earlier steps of the tutorial

# Define some helper variables
LABEL2INT = trec["train"].features["label-coarse"].str2int
ACCURACIES = []

# Set up the active learning loop with the listener decorator
@listener(
    dataset=DATASET_NAME,
    query="status:Validated AND metadata.batch_id:{batch_id}",
    condition=lambda search: search.total == NUM_SAMPLES,
    execution_interval_in_seconds=3,
    batch_id=0,
)
def active_learning_loop(records, ctx):
    # 1. Update active learner
    print(f"Updating with batch_id {ctx.query_params['batch_id']} ...")
    y = np.array([LABEL2INT(rec.annotation) for rec in records])

    # initial update
    if ctx.query_params["batch_id"] == 0:
        indices = np.array([rec.id for rec in records])
        active_learner.initialize_data(indices, y)
    # update with the prior queried indices
    else:
        active_learner.update(y)
    print("Done!")

    # 2. Query active learner
    print("Querying new data points ...")
    queried_indices = active_learner.query(num_samples=NUM_SAMPLES)
    ctx.query_params["batch_id"] += 1
    new_records = [
        rb.TextClassificationRecord(
            text=trec["train"]["text"][idx],
            metadata={"batch_id": ctx.query_params["batch_id"]},
            id=idx,
        )
        for idx in queried_indices
    ]

    # 3. Log the batch to Rubrix
    rb.log(new_records, DATASET_NAME)

    # 4. Evaluate current classifier on the test set
    print("Evaluating current classifier ...")
    accuracy = accuracy_score(
        dataset_test.y,
        active_learner.classifier.predict(dataset_test),
    )
    ACCURACIES.append(accuracy)
    print("Done!")

    print("Waiting for annotations ...")
```
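The listener above needs a running Rubrix server. To make the mechanism more concrete, here is a minimal, self-contained toy sketch of a polling loop in plain Python. It fills a query template's placeholders from the current parameters, runs a search, and calls an action whenever a condition holds. The sketch assumes that placeholders are filled from the query parameters, which is what the `{batch_id}` placeholder and the `batch_id=0` argument in the example suggest. `run_listener_sketch`, `interval_s` and `max_polls` are hypothetical names for illustration. This is not Rubrix's implementation; `interval_s` and `query_params` only roughly correspond to `execution_interval_in_seconds` and `ctx.query_params` above.

```python
import time
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


def run_listener_sketch(
    query_template: str,
    search: Callable[[str], List[Record]],
    condition: Callable[[List[Record]], bool],
    action: Callable[[List[Record], Dict[str, Any]], None],
    query_params: Dict[str, Any],
    interval_s: float = 0.0,
    max_polls: int = 10,
) -> int:
    """Hypothetical toy polling loop, not Rubrix's implementation.

    Returns how many times `action` was triggered.
    """
    fired = 0
    for _ in range(max_polls):
        # Fill placeholders such as {batch_id} with the current parameter values
        query = query_template.format(**query_params)
        records = search(query)
        if condition(records):
            # The action may update query_params, e.g. to move on to the next batch
            action(records, query_params)
            fired += 1
        time.sleep(interval_s)
    return fired


# Usage with an in-memory stand-in for a dataset's search results
validated = {"status:Validated AND metadata.batch_id:0": [{"id": 0}, {"id": 1}]}
seen = []


def on_batch_validated(records: List[Record], params: Dict[str, Any]) -> None:
    seen.append([r["id"] for r in records])
    params["batch_id"] += 1


params = {"batch_id": 0}
fired = run_listener_sketch(
    "status:Validated AND metadata.batch_id:{batch_id}",
    search=lambda q: validated.get(q, []),
    condition=lambda recs: len(recs) == 2,
    action=on_batch_validated,
    query_params=params,
    max_polls=3,
)
print(fired, seen, params)  # 1 [[0, 1]] {'batch_id': 1}
```

The action advances `batch_id`, so later polls query the next batch. The action does not fire again until that batch satisfies the condition. This mirrors how the tutorial's loop waits for new annotations.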
### 📖 New docs!
https://rubrix.readthedocs.io/
<img width="1643" alt="Screenshot 2022-07-13 at 12 49 42" src="https://user-images.githubusercontent.com/1107111/178716820-f675ec48-486f-4763-bd48-60e5e7d773da.png">
### 🧱 `extend_matrix`: Weak label augmentation using embeddings
This release includes an exciting feature to augment the coverage of your weak labels using embeddings. You can find a practical tutorial here: https://rubrix.readthedocs.io/en/stable/tutorials/extend_weak_labels_with_embeddings.html
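The idea behind embedding-based augmentation can be sketched in a few lines of NumPy. For each rule, every record the rule does not cover receives the vote of its most similar covered record, but only if their cosine similarity exceeds a per-rule threshold. This is a simplified, hypothetical sketch, not the library's `extend_matrix`; the real method may differ in details such as how neighbors are searched. The tutorial linked above shows the actual API. `extend_matrix_sketch` and the `-1` abstain encoding are assumptions for illustration.

```python
import numpy as np

ABSTAIN = -1  # assumption: a rule that does not fire on a record is encoded as -1


def extend_matrix_sketch(matrix: np.ndarray, embeddings: np.ndarray, thresholds) -> np.ndarray:
    """Hypothetical sketch: copy each rule's vote to similar uncovered records.

    matrix: (n_records, n_rules) weak-label matrix of label ids or ABSTAIN
    embeddings: (n_records, dim) one embedding vector per record
    thresholds: one cosine-similarity threshold per rule
    """
    # Normalize rows so that dot products are cosine similarities
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    extended = matrix.copy()
    for j, threshold in enumerate(thresholds):
        covered = np.flatnonzero(matrix[:, j] != ABSTAIN)
        uncovered = np.flatnonzero(matrix[:, j] == ABSTAIN)
        if covered.size == 0 or uncovered.size == 0:
            continue
        # Similarity of every uncovered record to every covered record
        sims = normed[uncovered] @ normed[covered].T
        nearest = sims.argmax(axis=1)
        nearest_sim = sims[np.arange(uncovered.size), nearest]
        # Only accept the nearest neighbor's vote when it is similar enough
        accept = nearest_sim > threshold
        extended[uncovered[accept], j] = matrix[covered[nearest[accept]], j]
    return extended


matrix = np.array([[0], [ABSTAIN], [1], [ABSTAIN]])
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
extended = extend_matrix_sketch(matrix, embeddings, thresholds=[0.9])
print(extended.ravel().tolist())  # [0, 0, 1, -1]
```

In this example, the rule's coverage grows from 2 to 3 of 4 records. The second record is close to the first and inherits its label. The last record has no sufficiently similar covered neighbor, so it keeps abstaining. Higher thresholds trade coverage for precision.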
## Features
* **1561:** standardize icons ([1565](https://github.com/recognai/rubrix/issues/1565)) ([15254e7](https://github.com/recognai/rubrix/commit/15254e73c1cec4b9f3f2bb940a89e72d66da78e5)), closes [#1561](https://github.com/recognai/rubrix/issues/1561)
* **1602:** new rubrix dataset listeners ([1507](https://github.com/recognai/rubrix/issues/1507), [#1586](https://github.com/recognai/rubrix/issues/1586), [#1583](https://github.com/recognai/rubrix/issues/1583), [#1596](https://github.com/recognai/rubrix/issues/1596)) ([65747ab](https://github.com/recognai/rubrix/commit/65747abcde1283356465cfc9836bd600ff354535)), closes [#1602](https://github.com/recognai/rubrix/issues/1602)
* Add 'extend_matrix' to the WeakMultiLabel class ([1577](https://github.com/recognai/rubrix/issues/1577)) ([cf89311](https://github.com/recognai/rubrix/commit/cf89311473c5446b7e01baf9429e0b673e3cf5a1))
* Improve from datasets ([1567](https://github.com/recognai/rubrix/issues/1567)) ([2b0d607](https://github.com/recognai/rubrix/commit/2b0d6075ec3f4eb2cf2783583dd21d4f4a0d5c4f))
* **token-class:** adjust token spans spaces ([1599](https://github.com/recognai/rubrix/issues/1599)) ([0fb3576](https://github.com/recognai/rubrix/commit/0fb3576e6ade30cc7dbbb9d6af947fa3f85ea4c0))
## Bug Fixes
* **1264:** discard first space after a token ([1591](https://github.com/recognai/rubrix/issues/1591)) ([eff0ac5](https://github.com/recognai/rubrix/commit/eff0ac5b0e2f7198e695ede905737497bba451cf)), closes [#1264](https://github.com/recognai/rubrix/issues/1264)
* **1545:** highlight words with accents ([1550](https://github.com/recognai/rubrix/issues/1550)) ([c42e77b](https://github.com/recognai/rubrix/commit/c42e77be021e57ba6b15074f457e99d4d06f0a33)), closes [#1545](https://github.com/recognai/rubrix/issues/1545)
* **1548:** access datasets for superusers when workspace is not provided ([1572](https://github.com/recognai/rubrix/issues/1572), [#1608](https://github.com/recognai/rubrix/issues/1608)) ([0b04bc8](https://github.com/recognai/rubrix/commit/0b04bc8920b78e346cb6fef8fa650fc485e54819)), closes [#1548](https://github.com/recognai/rubrix/issues/1548)
* **1551:** don't show error traces for EntityNotFoundError's ([1569](https://github.com/recognai/rubrix/issues/1569)) ([04e101c](https://github.com/recognai/rubrix/commit/04e101c36e00c87d32359ca0df7c92b2cf9ed55c)), closes [#1551](https://github.com/recognai/rubrix/issues/1551)
* **1557:** allow text editing when clicking the "edit" button ([1558](https://github.com/recognai/rubrix/issues/1558)) ([e751414](https://github.com/recognai/rubrix/commit/e7514149be3632062dd755e79c099a0f091d70df)), closes [#1557](https://github.com/recognai/rubrix/issues/1557)
* **1574:** search highlighting for a single dot ([1592](https://github.com/recognai/rubrix/issues/1592)) ([53474a1](https://github.com/recognai/rubrix/commit/53474a1db9fd9a92d263988169833af0507f6ffe)), closes [#1574](https://github.com/recognai/rubrix/issues/1574)
* **1575:** show predicted ok/ko in Text Classifier explore mode ([1576](https://github.com/recognai/rubrix/issues/1576)) ([ada87c0](https://github.com/recognai/rubrix/commit/ada87c07d0a603fff56f61ff1c321434ce028791)), closes [#1575](https://github.com/recognai/rubrix/issues/1575)
* compatibility with new dataset version ([1566](https://github.com/recognai/rubrix/issues/1566)) ([ac26e30](https://github.com/recognai/rubrix/commit/ac26e301a636d193ed5036dfa31370c29e2f1462))
## Documentation
* **1512:** change theme to furo ([1564](https://github.com/recognai/rubrix/issues/1564), [#1604](https://github.com/recognai/rubrix/issues/1604)) ([98869d2](https://github.com/recognai/rubrix/commit/98869d20efcff27c0c884fe76f5f32cc2a1bfe35)), closes [#1512](https://github.com/recognai/rubrix/issues/1512)
* add 'how to prepare your data for training' to basics ([1589](https://github.com/recognai/rubrix/issues/1589)) ([a21bcf3](https://github.com/recognai/rubrix/commit/a21bcf3e1a89e74e3ce4db0f66a7854aa4a41e7c))
* add active learning with small text and listener tutorial ([1585](https://github.com/recognai/rubrix/issues/1585), [#1609](https://github.com/recognai/rubrix/issues/1609)) ([d59573f](https://github.com/recognai/rubrix/commit/d59573fefa46be55159b4f08fdfa92ee75b76973)), closes [#1601](https://github.com/recognai/rubrix/issues/1601) [#421](https://github.com/recognai/rubrix/issues/421)
* Add MajorityVoter to references + Add comments about multi-label support of the label models ([1582](https://github.com/recognai/rubrix/issues/1582)) ([ab481c7](https://github.com/recognai/rubrix/commit/ab481c77551e00d5f11bec51f48f1d1d1adda6a0))
* add pip version and dockertag as parameter in the build process ([1560](https://github.com/recognai/rubrix/issues/1560)) ([73a31e2](https://github.com/recognai/rubrix/commit/73a31e26d50883bc7ece90f287e64295ba0c17ee))
## You can see all work included in the release here
- chore(docs): remove by frascuchon
- docs: add active learning with small text and listener tutorial (1585, 1609) by dcfidalgo
- docs(1512): change theme to furo (1564, 1604) by frascuchon
- chore: set version by frascuchon
- feat(token-class): adjust token spans spaces (1599) by frascuchon
- feat(1602): new rubrix dataset listeners (1507, 1586, 1583, 1596) by frascuchon
- docs: add 'how to prepare your data for training' to basics (1589) by dcfidalgo
- test: configure numpy to disable multi threading (1593) by frascuchon
- docs: Add MajorityVoter to references + Add comments about multi-label support of the label models (1582) by dcfidalgo
- feat(1561): standardize icons (1565) by leiyre
- Feat: Improve from datasets (1567) by dcfidalgo
- feat: Add 'extend_matrix' to the WeakMultiLabel class (1577) by dcfidalgo
- docs: add pip version and dockertag as parameter in the build process (1560) by frascuchon
- refactor: remove `words` references in searches (1571) by frascuchon
- ci: check conda env cache (1570) by frascuchon
- fix(1264): discard first space after a token (1591) by frascuchon
- ci(package): regenerate view snapshot (1600) by frascuchon
- fix(1574): search highlighting for a single dot (1592) by leiyre
- fix(1575): show predicted ok/ko in Text Classifier explore mode (1576) by leiyre
- fix(1548): access datasets for superusers when workspace is not provided (1572, 1608) by frascuchon
- fix(1551): don't show error traces for EntityNotFoundError's (1569) by frascuchon
- fix: compatibility with new dataset version (1566) by dcfidalgo
- fix(1557): allow text editing when clicking the "edit" button (1558) by leiyre
- fix(1545): highlight words with accents (1550) by leiyre