🔆 Highlights
Rubrix Workspaces
Rubrix Workspaces let you organize your data collection and monitoring workflows much more flexibly than before. Workspaces can be project-based (separating work across projects), team-based (organizing work across teams), model-based (organizing data collection and monitoring per model or model group), or anything else you can think of. A workspace is a Rubrix “space” where users can collaborate, using both the Webapp and the Python client. There are two types of workspace:
* `Team workspace`: A workspace where one or several users have read/write access.
* `User workspace`: Every user gets their own user workspace, named after their username. It is the default workspace when users log in, and when they log and load data with the Python client.
Additionally, you can still use `tags` and `metadata` to structure datasets inside a workspace.
The setup should be straightforward; you can find all the details in the user management guide: https://rubrix.readthedocs.io/en/stable/getting_started/user-management.html
To learn how to log and load data from different workspaces with the Python library, check the Python client API docs: https://rubrix.readthedocs.io/en/stable/reference/python/python_client.html
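As a rough illustration of the workspace rules described above (a default user workspace named after the username, plus team workspaces a user has been granted access to), here is a minimal sketch in plain Python. The `User` class and `resolve_workspace` function are hypothetical names invented for this example; they are not part of the Rubrix API and do not reflect how the server implements the check.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class User:
    """Hypothetical user model for illustration; not Rubrix's actual class."""

    username: str
    team_workspaces: List[str] = field(default_factory=list)


def resolve_workspace(user: User, requested: Optional[str] = None) -> str:
    """Return the workspace a request would run in under the rules above."""
    # No explicit choice: fall back to the user workspace,
    # whose name corresponds to the username.
    if requested is None or requested == user.username:
        return user.username
    # Explicit choice: the user needs access to that team workspace.
    if requested in user.team_workspaces:
        return requested
    raise PermissionError(
        f"{user.username!r} has no access to workspace {requested!r}"
    )


alice = User("alice", team_workspaces=["nlp-team"])
print(resolve_workspace(alice))              # alice
print(resolve_workspace(alice, "nlp-team"))  # nlp-team
```

With the real client, the workspace is selected through the Python client API documented at the link above; the sketch only mirrors the behavior described in these notes.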
![Kapture 2021-11-30 at 16 06 57](https://user-images.githubusercontent.com/1107111/144072657-02529a9a-d360-4e90-bf98-e7674dc00d4d.gif)
Weak Supervision
1. Implementation of the first built-in Label Model (Snorkel): https://rubrix.readthedocs.io/en/stable/guides/weak-supervision.html#Built-in-label-models
2. New tutorial using weak supervision for news classification: https://rubrix.readthedocs.io/en/stable/tutorials/weak-supervision-with-rubrix.html
3. Example using Weasel for training a downstream classifier directly with weak labels using PyTorch and Hugging Face transformers: https://rubrix.readthedocs.io/en/stable/guides/weak-supervision.html#Joint-Model-with-Weasel
The API docs for the weak supervision model can be found here: https://rubrix.readthedocs.io/en/stable/reference/python/python_labeling.html#python-labeling
Improved UX for text classification annotation
Refined the annotation module for text classification, especially for tasks with a high number of labels.
![Kapture 2021-11-30 at 16 02 52](https://user-images.githubusercontent.com/1107111/144071943-9f9c6dcf-8b03-4534-b302-0cb981aef452.gif)
Rubrix Metrics
Extended the support for Rubrix Metrics; check this guide for more information: https://rubrix.readthedocs.io/en/stable/guides/metrics.html
* Support for queries to compute metrics for dataset slices
* Support for F1 in Token Classification
* Support for common metrics across tasks (string length)
* Support for Token Classification predictions (model outputs) and annotations (training data)
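To give a feel for what an F1 metric for token classification measures, here is a minimal sketch in plain Python. It assumes entity-level scoring with exact span and label matching, pooled over all entities; the `entity_f1` function and the span format are assumptions made for this example, and the metric shipped in Rubrix may be computed differently (for instance per label or with other averaging).

```python
from typing import Dict, Set, Tuple

# Hypothetical span format for this sketch: (start_char, end_char, label)
Span = Tuple[int, int, str]


def entity_f1(predicted: Set[Span], annotated: Set[Span]) -> Dict[str, float]:
    """Precision, recall and F1 over exactly matching entity spans."""
    # A prediction counts as correct only if start, end and label all match.
    true_positives = len(predicted & annotated)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(annotated) if annotated else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


annotated = {(0, 5, "PER"), (10, 16, "LOC"), (20, 26, "ORG")}
predicted = {(0, 5, "PER"), (10, 16, "ORG")}  # second span has the wrong label

scores = entity_f1(predicted, annotated)
print({name: round(value, 2) for name, value in scores.items()})
# {'precision': 0.5, 'recall': 0.33, 'f1': 0.4}
```

Only one of the two predictions matches an annotation exactly, so precision is 1/2 and recall is 1/3. Restricting `predicted` and `annotated` to the records returned by a query is one way to picture how such a metric applies to a dataset slice.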
💻 Upgrading
To use this new release, do not forget to upgrade each component you use.

Update the client library:

```bash
pip install -U rubrix
```

If you are using Docker:

```bash
docker-compose pull
docker-compose up
```

If you are using the Python server:

```bash
pip install -U "rubrix[server]"
```
What's Changed
- Refactor: Move `RubrixClient` out of __init__ (563) by David Fidalgo
- Remove dynamic metadata, move it to `setup.cfg` (562) by David Fidalgo
- fix tab titles of our docs (561) by David Fidalgo
- [UI] Token classifier: Arrow styles are broken in Firefox (576) by leiyre
- Fix: `rb.load` for ids with mixed types (577) by David Fidalgo
- fix the build process (583) by David Fidalgo
- fix: limit agent length (585) by Francisco Aranda
- refactor(client): moves asgi module to rubrix.monitoring (584) by Francisco Aranda
- fix(client): clear client cache after delete dataset (580) by Francisco Aranda
- fixes(server): avoid mix single and multi label records for text-class (582) by Francisco Aranda
- fix: assert tokens and text have content (598) by Francisco Aranda
- tests: include basic tests for server.security module (593) by Alex Jakubko
- [Docs] Make building the docs faster (599) by David Fidalgo
- feat(client): compute metrics with query filter (600) by Francisco Aranda
- refactor(server): normalizes token classification metrics (602) by Francisco Aranda
- bugfixes(metrics): prevent index out of range for tokenclass metrics (608) by Francisco Aranda
- feat(metrics): use stacked bar for entity consistency (607) by Francisco Aranda
- fix(UI): Mention values in Stats sidebar sort when updating (613) by leiyre
- Add tqdm to `rb.log` (609) by David Fidalgo
- feat(metrics): include mention length metrics at char level (615) by Francisco Aranda
- fix(monitoring): support old zeroshot versions (614) by Francisco Aranda
- fix: enable nested fields in search dsl (587) by Francisco Aranda
- hotfix: fix test for build ci by Francisco Aranda
- Typo fix in 05-active_learning.ipynb (619) by Sebastian Raschka
- feat(metrics): annotated mentions metrics (618) by Francisco Aranda
- [UI] Text classifier: annotation task interaction enhancement (611) by leiyre
- docs: Introduce monitoring guide (625) by Daniel Vila Suero
- docs: review monitoring guide (626) by Daniel Vila Suero
- refactor: rename teams to workspaces (622) by Francisco Aranda
- docs: update monitoring guide (631) by Daniel Vila Suero
- fix(client): Adds verbose kwarg to rb.log (632) by David Fidalgo
- [stats] Keywords in stats re-sort when query is updated (639) by leiyre
- hotfix(server): wrong email user validation regex by Francisco Aranda
- Introduce LabelModel and Snorkel implementation (624) by David Fidalgo
- refactor(UI): normalize multi-label dataset access (635) by Francisco Aranda
- [QA] text classification labels (636) by leiyre
- fix(metrix): empty metrics visualization (642) by Francisco Aranda
- Add F1 metrics to token classification task (640) by David Fidalgo
- fix(doc): prevent 'Mixed Content:...' error (645) by Francisco Aranda
- NoRecordsFoundError when rb.load results in empty list in WeakLabels (641) by David Fidalgo
- [UI styles] QA annotation buttons styles (654) by leiyre
- refactor(metrics): module shortcut for compute_for and enum def (651) by Francisco Aranda
- hotfix(user): empty workspaces list checks to default workspace by Francisco Aranda
- format doc strings according to the google style + small improvements (656) by David Fidalgo
- fix(search): prevent ignore 0s for aggregation result keys (655) by Francisco Aranda
- feat(server): accepts workspace as http header (659) by Francisco Aranda
- refactor(user): bypass ws for super users (660) by Francisco Aranda
- feat(server): common task metrics (657) by Francisco Aranda
- feat(client): user workspace management from client (661) by Francisco Aranda
- feat(UI): select user workspace (662) by Francisco Aranda
- UI: Add hover effect on selected label in Text Classification (663) by leiyre
- UI: Button-icon active state improvement (664) by leiyre
- [BUG] Annotation agent is user.username (666) by leiyre
- by default do not pass on Y_dev when fitting (670) by David Fidalgo
- Docs: Adds weak supervision tutorial (672) by Daniel Vila Suero
- [Client] Add metrics parameter to all client models (671) by David Fidalgo
- [UI] QA: button active state color duration (675) by leiyre
- [bug] Sticky top-bar glitch when scrolling (674) by leiyre
- fix(docs): .rubrix_* -> .rubrix* (680) by Francisco Aranda
- fix(server): metadata keys with empty meta will be omitted (678) by Francisco Aranda
- docs: fix small typo in ws tuto (684) by Daniel Vila Suero
- feat(client): dataset copy with workspace param (683) by Francisco Aranda
- [UI] Limit pagination in UI (668) by leiyre
- fix(server): single label annotation validator (687) by Francisco Aranda
- fix(app): read all dataset labels for annotation (688) by Francisco Aranda
- [UI] Message for empty home (datasets list) (691) by leiyre
- [UI] Fix: Text classifier explore record width (696) by leiyre
- [Labeling] Throw error when encountering duplicated rule names (693) by David Fidalgo
- [UI] Fix: Text Classification annotation record width (699) by leiyre
- [Metrics] Normalize F1 metrics for Text-/TokenClassification (694) by David Fidalgo
- fix link for models (703) by Leire Rosado
- [Docs] First attempt to devise a testing workflow for the tutorials (649) by David Fidalgo
- docs: Updates metrics guide (647) by Daniel Vila Suero
- [UI] "Validate" button align left in Text classification and Token classification (707) by leiyre
- feat(metrics): improve common dataset metrics (709) by Francisco Aranda
- [Docs] Add WeaSEL example to weak supervision guide (578) by David Fidalgo
- [UI] Workspaces QA (697) by leiyre
- small typo/grammar fixes for the weak supervision guide by dcfidalgo
- Fix/loss tutorial (714) by Leire Rosado
- Fix/spacy_transformers (711) by Leire Rosado
- fix(ui): refresh dataset before initalize it (721) by Francisco Aranda
- [UI] Fix: Refresh button mantains pagination configuration (715) by leiyre
- remove kglab tutorial (720) by David Fidalgo
- fix(ui): refresh aggregations to paginated dataset (722) by Francisco Aranda
- fix(ui): preserving the annotate/explore state on browser refresh (724) by Francisco Aranda
- docs: Adds User and Workspaces management guide (726) by Daniel Vila Suero
New Contributors
* rasbt made their first contribution in https://github.com/recognai/rubrix/pull/619
* leireropl made their first contribution in https://github.com/recognai/rubrix/pull/703
**Full Changelog**: https://github.com/recognai/rubrix/compare/v0.6.2...v0.7.0