Rubrix

Latest version: v0.19.0

Page 2 of 5

0.15.0

[0.15.0](https://github.com/recognai/rubrix/compare/v0.14.2...v0.15.0) (2022-06-08)

🔆 Highlights

🏷️ Configure datasets with a labeling scheme

You can now predefine and change the label schema of your datasets. This is useful for fixing a set of labels for you and your annotation teams.

```python
import rubrix as rb

# Define the labeling schema
settings = rb.TextClassificationSettings(label_schema=["A", "B", "C"])

# Apply the settings to a new or already existing dataset
rb.configure_dataset(name="my_dataset", settings=settings)

# Logging to the newly created dataset triggers the validation checks
rb.log(rb.TextClassificationRecord(text="text", annotation="D"), "my_dataset")
# BadRequestApiError: Rubrix server returned an error with http status: 400
```


Read the docs: https://rubrix.readthedocs.io/en/stable/guides/dataset_settings.html
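
As a rough illustration of what this validation amounts to, the sketch below pre-checks annotations against a label schema on the client side before anything is logged. The `find_invalid_labels` helper and the plain-dict records are hypothetical and not part of the Rubrix API; the actual check runs on the Rubrix server.

```python
def find_invalid_labels(records, label_schema):
    """Return (index, label) pairs for annotations outside the schema.

    Hypothetical helper for illustration only; records are plain dicts here,
    not Rubrix record objects.
    """
    allowed = set(label_schema)
    return [
        (index, record["annotation"])
        for index, record in enumerate(records)
        # Records without an annotation have nothing to validate
        if record.get("annotation") is not None
        and record["annotation"] not in allowed
    ]


label_schema = ["A", "B", "C"]
records = [
    {"text": "first text", "annotation": "A"},
    {"text": "second text", "annotation": "D"},
    {"text": "third text", "annotation": None},
]

invalid = find_invalid_labels(records, label_schema)
print(invalid)  # [(1, 'D')]
```

Running a check like this before `rb.log` lets you catch labels such as `"D"` locally instead of waiting for the server to reject the request.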

🧱 Weak label matrix augmentation using embeddings
You can now use an augmentation technique inspired by https://github.com/HazyResearch/epoxy to augment the coverage of your rules using embeddings (e.g., sentence transformers). This is useful for improving the recall of your labeling rules.

Read the tutorial: https://rubrix.readthedocs.io/en/stable/tutorials/extend_weak_labels_with_embeddings.html
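
To make the idea concrete, here is a minimal pure-Python sketch of this kind of embedding-based extension. It is not the Rubrix implementation: the `extend_matrix` function below, the `ABSTAIN = -1` convention, and the toy data are assumptions made for illustration. For each rule, a record on which the rule abstains inherits the vote of its most similar record (by cosine similarity) on which the rule did vote, provided the similarity reaches that rule's threshold.

```python
import math

ABSTAIN = -1  # assumed marker for "this rule did not vote on this record"


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extend_matrix(matrix, embeddings, thresholds):
    """Extend rule votes to similar records (illustrative sketch).

    matrix[i][j] is the vote of rule j on record i, or ABSTAIN.
    thresholds[j] is the minimum cosine similarity for rule j.
    """
    extended = [row[:] for row in matrix]
    for j, threshold in enumerate(thresholds):
        # Records on which rule j originally voted
        voted = [i for i, row in enumerate(matrix) if row[j] != ABSTAIN]
        if not voted:
            continue
        for i, row in enumerate(matrix):
            if row[j] != ABSTAIN:
                continue
            # Nearest record (in embedding space) that rule j voted on
            best = max(voted, key=lambda k: cosine(embeddings[i], embeddings[k]))
            if cosine(embeddings[i], embeddings[best]) >= threshold:
                extended[i][j] = matrix[best][j]
    return extended


# Three records, two rules; the third record is close to the first one.
matrix = [[0, ABSTAIN], [ABSTAIN, 1], [ABSTAIN, ABSTAIN]]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

extended = extend_matrix(matrix, embeddings, thresholds=[0.9, 0.9])
print(extended)  # [[0, -1], [-1, 1], [0, -1]]
```

Only the first rule gains coverage here: the third record is nearly parallel to the first, so it inherits that vote, while its similarity to the second record stays far below the threshold. Lower thresholds increase coverage at the cost of noisier votes.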

🏛️ Tutorial Gallery
Tutorials are now organized into categories and presented in a new gallery design!

Read the docs: https://rubrix.readthedocs.io/en/stable/tutorials/introductory.html

🏁 Basics guide
This is the first version of the basics guide. This guide will show you how to perform the most basic actions with Rubrix, such as uploading data or data annotation.

Read the docs: https://rubrix.readthedocs.io/en/stable/getting_started/basics.html

Features

* **1134:** Allow extending the weak label matrix with embeddings ([1487](https://github.com/recognai/rubrix/issues/1487)) ([4d54994](https://github.com/recognai/rubrix/commit/4d54994d45bc5487ee70ef77967f064c5ee65212)), closes [#1134](https://github.com/recognai/rubrix/issues/1134)
* **1432:** configure datasets with a label schema ([21e48c0](https://github.com/recognai/rubrix/commit/21e48c007e4e71b02c2d4fe205b5a1587e0f8829)), closes [#1432](https://github.com/recognai/rubrix/issues/1432)
* **1446:** copy icon position in datasets list ([1448](https://github.com/recognai/rubrix/issues/1448)) ([7c9fa52](https://github.com/recognai/rubrix/commit/7c9fa5282c18599a492c567631e53b6fa8a8f55d)), closes [#1446](https://github.com/recognai/rubrix/issues/1446)
* **1460:** include text hyphenation ([1469](https://github.com/recognai/rubrix/issues/1469)) ([ec23b2d](https://github.com/recognai/rubrix/commit/ec23b2d7fd7d28ee9ad81adc7ab775d7414edb4b)), closes [#1460](https://github.com/recognai/rubrix/issues/1460)
* **1463:** change icon position in table header ([1473](https://github.com/recognai/rubrix/issues/1473)) ([5172324](https://github.com/recognai/rubrix/commit/5172324613b50b340184e758587f76c4bd6b71e8)), closes [#1463](https://github.com/recognai/rubrix/issues/1463)
* **1467:** include animation delay for last progress bar track ([1462](https://github.com/recognai/rubrix/issues/1462)) ([c772b74](https://github.com/recognai/rubrix/commit/c772b7409f902a78c614b3a8cb7bae16994b9410)), closes [#1467](https://github.com/recognai/rubrix/issues/1467)
* **configuration:** add elasticsearch ca_cert path variable ([1502](https://github.com/recognai/rubrix/issues/1502)) ([f0eda12](https://github.com/recognai/rubrix/commit/f0eda124777ea6ac98c9c37d5727dd814f0d981b))
* **UI:** improve access to actions in metadata and sort dropdowns ([1510](https://github.com/recognai/rubrix/issues/1510)) ([8d33090](https://github.com/recognai/rubrix/commit/8d33090c043f22035cd00c80b97f9d618741181c)), closes [#1435](https://github.com/recognai/rubrix/issues/1435)


Bug Fixes

* **1522:** dates metadata fields accessible for sorting ([1529](https://github.com/recognai/rubrix/issues/1529)) ([a576ceb](https://github.com/recognai/rubrix/commit/a576ceb9dab1475bee97c45ea2d3f0b7c8883849)), closes [#1522](https://github.com/recognai/rubrix/issues/1522)
* **1527:** check agents instead of labels for `predicted` computation ([1528](https://github.com/recognai/rubrix/issues/1528)) ([2f2ee2e](https://github.com/recognai/rubrix/commit/2f2ee2edfca6988c7ec9c48aa1a1a15b0793c39d)), closes [#1527](https://github.com/recognai/rubrix/issues/1527)
* **1532:** correct domain for filter score histogram ([1540](https://github.com/recognai/rubrix/issues/1540)) ([7478d6c](https://github.com/recognai/rubrix/commit/7478d6c09ac35cb911b443df2866615182e594a9)), closes [#1532](https://github.com/recognai/rubrix/issues/1532)
* **1533:** restrict highlighted fields ([3a8b8a9](https://github.com/recognai/rubrix/commit/3a8b8a9c743ee336bcc7b0e80ee66394958c70bd)), closes [#1533](https://github.com/recognai/rubrix/issues/1533)
* **1534:** fix progress in the metrics sidebar when page is refreshed ([1536](https://github.com/recognai/rubrix/issues/1536)) ([1b572c4](https://github.com/recognai/rubrix/commit/1b572c44f8bbe577f7b3b56d4176a20595a67158))
* **1539:** checkbox behavior with value 0 ([1541](https://github.com/recognai/rubrix/issues/1541)) ([7a0ab63](https://github.com/recognai/rubrix/commit/7a0ab639a90d5da32a026284ad461de808ccd00c)), closes [#1539](https://github.com/recognai/rubrix/issues/1539)
* **metrics:** compute f1 for text classification ([1530](https://github.com/recognai/rubrix/issues/1530)) ([147d38a](https://github.com/recognai/rubrix/commit/147d38a983784b2501786696e31b45130e41fe9b))
* **search:** highlight only textual input fields ([8b83a82](https://github.com/recognai/rubrix/commit/8b83a82b5e8906fff598b4cfe740179ffe2ef680)), closes [#1538](https://github.com/recognai/rubrix/issues/1538) [#1544](https://github.com/recognai/rubrix/issues/1544)

New contributors

RafaelBod made his first contribution in https://github.com/recognai/rubrix/pull/1413

0.14.2

[0.14.2](https://github.com/recognai/rubrix/compare/v0.14.1...v0.14.2) (2022-05-31)


Bug Fixes

* **1514:** allow ent score `None` and change default value to 0.0 ([1521](https://github.com/recognai/rubrix/issues/1521)) ([0a02c70](https://github.com/recognai/rubrix/commit/0a02c70e0a51e543e0c7d317ae2e397b084f44af)), closes [#1514](https://github.com/recognai/rubrix/issues/1514)
* **1516:** restore read-only to copied dataset ([1520](https://github.com/recognai/rubrix/issues/1520)) ([5b9cf0e](https://github.com/recognai/rubrix/commit/5b9cf0ef5198f7b7dc6a8525bafc09716c08d5b0)), closes [#1516](https://github.com/recognai/rubrix/issues/1516)
* **1517:** stop background task when something happens to main thread ([1519](https://github.com/recognai/rubrix/issues/1519)) ([0304f40](https://github.com/recognai/rubrix/commit/0304f4087ff703f4655f3351f0640a61cfde0f5e)), closes [#1517](https://github.com/recognai/rubrix/issues/1517)
* **1518:** disable global actions checkbox when no data was found ([1525](https://github.com/recognai/rubrix/issues/1525)) ([bf35e72](https://github.com/recognai/rubrix/commit/bf35e725f61371a60f234d2e836cdd169023ed7e)), closes [#1518](https://github.com/recognai/rubrix/issues/1518)
* **UI:** remove selected metadata fields for sortable fields dropdown ([1513](https://github.com/recognai/rubrix/issues/1513)) ([bb9482b](https://github.com/recognai/rubrix/commit/bb9482b240b4eb02f1cd66bf19c8f4025dc45eb2))

0.14.1

[0.14.1](https://github.com/recognai/rubrix/compare/v0.14.0...v0.14.1) (2022-05-20)


Bug Fixes

* **1447:** change agent when validating records with annotation but default status ([1480](https://github.com/recognai/rubrix/issues/1480)) ([126e6f4](https://github.com/recognai/rubrix/commit/126e6f4d3ba53ce2bef5908c82174fa20b20551e)), closes [#1447](https://github.com/recognai/rubrix/issues/1447)
* **1472:** hide scrollbar in scrollable components ([1490](https://github.com/recognai/rubrix/issues/1490)) ([b056e4e](https://github.com/recognai/rubrix/commit/b056e4e595a2d4585f67c19049c2250a432b8b36)), closes [#1472](https://github.com/recognai/rubrix/issues/1472)
* **1483:** close global actions "Annotate as" selector after deselect records checkbox ([1485](https://github.com/recognai/rubrix/issues/1485)) ([a88f8cb](https://github.com/recognai/rubrix/commit/a88f8cb4942d4ddcd316f5fd59afa18b65498e24))
* **1503:** Count filter values when loading a dataset with a route query ([1506](https://github.com/recognai/rubrix/issues/1506)) ([43be9b8](https://github.com/recognai/rubrix/commit/43be9b87f99ead9e298448d3cf1a791cce9d0fc7)), closes [#1503](https://github.com/recognai/rubrix/issues/1503)
* **documentation:** fix user management guide ([1511](https://github.com/recognai/rubrix/issues/1511)) ([63f7bee](https://github.com/recognai/rubrix/commit/63f7bee24272bfd784985fac5b5b2e83588bd414)), closes [#1501](https://github.com/recognai/rubrix/issues/1501)
* **filters:** sort filter values by count ([1488](https://github.com/recognai/rubrix/issues/1488)) ([0987167](https://github.com/recognai/rubrix/commit/09871673e0196cdc992f6e1691e4a03bb75e3125)), closes [#1484](https://github.com/recognai/rubrix/issues/1484)

0.14.0

[0.14.0](https://github.com/recognai/rubrix/compare/v0.13.3...v0.14.0) (2022-05-10)

Async version of `rb.log`

You can now use the `background` parameter of the `rb.log` method to log records without blocking the main process. The main use case is prediction monitoring in production pipelines. Here's an example with BentoML (you can find the full example in the updated [Monitoring guide](https://rubrix.readthedocs.io/en/v0.14.0/guides/monitoring.html#Using-rb.log-in-background-mode)):

```python
from bentoml import BentoService, api, artifacts, env
from bentoml.adapters import JsonInput
from bentoml.frameworks.spacy import SpacyModelArtifact

import rubrix as rb

import spacy

nlp = spacy.load("en_core_web_sm")


@env(infer_pip_packages=True)
@artifacts([SpacyModelArtifact("nlp")])
class SpacyNERService(BentoService):

    @api(input=JsonInput(), batch=True)
    def predict(self, parsed_json_list):
        result, rb_records = ([], [])
        for index, parsed_json in enumerate(parsed_json_list):
            doc = self.artifacts.nlp(parsed_json["text"])
            prediction = [{"entity": ent.text, "label": ent.label_} for ent in doc.ents]
            rb_records.append(
                rb.TokenClassificationRecord(
                    text=doc.text,
                    tokens=[t.text for t in doc],
                    prediction=[
                        (ent.label_, ent.start_char, ent.end_char) for ent in doc.ents
                    ],
                )
            )
            result.append(prediction)

        # With background=True, the model latency won't be affected
        rb.log(
            name="monitor-for-spacy-ner",
            records=rb_records,
            tags={"framework": "bentoml"},
            background=True,
            verbose=False,
        )

        return result
```
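
To illustrate why background logging keeps request latency low, here is a generic standard-library sketch of the pattern: the caller only enqueues a batch, and a worker thread ships it. This is not how Rubrix implements `background=True`; the `BackgroundLogger` class and the `slow_send` function are hypothetical stand-ins.

```python
import queue
import threading
import time


class BackgroundLogger:
    """Hypothetical helper: sends record batches from a worker thread."""

    def __init__(self, send):
        self._send = send
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, records):
        # Returns immediately; the worker thread does the slow part
        self._queue.put(list(records))

    def _run(self):
        while True:
            batch = self._queue.get()
            try:
                self._send(batch)
            except Exception:
                pass  # a real client would report or retry the failure
            finally:
                self._queue.task_done()

    def flush(self):
        # Block until every queued batch has been processed
        self._queue.join()


stored = []


def slow_send(batch):
    time.sleep(0.05)  # stand-in for a network round trip
    stored.extend(batch)


logger = BackgroundLogger(slow_send)
logger.log(["record-1", "record-2"])  # does not wait for slow_send
logger.flush()
print(stored)  # ['record-1', 'record-2']
```

In a prediction service, only the cheap `log` call sits on the request path; `flush` is something you would call at shutdown or in tests.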


Confidence scores in Token Classification (NER)
When storing entity predictions, you can attach a confidence score as the last element of the entity tuple: `(label, char_start, char_end, score)`. Let's see an example:

```python
import rubrix as rb

text = "Rubrix is a data science tool"

record = rb.TokenClassificationRecord(
    text=text,
    tokens=text.split(" "),
    prediction=[("PRODUCT", 0, 6, 0.99)],
)

rb.log(record, "ner_with_scores")
```

Then, in the web application, you and your team can use the score filter to find potentially problematic entities, like in the screenshot below:

<img width="1587" alt="Screenshot 2022-05-12 at 11 49 43" src="https://user-images.githubusercontent.com/1107111/168043415-ea52354d-24aa-407f-a34e-474f30b55883.png">

If you want to see this in action, check this blog post by David Berenstein:

https://www.rubrix.ml/blog/concise-concepts-rubrix/
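
A similar score-based filter can be sketched in plain Python on the prediction tuples themselves, for example to inspect low-confidence entities before logging. The `entity_score` and `low_confidence_entities` helpers, the second entity, and the `0.5` cutoff are made up for this illustration; entities without a score are treated here as having a score of `0.0`.

```python
def entity_score(entity, default=0.0):
    """Return the score of a (label, start, end[, score]) tuple."""
    if len(entity) > 3 and entity[3] is not None:
        return entity[3]
    return default


def low_confidence_entities(text, entities, max_score):
    """Return (label, surface text, score) for entities scored below max_score."""
    return [
        (entity[0], text[entity[1]:entity[2]], entity_score(entity))
        for entity in entities
        if entity_score(entity) < max_score
    ]


text = "Rubrix is a data science tool"
prediction = [
    ("PRODUCT", 0, 6, 0.99),
    ("FIELD", 12, 24, 0.42),  # made-up entity with a low score
]

suspicious = low_confidence_entities(text, prediction, max_score=0.5)
print(suspicious)  # [('FIELD', 'data science', 0.42)]
```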

Rule metrics sidebar
We have a fresh new sidebar for the weak labeling mode, where you can see your overall rule metrics as you define new rules.

This sidebar should help you quickly understand your progress:

<img width="1572" alt="Screenshot 2022-05-12 at 11 52 10" src="https://user-images.githubusercontent.com/1107111/168043851-abcfa5d4-44c0-4c1b-b2e0-ac2146a76875.png">

See the updated user guide here: https://rubrix.readthedocs.io/en/v0.14.0/reference/webapp/define_rules.html

Features

* **1132:** introduce async/background version of rb.log ([1391](https://github.com/recognai/rubrix/issues/1391)) ([900307e](https://github.com/recognai/rubrix/commit/900307e8fd73427753676499ba8821643bcec252)), closes [#1132](https://github.com/recognai/rubrix/issues/1132)
* **1247:** label models predict method returns DatasetForTextClassification ([1442](https://github.com/recognai/rubrix/issues/1442)) ([42ca1be](https://github.com/recognai/rubrix/commit/42ca1be3b109ba248fe85a761170075e4622dad2)), closes [#1247](https://github.com/recognai/rubrix/issues/1247)
* **1379:** show prediction score in NER ([1389](https://github.com/recognai/rubrix/issues/1389)) ([0bdccd2](https://github.com/recognai/rubrix/commit/0bdccd2c07e354b33f693453db3e767ec2978a53)), closes [#1379](https://github.com/recognai/rubrix/issues/1379) [#1451](https://github.com/recognai/rubrix/issues/1451)
* **961:** rules metrics in sidebar ([1377](https://github.com/recognai/rubrix/issues/1377)) ([261f53a](https://github.com/recognai/rubrix/commit/261f53adb35a95666a8a3978f881b9045ff77e09)), closes [#961](https://github.com/recognai/rubrix/issues/961) [#1408](https://github.com/recognai/rubrix/issues/1408)
* **home:** improve table actions and styles ([1384](https://github.com/recognai/rubrix/issues/1384)) ([f09746e](https://github.com/recognai/rubrix/commit/f09746e7cb30ef7250575589dcf316fe6b01efb1)), closes [#1355](https://github.com/recognai/rubrix/issues/1355) [#1333](https://github.com/recognai/rubrix/issues/1333)


Bug Fixes

* **1407:** fix visualization in 1024px viewport ([1420](https://github.com/recognai/rubrix/issues/1420)) ([46f8d4d](https://github.com/recognai/rubrix/commit/46f8d4d33149dcfefeb175e7e798ec2acca47dd6)), closes [#1441](https://github.com/recognai/rubrix/issues/1441)
* **1458:** token classifier visualization in Safari ([1459](https://github.com/recognai/rubrix/issues/1459)) ([01cc492](https://github.com/recognai/rubrix/commit/01cc49236d3a517f1743b1654824940617810518)), closes [#1458](https://github.com/recognai/rubrix/issues/1458)

0.13.3

[0.13.3](https://github.com/recognai/rubrix/compare/v0.13.2...v0.13.3) (2022-04-27)


Bug Fixes

* **1248:** allow multiple label attributions in UI ([1424](https://github.com/recognai/rubrix/issues/1424)) ([a9f8363](https://github.com/recognai/rubrix/commit/a9f8363b869df138cd26df11dcb42bb59d282aee)), closes [#1248](https://github.com/recognai/rubrix/issues/1248)
* **1409:** filtering by metadata with value list ([1415](https://github.com/recognai/rubrix/issues/1415)) ([7aca061](https://github.com/recognai/rubrix/commit/7aca06148c55dbe5e2af7fd793fadfcf7c9a6aab)), closes [#1409](https://github.com/recognai/rubrix/issues/1409)
* **1410:** apply dataset name pattern to user name ([1411](https://github.com/recognai/rubrix/issues/1411)) ([2087c21](https://github.com/recognai/rubrix/commit/2087c219c86a0de4fc1aca34e9a4ee658588eae3)), closes [#1410](https://github.com/recognai/rubrix/issues/1410)
* **1428:** support cleanlab v2 ([1436](https://github.com/recognai/rubrix/issues/1436)) ([d189ddb](https://github.com/recognai/rubrix/commit/d189ddbdd91a3f0396300832996992f0d4f74229)), closes [#1428](https://github.com/recognai/rubrix/issues/1428)
* **TokenClassification:** display characters between tokens words ([1418](https://github.com/recognai/rubrix/issues/1418)) ([a08cd7b](https://github.com/recognai/rubrix/commit/a08cd7b3c7ac0728a049b7b15c2e4fac5b997e19)), closes [#1414](https://github.com/recognai/rubrix/issues/1414) [#1383](https://github.com/recognai/rubrix/issues/1383)

0.13.2

[0.13.2](https://github.com/recognai/rubrix/compare/v0.13.1...v0.13.2) (2022-04-12)


Bug Fixes

* **1265:** persist pagination size after query ([1358](https://github.com/recognai/rubrix/issues/1358)) ([49ca243](https://github.com/recognai/rubrix/commit/49ca24344ba9382ada9172b6b681d8c797253e26)), closes [#1265](https://github.com/recognai/rubrix/issues/1265)
* **1367:** remove record text from metadata modal ([1385](https://github.com/recognai/rubrix/issues/1385)) ([1782724](https://github.com/recognai/rubrix/commit/17827247882c856ab81409099bce31cbb80038fc)), closes [#1367](https://github.com/recognai/rubrix/issues/1367)
* **1368:** long list of entities in Token Classifier ([1388](https://github.com/recognai/rubrix/issues/1388)) ([829269f](https://github.com/recognai/rubrix/commit/829269f48eb9b8ccb21fff5ea7c797686fa49001)), closes [#1368](https://github.com/recognai/rubrix/issues/1368) [#1393](https://github.com/recognai/rubrix/issues/1393)
* **1387:** improve metadata distinct values computation ([be9f68f](https://github.com/recognai/rubrix/commit/be9f68f5cdc581ef100646d025a8c1ec72970f81)), closes [#1387](https://github.com/recognai/rubrix/issues/1387)
* **install:** remove loguru dependency ([1372](https://github.com/recognai/rubrix/issues/1372)) ([9e52414](https://github.com/recognai/rubrix/commit/9e52414d7b4d3e901d9659373615d4bbd4560293)), closes [#1331](https://github.com/recognai/rubrix/issues/1331) [#1305](https://github.com/recognai/rubrix/issues/1305)
* **search:** compute dataset schema properly for advanced query dsl ([1380](https://github.com/recognai/rubrix/issues/1380)) ([f71ab91](https://github.com/recognai/rubrix/commit/f71ab913827835e1d385574eb4882bb82bf42e30))
* **visualization:** force break word in selectors ([1406](https://github.com/recognai/rubrix/issues/1406)) ([5ac1950](https://github.com/recognai/rubrix/commit/5ac1950788ecdcac4938ad017c4614fab655ab2a))
