Cleanlab

Latest version: v2.7.1

2.3.0

We have added new functionality for active learning and easily making Keras models compatible with sklearn. Label issues can now be estimated 10x faster and with much less memory using new methods added to help users with massive datasets. This release is non-breaking when upgrading from v2.2.0 (except for certain methods in `cleanlab.experimental` that have been moved).

Active Learning with ActiveLab

For settings where you want to label more data to get better ML, active learning helps you train the best ML model with the least data labeling. Unfortunately data annotators often give imperfect labels, in which case we might sometimes prefer to have another annotator check an already-labeled example rather than labeling an entirely new example. [ActiveLab](https://cleanlab.ai/blog/active-learning/) is a new algorithm invented by our team that automatically answers the question: **which new data should I label or which of my current labels should be checked again?** ActiveLab is highly practical — it runs quickly and works with: any type of ML model, batch settings where many examples are (re)labeled before model retraining, and settings where multiple annotators can label an example (or just one annotator).

Here's all the code needed to determine active learning scores for examples in your unlabeled pool (no annotations yet) and labeled pool (at least one annotation already collected).


```python
from cleanlab.multiannotator import get_active_learning_scores

scores_labeled_pool, scores_unlabeled_pool = get_active_learning_scores(
    multiannotator_labels, pred_probs, pred_probs_unlabeled
)
```


The batch of examples with the lowest scores are those that are most informative to collect an additional label for (scores between labeled vs unlabeled pool are directly comparable). You can either have a new annotator label the batch of examples with lowest scores, or distribute them amongst your previous annotators as is most convenient. ActiveLab is also effective for: standard active learning where you collect at most one label per example (no re-labeling), as well as *active label cleaning* (with no unlabeled pool) where you only want to re-label examples to ensure 100% correct consensus labels (with the least amount of re-labeling).
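As a rough sketch of the selection step described above: because scores from the two pools are directly comparable, you can rank them jointly and take the lowest ones. The helper name `select_batch`, its `batch_size` argument, and the score values below are hypothetical and only for illustration; they are not part of the cleanlab API.

```python
import numpy as np

def select_batch(scores_labeled_pool, scores_unlabeled_pool, batch_size):
    """Return indices (into each pool) of the batch_size lowest-scoring examples overall."""
    scores_labeled_pool = np.asarray(scores_labeled_pool)
    scores_unlabeled_pool = np.asarray(scores_unlabeled_pool)
    num_labeled = len(scores_labeled_pool)
    # Scores are comparable across pools, so rank both pools together.
    all_scores = np.concatenate([scores_labeled_pool, scores_unlabeled_pool])
    lowest = np.argsort(all_scores, kind="stable")[:batch_size]
    relabel_idx = lowest[lowest < num_labeled]                   # collect another label
    new_label_idx = lowest[lowest >= num_labeled] - num_labeled  # collect a first label
    return relabel_idx, new_label_idx

# Made-up scores, standing in for the output of get_active_learning_scores
scores_labeled_pool = np.array([0.9, 0.2, 0.7, 0.4])
scores_unlabeled_pool = np.array([0.3, 0.8, 0.1])

relabel_idx, new_label_idx = select_batch(
    scores_labeled_pool, scores_unlabeled_pool, batch_size=3
)
print("re-label (labeled pool):", relabel_idx)
print("label (unlabeled pool):", new_label_idx)
```

After collecting the new labels, you would retrain your model, recompute `pred_probs`, and repeat.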

Get started running ActiveLab with our [tutorial notebook](https://github.com/cleanlab/examples/blob/master/active_learning_multiannotator/active_learning.ipynb) from our repo that has many other [examples](https://github.com/cleanlab/examples/).

KerasWrapper

We've introduced [one-line wrappers](https://docs.cleanlab.ai/master/cleanlab/models/keras.html) for TensorFlow/Keras models that let you use them within scikit-learn workflows with features like `Pipeline`, `GridSearchCV` and more. Just change one line of code to make your existing TensorFlow/Keras model compatible with scikit-learn’s rich ecosystem! All you have to do is swap out: `keras.Model` → `KerasWrapperModel`, or `keras.Sequential` → `KerasWrapperSequential`. Imported from `cleanlab.models.keras`, the wrapper objects have all the same methods as their Keras counterparts, plus you can use them with tons of handy scikit-learn methods.

Resources to get started include:
- [Blogpost](https://cleanlab.ai/blog/transformer-sklearn/) and [Jupyter notebook](https://github.com/cleanlab/examples/blob/master/transformer_sklearn/transformer_sklearn.ipynb) demonstrating how to make a HuggingFace Transformer (BERT model) sklearn-compatible.
- [Jupyter notebook](https://github.com/cleanlab/examples/blob/master/huggingface_keras_imdb/huggingface_keras_imdb.ipynb) showing how to fit these sklearn-compatible models to a Tensorflow Dataset.
- [Revamped tutorial](https://docs.cleanlab.ai/master/tutorials/text.html) on label errors in text classification data, which has been updated to use this new wrapper.

Computational improvements for detecting label issues

Through extensive optimization of our multiprocessing code (thanks to clu0), `find_label_issues` has been made ~10x faster on Linux machines that have many CPU cores.

For massive datasets, `find_label_issues` may require too much memory to run on your machine. We've added new methods in [cleanlab.experimental.label_issues_batched](https://docs.cleanlab.ai/master/cleanlab/experimental/label_issues_batched.html) that can compute label issues with far less memory via mini-batch estimation. You can use these with billion-scale memmap arrays or Zarr arrays like this:

```python
import zarr
from cleanlab.experimental.label_issues_batched import find_label_issues_batched

labels = zarr.convenience.open("LABELS.zarr", mode="r")
pred_probs = zarr.convenience.open("PREDPROBS.zarr", mode="r")
issues = find_label_issues_batched(labels=labels, pred_probs=pred_probs, batch_size=100000)
```

By choosing a sufficiently small `batch_size`, you should be able to handle pretty much any dataset (set it as large as your memory will allow for best efficiency). With default arguments, the batched methods closely approximate the results of `cleanlab.filter.find_label_issues(..., filter_by="low_self_confidence", return_indices_ranked_by="self_confidence")`.
This option and `filter_by="low_normalized_margin"` are new `find_label_issues()` options added in v2.3, which require less computation and still output accurate estimates of the label errors.
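To give a feel for why mini-batches keep memory low, here is a minimal NumPy sketch of ranking examples by self-confidence (the predicted probability of the given label) while reading the inputs one slice at a time. This is not cleanlab's implementation: the function name and the `num_issues` argument are hypothetical, and cleanlab estimates the number of issues itself rather than taking it as input.

```python
import numpy as np

def lowest_self_confidence_batched(labels, pred_probs, num_issues, batch_size):
    """Indices of the num_issues examples with lowest self-confidence.

    Only one slice of labels/pred_probs is loaded at a time, so the inputs
    can be memmap or Zarr arrays that do not fit in memory.
    """
    n = len(labels)
    self_confidence = np.empty(n, dtype=float)  # one float per example
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        batch_labels = np.asarray(labels[start:end])
        batch_probs = np.asarray(pred_probs[start:end])
        # Self-confidence: probability the model assigns to the given label.
        self_confidence[start:end] = batch_probs[np.arange(end - start), batch_labels]
    # Least confident examples first
    return np.argsort(self_confidence, kind="stable")[:num_issues]

# Tiny made-up dataset for illustration
labels = np.array([0, 0, 2, 0, 2])
pred_probs = np.array(
    [[0.8, 0.1, 0.1],
     [0.7, 0.1, 0.2],
     [0.3, 0.1, 0.6],
     [0.5, 0.2, 0.3],
     [0.1, 0.1, 0.8]]
)
ranked = lowest_self_confidence_batched(labels, pred_probs, num_issues=2, batch_size=2)
print(ranked)
```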


Other changes to be aware of

- Like all major ML frameworks, we have dropped support for Python 3.6.
- We have moved some particularly useful models (fasttext, keras) from `cleanlab.experimental` -> `cleanlab.models`.

Change Log

* Shorten tutorial titles in docs for readability by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/553
* Swap CI workflow to actions by huiwengoh in https://github.com/cleanlab/cleanlab/pull/560
* Remove .pylintrc by elisno in https://github.com/cleanlab/cleanlab/pull/564
* Tutorial fixes by huiwengoh in https://github.com/cleanlab/cleanlab/pull/565
* Fix typo in CONTRIBUTING.md by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/566
* Multiannotator Active Learning Support by huiwengoh in https://github.com/cleanlab/cleanlab/pull/538
* multiannotator explanation improvements by jwmueller in https://github.com/cleanlab/cleanlab/pull/570
* Specify Sphinx to order functions by source code order by huiwengoh in https://github.com/cleanlab/cleanlab/pull/571
* Fix example in ema docstring by elisno in https://github.com/cleanlab/cleanlab/pull/563, https://github.com/cleanlab/cleanlab/pull/573
* update paper list and applications beyond label error detection in readme by jwmueller in https://github.com/cleanlab/cleanlab/pull/574, https://github.com/cleanlab/cleanlab/pull/580
* Drop Python 3.6 support (by jwmueller in https://github.com/cleanlab/cleanlab/pull/558, https://github.com/cleanlab/cleanlab/pull/577; by anishathalye in https://github.com/cleanlab/cleanlab/pull/562; by krmayankb in https://github.com/cleanlab/cleanlab/pull/578; by sanjanag in https://github.com/cleanlab/cleanlab/pull/579)
* add maximum line length by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/583
* Update github actions by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/589
* Revamp text tutorial by huiwengoh in https://github.com/cleanlab/cleanlab/pull/584
* clarify thresholding in issues_from_scores by jwmueller in https://github.com/cleanlab/cleanlab/pull/582
* Remove temp scaling from single annotator case by huiwengoh in https://github.com/cleanlab/cleanlab/pull/590
* Update docs dependencies by huiwengoh in https://github.com/cleanlab/cleanlab/pull/593
* Use euclidean distance for identifying outliers for lower dimensional features by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/581
* changing copyright year 2017-2022 to 2017-2023 by aditya1503 in https://github.com/cleanlab/cleanlab/pull/594
* Handle missing type parameters for generic type "ndarray" by elisno in https://github.com/cleanlab/cleanlab/pull/587
* Remove temp scaling for single-label case in ensemble method by huiwengoh in https://github.com/cleanlab/cleanlab/pull/597
* Adding type hints for mypy strict compatibility by unna97 in https://github.com/cleanlab/cleanlab/pull/585
* fix typo in outliers.ipynb by eltociear in https://github.com/cleanlab/cleanlab/pull/603
* 10x speedup in find_label_issues on linux via better multiprocessing by clu0 in https://github.com/cleanlab/cleanlab/pull/596
* Update tabular tutorial with better language by cmauck10 in https://github.com/cleanlab/cleanlab/pull/609
* Improve num_label_issues() to reflect most accurate num issues by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/610
* Removed duplicate classifier from setup.py by sanjanag in https://github.com/cleanlab/cleanlab/pull/612
* Add two methods to filter.find_label_issues by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/595
* Fix dictionary type annotation for OutOfDistribution object by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/616
* Fix format compatibility with latest black==23. release by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/620
* Create new cleanlab.models module by huiwengoh in https://github.com/cleanlab/cleanlab/pull/601
* upgrade torch in docs by jwmueller in https://github.com/cleanlab/cleanlab/pull/607
* fix bug: confidences -> confidence by jwmueller in https://github.com/cleanlab/cleanlab/pull/623
* Fixed duplicate issue removal in find_label_issues by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/624
* Method to estimate label issues with limited memory via mini-batches by jwmueller in https://github.com/cleanlab/cleanlab/pull/615, https://github.com/cleanlab/cleanlab/pull/629, https://github.com/cleanlab/cleanlab/pull/632, https://github.com/cleanlab/cleanlab/pull/635
* Fix KerasWrapper summary method by huiwengoh in https://github.com/cleanlab/cleanlab/pull/631
* Clarify rank.py not for multi-label classification by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/626
* Removed $ from shell commands to avoid it being copied by sanjanag in https://github.com/cleanlab/cleanlab/pull/625
* label_issues_batched multiprocessing by clu0 in https://github.com/cleanlab/cleanlab/pull/630, https://github.com/cleanlab/cleanlab/pull/634
* Switch to typing.Self by anishathalye in https://github.com/cleanlab/cleanlab/pull/489
* Documentation improvements by huiwengoh in https://github.com/cleanlab/cleanlab/pull/643
* add 2.3.0 to release versions by jwmueller in https://github.com/cleanlab/cleanlab/pull/644

New Contributors
* krmayankb made their first contribution in https://github.com/cleanlab/cleanlab/pull/578
* sanjanag made their first contribution in https://github.com/cleanlab/cleanlab/pull/579
* unna97 made their first contribution in https://github.com/cleanlab/cleanlab/pull/585
* eltociear made their first contribution in https://github.com/cleanlab/cleanlab/pull/603
* clu0 made their first contribution in https://github.com/cleanlab/cleanlab/pull/596

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.2.0...v2.3.0

2.2

Finding label issues in multi-label classification is done using the same code and inputs as before (and the same object is returned as before):
```python
from cleanlab.filter import find_label_issues

ranked_label_issues = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
    multi_label=True,
    return_indices_ranked_by="self_confidence",
)
```

Where for a 3-class multi-label dataset with 4 examples, we might have say:
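```python
import numpy as np

# Each example's label is the list of classes that apply to it
labels = [[0], [0, 1], [0, 2], [1]]

# Rows need not sum to 1 in multi-label classification
pred_probs = np.array(
    [[0.9, 0.1, 0.1],
     [0.9, 0.1, 0.8],
     [0.9, 0.1, 0.6],
     [0.2, 0.8, 0.3]]
)
```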
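To make the input format concrete, the following sketch converts list-of-lists labels into a binary indicator matrix and computes, for each class, how much probability the model assigns to the given annotation (present or absent). Aggregating with the minimum over classes is an assumption chosen for simplicity; it is not the scoring method cleanlab uses internally.

```python
import numpy as np

labels = [[0], [0, 1], [0, 2], [1]]
pred_probs = np.array(
    [[0.9, 0.1, 0.1],
     [0.9, 0.1, 0.8],
     [0.9, 0.1, 0.6],
     [0.2, 0.8, 0.3]]
)

num_classes = pred_probs.shape[1]
# Binary indicator matrix: y[i, k] = 1 if class k is among the labels of example i
y = np.zeros((len(labels), num_classes), dtype=int)
for i, classes in enumerate(labels):
    y[i, classes] = 1

# Probability assigned to each per-class annotation: p if the class is
# given as present, 1 - p if it is given as absent
per_class_confidence = np.where(y == 1, pred_probs, 1 - pred_probs)

# Simplistic aggregation (an assumption): the least confident class per example
example_scores = per_class_confidence.min(axis=1)
print(example_scores)             # lower = more likely to contain a label issue
print(np.argsort(example_scores)) # examples ranked from most to least suspicious
```

Here the second example scores lowest: it is tagged with class 1, which the model considers unlikely, and not tagged with class 2, which the model considers likely.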



The following code (in which class 1 is missing from the dataset) did not previously work but now runs without problem in cleanlab v2.2.0:
```python
from cleanlab.filter import find_label_issues
import numpy as np

labels = [0, 0, 2, 0, 2]
pred_probs = np.array(
    [[0.8, 0.1, 0.1],
     [0.7, 0.1, 0.2],
     [0.3, 0.1, 0.6],
     [0.5, 0.2, 0.3],
     [0.1, 0.1, 0.8]]
)

label_issues = find_label_issues(
    labels=labels,
    pred_probs=pred_probs,
)
```
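If you are unsure whether your own dataset has missing classes, a quick check like the one below can tell you. It is a small sketch that assumes integer labels in `0..K-1` and takes `K` from the number of columns in `pred_probs`; the variable names are illustrative.

```python
import numpy as np

labels = np.array([0, 0, 2, 0, 2])
num_classes = 3  # number of columns in pred_probs

# Count how often each class appears among the given labels
counts = np.bincount(labels, minlength=num_classes)
missing_classes = np.flatnonzero(counts == 0)
print(missing_classes)  # classes the model predicts but that never appear in labels
```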


Looking forward

The next major release of this package will introduce a paradigm shift in the way people check their datasets. Today this involves significant manual labor, but software should be able to help! Our research has developed algorithms that can automatically detect many types of common issues that plague real-world ML datasets. The next version of cleanlab will offer an easy-to-use line of code that runs all of our appropriate algorithms to help ensure a given dataset is issue-free and well-suited for supervised learning.

Transforming cleanlab into the first universal data-centric AI platform is a major effort and we need your help! Many easy ways to contribute are listed [on our github](https://github.com/cleanlab/cleanlab/wiki#ideas-for-contributing-to-cleanlab) or you can jump into the discussions on [Slack](https://cleanlab.ai/slack).

Change Log
* updated label_quality_utils.py and rebuilt the doc by ethanotran in https://github.com/cleanlab/cleanlab/pull/475
* Add workflow for skipping notebooks by huiwengoh in https://github.com/cleanlab/cleanlab/pull/472
* Fix return type in token classification get_label_quality_scores by jwmueller in https://github.com/cleanlab/cleanlab/pull/477
* Adding pylint CI checks by mohitsaxenaknoldus in https://github.com/cleanlab/cleanlab/pull/465
* CI: Build check cleanlab works without optional dependencies by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/470
* Outlier tutorial: move uninteresting code to hidden cell by jwmueller in https://github.com/cleanlab/cleanlab/pull/492
* Update DEVELOPMENT.md with howto add new modules by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/494
* Minor asthetic fix for tutorials.ipynb by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/493
* Update __init__.py to include major files by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/490
* Make type checking pass with mypy 0.981 by anishathalye in https://github.com/cleanlab/cleanlab/pull/488
* Update issues returned by num_label_issues by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/485
* Mypy typechecking fix for count.py by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/500
* Add basic utilities for handling quality scores for multilabel data by elisno in https://github.com/cleanlab/cleanlab/pull/499
* reinvented algorithms for multilabel find_label_issues by aditya1503 in https://github.com/cleanlab/cleanlab/pull/483
* Trying to fix typings by ChinoCodeDemon in https://github.com/cleanlab/cleanlab/pull/502
* Add internal function to properly format labels by huiwengoh in https://github.com/cleanlab/cleanlab/pull/504
* Mention internal format label function in multiannotator docs by huiwengoh in https://github.com/cleanlab/cleanlab/pull/506
* Multilabel code restructuring with aggregation/scorer functions by aditya1503 in https://github.com/cleanlab/cleanlab/pull/509
* Separate word_coloring from token_replacement in color_sentence by elisno in https://github.com/cleanlab/cleanlab/pull/514
* Add support for missing classes by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/511
* Better missing class support for label quality scoring by jwmueller in https://github.com/cleanlab/cleanlab/pull/518
* moving multilabel functions by aditya1503 in https://github.com/cleanlab/cleanlab/pull/515
* restrict typecheck to python v3.10 by jwmueller in https://github.com/cleanlab/cleanlab/pull/521
* support missing classes in multiannotator functions by huiwengoh in https://github.com/cleanlab/cleanlab/pull/519
* fix mypy typing by huiwengoh in https://github.com/cleanlab/cleanlab/pull/524
* Add studio banner by cmauck10 in https://github.com/cleanlab/cleanlab/pull/525
* added missing classes test for multilabel by aditya1503 in https://github.com/cleanlab/cleanlab/pull/523
* Improve tutorials language/formatting by jwmueller in https://github.com/cleanlab/cleanlab/pull/526
* Validate forgetting factor in EMA by elisno in https://github.com/cleanlab/cleanlab/pull/527
* Remove strong worded requirement for out-of-sample pred probs by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/520
* Ensure type checks pass with new mypy v0.990 by jwmueller in https://github.com/cleanlab/cleanlab/pull/530
* replace pylint --> flake8 by ilnarkz in https://github.com/cleanlab/cleanlab/pull/531
* Tutorial for multi-label classification by aditya1503 in https://github.com/cleanlab/cleanlab/pull/517
* Fix multilabel_py dimensionality by elisno in https://github.com/cleanlab/cleanlab/pull/535
* cleanlab install on colab for multilabel tutorial by jwmueller in https://github.com/cleanlab/cleanlab/pull/537
* Refactor MultilabelScorer helper methods and tests by elisno in https://github.com/cleanlab/cleanlab/pull/540
* Make a public method for multilabel quality scores by jwmueller in https://github.com/cleanlab/cleanlab/pull/542
* Improve and standardize documentation in label error detection methods for classification datasets by jwmueller in https://github.com/cleanlab/cleanlab/pull/543
* Move mypy configuration to config file by anishathalye in https://github.com/cleanlab/cleanlab/pull/545
* Fix types to work with latest pandas-stubs by anishathalye in https://github.com/cleanlab/cleanlab/pull/546
* Fix passing of kwargs to get_label_quality_scores by anishathalye in https://github.com/cleanlab/cleanlab/pull/547
* Switch CI cron schedule by anishathalye in https://github.com/cleanlab/cleanlab/pull/548
* Remove unnecessary type: ignore annotations by anishathalye in https://github.com/cleanlab/cleanlab/pull/549
* update readme for v2.2 by jwmueller in https://github.com/cleanlab/cleanlab/pull/551

New Contributors
* ethanotran made their first contribution in https://github.com/cleanlab/cleanlab/pull/475
* mohitsaxenaknoldus made their first contribution in https://github.com/cleanlab/cleanlab/pull/465
* aditya1503 made their first contribution in https://github.com/cleanlab/cleanlab/pull/483
* ChinoCodeDemon made their first contribution in https://github.com/cleanlab/cleanlab/pull/502
* cmauck10 made their first contribution in https://github.com/cleanlab/cleanlab/pull/525
* ilnarkz made their first contribution in https://github.com/cleanlab/cleanlab/pull/531
* Po-He Tseng helped run some early tests of our new multi-label algorithms

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.1.0...v2.2.0

2.2.0

Multi-label support for applications like image/document/text tagging

The newest version of cleanlab features a complete overhaul of cleanlab’s multi-label classification functionality:

- We invented new algorithms for detecting label errors in multi-label datasets that are significantly more effective. These methods are formally described and extensively benchmarked in our [research paper](https://arxiv.org/abs/2211.13895).
- We added `cleanlab.multilabel_classification` module for label quality scoring.
- We now offer an easy-to-follow [quickstart tutorial](https://docs.cleanlab.ai/stable/tutorials/) for learning how to apply cleanlab to multi-label datasets.
- We’ve created [example notebooks](https://github.com/cleanlab/examples/tree/master/multilabel_classification) on using cleanlab to clean up image tagging datasets, and on training a state-of-the-art PyTorch neural network for multi-label classification with any image dataset.
- All of this multi-label functionality is now robustly tested via a comprehensive suite of unit tests to ensure it remains performant.

cleanlab now works when your labels have some classes missing relative to your predicted probabilities

The package now works for datasets in which some classes happen to not be present (but are present say in the `pred_probs` output by a model). This is useful when you:

- Want to use a pretrained model that was fit with additional classes
- Have rare classes and happen to split the data in an unlucky way
- Are doing active learning or other dynamic modeling with data that are iteratively changing
- Are analyzing multi-annotator datasets with `cleanlab.multiannotator` and some annotators occasionally select a really rare class.

Other major improvements

(in addition to too many bugfixes to name):

- Accuracy improvements to the algorithm used to estimate the number of label errors in a dataset via `count.num_label_issues()`. — ulya-tkch
- Introduction of flake8 code linter to ensure the highest standards for our code. — ilnarkz, mohitsaxenaknoldus
- More comprehensive mypy type annotations for cleanlab functions to make our code safer and more understandable. — elisno, ChinoCodeDemon, anishathalye, jwmueller, huiwengoh, ulya-tkch

Special thanks to Po-He Tseng for helping with early tests of our improved multi-label algorithms and the research behind developing them.

2.1

Out of Distribution and Outlier Detection

1. Detect **out of distribution** examples in a dataset based on its numeric **feature embeddings**
```python
from cleanlab.outlier import OutOfDistribution

ood = OutOfDistribution()

# To get outlier scores for train_data using feature matrix train_feature_embeddings
ood_train_feature_scores = ood.fit_score(features=train_feature_embeddings)

# To get outlier scores for additional test_data using feature matrix test_feature_embeddings
ood_test_feature_scores = ood.score(features=test_feature_embeddings)
```
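For intuition about feature-based scores, here is a minimal NumPy sketch of a K-nearest-neighbor distance score, in the spirit of the KNN-distance scoring mentioned in the change log. The function name, the choice of `k`, and the `exp(-distance)` transformation are assumptions for illustration and may differ from what `OutOfDistribution` actually computes.

```python
import numpy as np

def knn_outlier_scores(train_features, query_features, k=2):
    """Score in (0, 1]: lower means the query point lies further from the training data."""
    # Pairwise Euclidean distances, shape (num_query, num_train)
    diffs = query_features[:, None, :] - train_features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)
    # Average distance to the k nearest training points
    avg_knn_dist = dists[:, :k].mean(axis=1)
    # Map distances to (0, 1]; this transformation is an assumption
    return np.exp(-avg_knn_dist)

train_features = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
test_features = np.array([[0.5, 0.5], [10.0, 10.0]])

scores = knn_outlier_scores(train_features, test_features, k=2)
print(scores)  # the far-away second point gets the much lower score
```

Note that when scoring the training points themselves this way, each point's nearest neighbor is itself at distance zero, so you would want to skip that first neighbor.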


2. Detect **out of distribution** examples in a dataset based on **predicted class probabilities** from a trained classifier
```python
from cleanlab.outlier import OutOfDistribution

ood = OutOfDistribution()

# To get outlier scores for train_data using predicted class probabilities (from a trained classifier) and given class labels
ood_train_predictions_scores = ood.fit_score(pred_probs=train_pred_probs, labels=labels)

# To get outlier scores for additional test_data using predicted class probabilities
ood_test_predictions_scores = ood.score(pred_probs=test_pred_probs)
```
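One simple way to turn predicted probabilities into an outlier score is via entropy: a classifier tends to be uncertain on examples unlike its training data. The sketch below is an illustrative assumption, not the library's exact method, and it omits any adjustment that the `fit` step may learn from the given labels.

```python
import numpy as np

def entropy_ood_scores(pred_probs):
    """1 - normalized entropy: near 1 for confident predictions, near 0 for uniform ones."""
    pred_probs = np.asarray(pred_probs, dtype=float)
    num_classes = pred_probs.shape[1]
    # Clip inside the log to avoid log(0)
    clipped = np.clip(pred_probs, 1e-12, 1.0)
    entropy = -(pred_probs * np.log(clipped)).sum(axis=1)
    return 1.0 - entropy / np.log(num_classes)

test_pred_probs = np.array(
    [[0.98, 0.01, 0.01],   # confident prediction
     [1 / 3, 1 / 3, 1 / 3]]  # maximally uncertain prediction
)
scores = entropy_ood_scores(test_pred_probs)
print(scores)  # lower score = more likely out of distribution
```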


Multi-annotator -- support data with multiple labels

3. For data **labeled by multiple annotators** (stored as matrix `multiannotator_labels` whose rows correspond to examples, columns to each annotator’s chosen labels), cleanlab v2.1 can: find improved consensus labels, score their quality, and assess annotators, all by leveraging predicted class probabilities `pred_probs` from *any* trained classifier
```python
from cleanlab.multiannotator import get_label_quality_multiannotator

get_label_quality_multiannotator(multiannotator_labels, pred_probs)
```
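To illustrate the expected layout of `multiannotator_labels`, the sketch below builds a small matrix (with `NaN` where an annotator did not label an example) and computes a plain majority-vote consensus plus the fraction of annotators who agree with it. This is only a baseline for comparison: cleanlab's consensus additionally uses `pred_probs`, and the helper `majority_vote` is hypothetical. The sketch assumes every example has at least one annotation.

```python
import numpy as np

def majority_vote(multiannotator_labels, num_classes):
    """Baseline consensus label and annotator agreement for each example."""
    labels = np.asarray(multiannotator_labels, dtype=float)
    consensus = np.empty(labels.shape[0], dtype=int)
    agreement = np.empty(labels.shape[0], dtype=float)
    for i, row in enumerate(labels):
        given = row[~np.isnan(row)].astype(int)  # labels actually provided
        counts = np.bincount(given, minlength=num_classes)
        consensus[i] = counts.argmax()           # ties go to the smallest class index
        agreement[i] = counts.max() / len(given)
    return consensus, agreement

# Rows are examples, columns are annotators; NaN means "not labeled by this annotator"
multiannotator_labels = np.array(
    [[0, 0, np.nan],
     [1, np.nan, 1],
     [0, 1, 1],
     [np.nan, 2, np.nan]]
)
consensus, agreement = majority_vote(multiannotator_labels, num_classes=3)
print(consensus)
print(agreement)
```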


Support Token Classification tasks

4. Cleanlab v2.1 can now find label issues in **token classification** (text) data, where each word in a sentence is labeled with one of K classes (e.g. entity recognition). This relies on three inputs:

- `tokens`: List of tokenized sentences whose `i`th element is a list of strings corresponding to tokens of the `i`th sentence in dataset.
Example: `[..., ["I", "love", "cleanlab"], ...]`
- `labels`: List whose `i`th element is a list of integers corresponding to class labels of each token in the `i`th sentence. Example: `[..., [0, 0, 1], ...]`
- `pred_probs`: List whose `i`th element is a np.ndarray of shape `(N_i, K)` corresponding to predicted class probabilities for each token in the `i`th sentence (assuming this sentence contains `N_i` tokens and dataset has `K` possible classes). These should be out-of-sample `pred_probs` obtained from a token classification model via cross-validation.
Example: `[..., np.array([[0.8,0.2], [0.9,0.1], [0.3,0.7]]), ...]`

Using these, you can easily find and display mislabeled tokens in your data
```python
from cleanlab.token_classification.filter import find_label_issues
from cleanlab.token_classification.summary import display_issues

issues = find_label_issues(labels, pred_probs)
display_issues(issues, tokens, pred_probs=pred_probs, given_labels=labels,
               class_names=optional_list_of_ordered_class_names)
```
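The nested input format can be easier to grasp with a toy computation. The sketch below scores each token by the predicted probability of its given label and scores each sentence by its worst token. Treat it as an illustration of the data layout under these assumptions rather than a description of what `find_label_issues` does internally; the helper `sentence_scores` is hypothetical.

```python
import numpy as np

def sentence_scores(labels, pred_probs):
    """Per-sentence score: the lowest self-confidence among that sentence's tokens."""
    scores = []
    for sent_labels, sent_probs in zip(labels, pred_probs):
        sent_labels = np.asarray(sent_labels)
        # Probability assigned to the given label of each token, shape (N_i,)
        token_conf = sent_probs[np.arange(len(sent_labels)), sent_labels]
        scores.append(token_conf.min())
    return np.array(scores)

tokens = [["I", "love", "cleanlab"], ["Hello", "world"]]
labels = [[0, 0, 1], [1, 0]]
pred_probs = [
    np.array([[0.8, 0.2], [0.9, 0.1], [0.3, 0.7]]),
    np.array([[0.6, 0.4], [0.95, 0.05]]),
]

scores = sentence_scores(labels, pred_probs)
print(scores)  # the second sentence scores lower because "Hello" has a doubtful label
```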


Support pd.DataFrames, Keras/PyTorch/TF Datasets, Keras models, etc.

5. `CleanLearning` can now operate directly on **non-array dataset** formats like tensorflow/pytorch `Datasets` and use **arbitrary Keras models**:

```python
import numpy as np
import tensorflow as tf
from cleanlab.classification import CleanLearning
from cleanlab.experimental.keras import KerasWrapperModel

# Example tensorflow dataset created from numpy arrays
dataset = tf.data.Dataset.from_tensor_slices((features_np_array, labels_np_array))
dataset = dataset.shuffle(buffer_size=len(features_np_array)).batch(32)

def make_model(num_features, num_classes):
    inputs = tf.keras.Input(shape=(num_features,))
    outputs = tf.keras.layers.Dense(num_classes)(inputs)
    return tf.keras.Model(inputs=inputs, outputs=outputs, name="my_keras_model")

model = KerasWrapperModel(
    make_model,
    model_kwargs={
        "num_features": features_np_array.shape[1],
        "num_classes": len(np.unique(labels_np_array)),
    },
)
cl = CleanLearning(model)
cl.fit(dataset, labels_np_array)  # variant of model.fit() that is more robust to noisy labels
robust_predictions = cl.predict(dataset)  # equivalent to model.predict() after training on cleaner data
```



Change Log
* Fix edgecase divide-by-0 in entropy-score by jwmueller in https://github.com/cleanlab/cleanlab/pull/241
* Fix some typos. by Yulv-git in https://github.com/cleanlab/cleanlab/pull/242
* Updated project urls in setup.py by calebchiam in https://github.com/cleanlab/cleanlab/pull/249
* FeatureReq 33: Added custom sample_weight by rushic24 in https://github.com/cleanlab/cleanlab/pull/248
* Allow users to pass custom weights for ensemble label quality scoring by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/255
* Fix line index of CleanLearning(), some text of links, etc. by Yulv-git in https://github.com/cleanlab/cleanlab/pull/260
* Copy the docs build artifacts to the "stable" folder by weijinglok in https://github.com/cleanlab/cleanlab/pull/231
* Add Negative Log Loss Weighting Scheme for Ensemble Label Quality Score by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/267
* Developed class that allow the use of cleanlab with tensorflow and huggingface models by MattiaSangermano in https://github.com/cleanlab/cleanlab/pull/247
* Add KNN distance OOD scoring function and unit tests by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/268
* Dataset documentation clarifications by jwmueller in https://github.com/cleanlab/cleanlab/pull/270
* Add issue templates by anishathalye in https://github.com/cleanlab/cleanlab/pull/278
* Fix bug. get thresholds broken for multi_label by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/264
* Clarify labels format by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/282
* Drop dependency on SciPy by anishathalye in https://github.com/cleanlab/cleanlab/pull/286
* Make CleanLearning work with pandas and other non-numpy feature objects X by jwmueller in https://github.com/cleanlab/cleanlab/pull/285
* Allow CleanLearning to use validation data in each fold by huiwengoh in https://github.com/cleanlab/cleanlab/pull/295
* Created FAQ Page in the Cleanlab documentation by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/294
* Proper validation of `labels` values/format across package by jwmueller in https://github.com/cleanlab/cleanlab/pull/301
* Add static type checking by anishathalye in https://github.com/cleanlab/cleanlab/pull/306
* error for missing classes, consistency on determining num_classes by jwmueller in https://github.com/cleanlab/cleanlab/pull/308
* Added support to build KNN graph for OOD detection with only training data by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/305
* Standardize naming on K, num_classes and N, num_examples by huiwengoh in https://github.com/cleanlab/cleanlab/pull/312
* Added outlier detection tutorial into docs by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/310
* Updating tutorials hyperlink to 2.0.0 release by aravindputrevu in https://github.com/cleanlab/cleanlab/pull/318
* Allow KNN object to be returned by get_outlier_scores, Improved OOD tutorial by jwmueller in https://github.com/cleanlab/cleanlab/pull/319
* Some FAQ tips on how to improve CleanLearning by jwmueller in https://github.com/cleanlab/cleanlab/pull/324
* Updated tutorials to include quickstart by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/323
* Add y argument as alternative to labels in CleanLearning.fit() by elisno in https://github.com/cleanlab/cleanlab/pull/322
* validation.py: Annotate function args and return values by elisno in https://github.com/cleanlab/cleanlab/pull/317
* Fixed package version issues for audio tutorial by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/325
* Add compatibility for tensorflow and pytorch Dataset objects by jwmueller in https://github.com/cleanlab/cleanlab/pull/311
* Re-order find_label_issues args for better clarity by jwmueller in https://github.com/cleanlab/cleanlab/pull/329
* Comment on missing/rare classes in FAQ by jwmueller in https://github.com/cleanlab/cleanlab/pull/332
* update sphinx to v5 by jwmueller in https://github.com/cleanlab/cleanlab/pull/327
* Allow missing classes in get_label_quality_scores by huiwengoh in https://github.com/cleanlab/cleanlab/pull/334
* Allow missing classes in assert_valid_class_labels by huiwengoh in https://github.com/cleanlab/cleanlab/pull/335
* Changed all docstring instances of np.array to np.ndarray by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/336
* Update Contributing.md with Projects link and getting started instructions by jwmueller in https://github.com/cleanlab/cleanlab/pull/349
* Switch docs links from latest release to stable by elisno in https://github.com/cleanlab/cleanlab/pull/379
* Extending cleanlab to find label errors in token classification datasets by ericwang1997 in https://github.com/cleanlab/cleanlab/pull/347
* Cleanlab functionality for multiannotator data by huiwengoh in https://github.com/cleanlab/cleanlab/pull/333
* Cleanup token classification code by elisno in https://github.com/cleanlab/cleanlab/pull/390
* Fix typing for find_label_issues by elisno in https://github.com/cleanlab/cleanlab/pull/391
* Match token/s in color_sentence by elisno in https://github.com/cleanlab/cleanlab/pull/397
* Escape special regex characters by elisno in https://github.com/cleanlab/cleanlab/pull/404
* Add FAQ question on how to get predicted labels by jwmueller in https://github.com/cleanlab/cleanlab/pull/402
* Implementing get_ood_scores function by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/338
* Add termcolor dependency by huiwengoh in https://github.com/cleanlab/cleanlab/pull/415
* Add token classification tutorial notebook to docs.cleanlab.ai by elisno in https://github.com/cleanlab/cleanlab/pull/411
* Update examples links by huiwengoh in https://github.com/cleanlab/cleanlab/pull/421
* Polish multiannotator docs by huiwengoh in https://github.com/cleanlab/cleanlab/pull/422
* Text tutorial improvements by jwmueller in https://github.com/cleanlab/cleanlab/pull/429
* suppress tensorflow warning logs in tutorials if not properly installed by jwmueller in https://github.com/cleanlab/cleanlab/pull/432
* Add autodoc-typehints extension for sphinx by elisno in https://github.com/cleanlab/cleanlab/pull/412
* Strip input prompts when copying code snippets by elisno in https://github.com/cleanlab/cleanlab/pull/439
* Extend KerasWrapper to Functional API by huiwengoh in https://github.com/cleanlab/cleanlab/pull/434
* Deploy documentation for token classification module by elisno in https://github.com/cleanlab/cleanlab/pull/438
* Updated labels to allow array_like by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/426
* Add keras wrapper to docs by jwmueller in https://github.com/cleanlab/cleanlab/pull/443
* Format all return docstrings and add typing by jwmueller in https://github.com/cleanlab/cleanlab/pull/437
* make num_label_issues = cj calibrated offdiag sum by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/445
* fix bug in hard-coded test. generalize the test by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/448
* Change output of display_issues by elisno in https://github.com/cleanlab/cleanlab/pull/450
* More improvements to token classification code and documentation by jwmueller in https://github.com/cleanlab/cleanlab/pull/452
* Fix details disclosure elements in docs by anishathalye in https://github.com/cleanlab/cleanlab/pull/456
* Add missing backticks and language annotation by anishathalye in https://github.com/cleanlab/cleanlab/pull/461
* Error handling for rare classes in multiannotator data by huiwengoh in https://github.com/cleanlab/cleanlab/pull/455
* Fix docs build in CI by anishathalye in https://github.com/cleanlab/cleanlab/pull/462
* Added support for returning ranked issue idxs by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/459
* update readme for v2.1 by jwmueller in https://github.com/cleanlab/cleanlab/pull/457
* Clearer code examples on docs main page by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/430

New Contributors
* Yulv-git made their first contribution in https://github.com/cleanlab/cleanlab/pull/242
* rushic24 made their first contribution in https://github.com/cleanlab/cleanlab/pull/248
* MattiaSangermano made their first contribution in https://github.com/cleanlab/cleanlab/pull/247
* ulya-tkch made their first contribution in https://github.com/cleanlab/cleanlab/pull/293
* huiwengoh made their first contribution in https://github.com/cleanlab/cleanlab/pull/295
* aravindputrevu made their first contribution in https://github.com/cleanlab/cleanlab/pull/318
* elisno made their first contribution in https://github.com/cleanlab/cleanlab/pull/322
* ericwang1997 made their first contribution in https://github.com/cleanlab/cleanlab/pull/340

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.0.0...v2.1.0

2.1.0

Major new functionalities:

- **CROWDLAB algorithms** for analysis of data labeled by multiple annotators (by huiwengoh, ulya-tkch, jwmueller)
  - Accurately infer the best consensus label for each example
  - Estimate the quality of each consensus label (how likely it is to be correct)
  - Estimate the overall quality of each annotator (how trustworthy their suggested labels are)
- **Out of Distribution Detection** based on either:
  - feature values/embeddings (by ulya-tkch, jwmueller, JohnsonKuan)
  - predicted class probabilities (by ulya-tkch)
- **Label error detection for Token Classification** tasks (NLP / text data) (by ericwang1997, elisno)
- **CleanLearning can now:**
  - Run on non-array data types, including pandas DataFrame, pytorch/tensorflow Dataset objects, and many other data formats. (by jwmueller)
  - Allow the base model’s fit() to use validation data in each fold during cross-validation (e.g. for early stopping or hyperparameter optimization). (by huiwengoh)
  - Train with custom sample weights for datapoints. (by rushic24, jwmueller)
  - Use any Keras model (supporting both the Sequential and Functional APIs) via cleanlab’s `KerasWrapperModel`, which makes these models compatible with sklearn and tensorflow Datasets. (by huiwengoh, jwmueller)
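
To make the multi-annotator items above concrete, here is a minimal sketch of the simplest possible baseline: majority-vote consensus plus per-annotator agreement with that consensus. This is not CROWDLAB and not cleanlab's implementation; the function names and the `None`-for-missing convention are assumptions made for illustration only.

```python
from collections import Counter


def majority_vote_consensus(multiannotator_labels):
    """Return (consensus_labels, agreement) for a list of per-example label lists.

    Each inner list holds one entry per annotator; None means "not labeled".
    """
    consensus, agreement = [], []
    for labels in multiannotator_labels:
        given = [label for label in labels if label is not None]
        if not given:
            raise ValueError("every example needs at least one annotation")
        counts = Counter(given)
        # Break ties by the smallest class index so the result is deterministic.
        best = min(counts, key=lambda c: (-counts[c], c))
        consensus.append(best)
        agreement.append(counts[best] / len(given))
    return consensus, agreement


def annotator_agreement_with_consensus(multiannotator_labels, consensus):
    """Fraction of each annotator's labels that match the consensus label."""
    num_annotators = len(multiannotator_labels[0])
    scores = []
    for a in range(num_annotators):
        pairs = [
            (row[a], c)
            for row, c in zip(multiannotator_labels, consensus)
            if row[a] is not None
        ]
        matches = sum(label == c for label, c in pairs)
        scores.append(matches / len(pairs) if pairs else float("nan"))
    return scores


# Rows are examples, columns are annotators (hypothetical toy data).
labels = [
    [0, 0, 1],
    [1, None, 1],
    [2, 2, None],
    [0, 1, 1],
]
consensus, agreement = majority_vote_consensus(labels)
annotator_scores = annotator_agreement_with_consensus(labels, consensus)
```

A baseline like this ignores classifier predictions entirely, so it gives a reference point for judging what a model-aware method adds on top of the annotations.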

Major improvements (in addition to too many bugfixes to name):

- Reduced dependencies: `scipy` is no longer needed (by anishathalye)
- Clearer error/warning messages throughout package when data/inputs are strangely formatted (by cgnorthcutt, jwmueller, huiwengoh)
- FAQ section in tutorials with advice for commonly encountered issues (by huiwengoh, ulya-tkch, jwmueller, cgnorthcutt)
- Many additional tutorial and example notebooks at [docs.cleanlab.ai](http://docs.cleanlab.ai/) and [https://github.com/cleanlab/examples](https://github.com/cleanlab/examples) (by ulya-tkch, huiwengoh, jwmueller, ericwang1997)
- Static type annotations to ensure robust code (by anishathalye, elisno)

2.0.0

If you liked cleanlab v1.0.1, v2.0.0 will blow your mind! 💥🧠

cleanlab 2.0 adds powerful new workflows and algorithms for data-centric AI, dataset curation, auto-fixing label issues in data, learning with noisy labels, and more. Nearly every module, method, parameter, and docstring has been touched by this release.

If you're coming from 1.0, here's a [migration guide](https://docs.cleanlab.ai/master/migrating/migrate_v2.html).

A few highlights of new functionalities in cleanlab 2.0:

1. rank every data point by label quality
2. find label issues in any dataset
3. train any classifier on any dataset with label issues
4. find overlapping classes to merge and/or delete at the dataset level
5. compute an overall dataset health score

For an in-depth overview of what cleanlab 2.0 can do, check out [this tutorial](https://docs.cleanlab.ai/master/tutorials/indepth_overview.html).

To help you get started with 2.0, we've added:

* [extensive documentation](https://docs.cleanlab.ai/)
* [tutorials](https://docs.cleanlab.ai/master/tutorials/image.html)
* [updated examples](https://github.com/cleanlab/examples)
* [powerful new workflows](https://docs.cleanlab.ai/master/migrating/migrate_v2.html)
* [blogs](https://cleanlab.ai/blog/)

Change Log

This list is non-exhaustive! Assume every aspect of API has changed.

Module name changes or moves:
- `classification.LearningWithNoisyLabels` class --> `classification.CleanLearning` class
- `pruning.py` --> `filter.py`
- `latent_estimation.py` --> `count.py`
- `cifar_cnn.py` --> `experimental/cifar_cnn.py`
- `coteaching.py` --> `experimental/coteaching.py`
- `fasttext.py` --> `experimental/fasttext.py`
- `mnist_pytorch.py` --> `experimental/fmnist_pytorch.py`
- `noise_generation.py` --> `benchmarking/noise_generation.py`
- `util.py` --> `internal/util.py`
- `latent_algebra.py` --> `internal/latent_algebra.py`

Module Deletions:
- removed `polyplex.py`
- removed `models/` (moved content to `experimental/`)

New modules created:
- `rank.py`
  - moved all ranking and ordering functions from `pruning.py`/`filter.py` to here
- `dataset.py`
  - brand new module supporting methods for dealing with data-level issues
- `benchmarking.py`
  - Future benchmarking modules go here. Moved `noise_generation.py` here.

Method name changes:
- `pruning.get_noise_indices()` --> `filter.find_label_issues()`
- `count.num_label_errors()` --> `count.num_label_issues()`
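
When porting a v1 codebase, it can help to scan for the old names before editing anything. The sketch below is a hypothetical helper (not part of cleanlab) that reports v1 names from the lists above. The rename table covers only a few entries and assumes the v1 modules were imported as `cleanlab.pruning` and `cleanlab.latent_estimation`.

```python
import re

# v1 name -> v2 name, transcribed from the lists above (not exhaustive).
RENAMES = {
    "LearningWithNoisyLabels": "CleanLearning",
    "cleanlab.pruning": "cleanlab.filter",
    "cleanlab.latent_estimation": "cleanlab.count",
    "get_noise_indices": "find_label_issues",
    "num_label_errors": "num_label_issues",
}


def find_v1_names(source):
    """Return sorted (line_number, old_name, new_name) tuples for v1 names in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in RENAMES.items():
            # Word-character guards avoid matching inside longer identifiers.
            pattern = r"(?<!\w)" + re.escape(old) + r"(?!\w)"
            if re.search(pattern, line):
                hits.append((lineno, old, new))
    return sorted(hits)


old_code = """\
from cleanlab.pruning import get_noise_indices
from cleanlab.classification import LearningWithNoisyLabels
issues = get_noise_indices(s, psx)
"""

report = find_v1_names(old_code)
for lineno, old, new in report:
    print(f"line {lineno}: {old} --> {new}")
```

The helper only reports; it does not rewrite code, because parameter names and argument order also changed in 2.0 and a blind textual substitution could leave calls silently wrong.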

Methods added:
- `rank.py` adds
  - two ranking functions to rank data based on label quality for the entire dataset (not just examples with label issues)
    - `get_self_confidence_for_each_label()`
    - `get_normalized_margin_for_each_label()`
- `filter.py` adds
  - two more methods to `filter.find_label_issues()` (select the method using the `filter_by` parameter)
    - `confident_learning`, which has been shown to work very well and may become the default in the future, and
    - `predicted_neq_given`, which is useful for benchmarking a simple baseline approach but underperforms the other `filter_by` methods
- `classification.py` adds
  - `CleanLearning.get_label_issues()`
    - canonical one-line usage: `CleanLearning().fit(X, y).get_label_issues()`
    - no need to compute predicted probabilities in advance
  - `CleanLearning.find_label_issues()`
    - returns a dataframe with label issues (instead of just a mask)
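
The ideas behind the two ranking scores and the `predicted_neq_given` baseline can be sketched in plain Python. This is an illustration under assumptions, not cleanlab's code: the helper names are made up, and rescaling the margin to [0, 1] is an assumed convention.

```python
def self_confidence(labels, pred_probs):
    """Predicted probability of each example's given label."""
    return [probs[y] for y, probs in zip(labels, pred_probs)]


def normalized_margin(labels, pred_probs):
    """Given-label probability minus the largest other-class probability.

    The raw margin lies in [-1, 1]; it is rescaled here to [0, 1] (assumed convention).
    """
    scores = []
    for y, probs in zip(labels, pred_probs):
        best_other = max(p for k, p in enumerate(probs) if k != y)
        scores.append((probs[y] - best_other + 1) / 2)
    return scores


def predicted_neq_given(labels, pred_probs):
    """Baseline filter: flag examples whose argmax prediction differs from the given label."""
    flags = []
    for y, probs in zip(labels, pred_probs):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        flags.append(predicted != y)
    return flags


# Hypothetical toy data: three examples, three classes.
labels = [0, 1, 2]
pred_probs = [
    [0.9, 0.05, 0.05],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]

confidence = self_confidence(labels, pred_probs)
margin = normalized_margin(labels, pred_probs)
flags = predicted_neq_given(labels, pred_probs)

# Lower score = more likely to be a label issue, so rank ascending.
ranked = sorted(range(len(labels)), key=confidence.__getitem__)
```

Both scores are defined for every example, which is what allows ranking the whole dataset rather than only the examples a filter has flagged.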


Naming conventions changed in method names, comments, parameters, etc.
- `s` --> `labels`
- `psx` --> `pred_probs`
- `label_errors` --> `label_issues`
- `noise_mask` --> `label_issues_mask`
- `label_errors_bool` --> `label_issues_mask`
- `prune_method` --> `filter_by`
- `prob_given_label` --> `self_confidence`
- `pruning` --> `filtering`

Parameter re-ordering:
- re-ordered (`labels`, `pred_probs`) parameters to be consistent (in that order) in all methods.
- re-ordered parameters (e.g. `frac_noise`) in `filter.find_label_issues()`

Parameter changes:
- in `order_label_issues()`
  - param: `sorted_index_method` --> `rank_by`
- in `find_label_issues()`
  - param: `sorted_index_method` --> `return_indices_ranked_by`
  - param: `prune_method` --> `filter_by`
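
Because `sorted_index_method` maps to a different name depending on the function, a keyword translation has to be function-aware. Below is a minimal sketch of such a translation table. `translate_kwargs` is a hypothetical helper, not a cleanlab API, and it renames keys only; argument values and positional order are left untouched.

```python
# v1 keyword -> v2 keyword, transcribed from the naming and parameter lists above.
COMMON_RENAMES = {
    "s": "labels",
    "psx": "pred_probs",
    "prune_method": "filter_by",
}

# `sorted_index_method` is renamed differently per target function.
PER_FUNCTION_RENAMES = {
    "order_label_issues": {"sorted_index_method": "rank_by"},
    "find_label_issues": {"sorted_index_method": "return_indices_ranked_by"},
}


def translate_kwargs(function_name, kwargs):
    """Return a copy of kwargs with v1 keyword names replaced by v2 names."""
    renames = {**COMMON_RENAMES, **PER_FUNCTION_RENAMES.get(function_name, {})}
    translated = {}
    for key, value in kwargs.items():
        new_key = renames.get(key, key)
        if new_key in translated:
            raise ValueError(f"duplicate argument after renaming: {new_key!r}")
        translated[new_key] = value
    return translated


v1_kwargs = {
    "s": [0, 1],
    "psx": [[0.9, 0.1], [0.2, 0.8]],
    "prune_method": "prune_by_class",
    "sorted_index_method": "normalized_margin",
}
v2_kwargs = translate_kwargs("find_label_issues", v1_kwargs)
```

Calls that passed `s` and `psx` positionally still need a manual check, since 2.0 standardized the order to (`labels`, `pred_probs`) across all methods.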

Global variables changed:
- `filter.py`
  - Only require 1 example to be left in each class
  - `MIN_NUM_PER_CLASS = 5` --> `MIN_NUM_PER_CLASS = 1`
  - enables cleanlab to work for toy-sized datasets

Dependencies added:
- `pandas>=1.0.0`



Way-too-detailed Change Log
* convert readme to markdown for pypi release. by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/126
* Add EditorConfig by anishathalye in https://github.com/cleanlab/cleanlab/pull/129
* Major API change. Introducing Cleanlab 2.0 by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/128
* Standardize code style to Black by anishathalye in https://github.com/cleanlab/cleanlab/pull/107
* Redirect RTD site to docs.cleanlab.ai by weijinglok in https://github.com/cleanlab/cleanlab/pull/130
* Redirect RTD site to docs.cleanlab.ai: Part 2 by weijinglok in https://github.com/cleanlab/cleanlab/pull/132
* Add image classification tutorial and streamline docs CI/CD by weijinglok in https://github.com/cleanlab/cleanlab/pull/127
* remove redundant text by jwmueller in https://github.com/cleanlab/cleanlab/pull/134
* Remove extra slashes in docs relative path by weijinglok in https://github.com/cleanlab/cleanlab/pull/135
* Fix docs TOC for v2.0 by weijinglok in https://github.com/cleanlab/cleanlab/pull/136
* Add label quality scoring functions and user API to choose the method by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/131
* Change cleanlab version in Image Tutorial by weijinglok in https://github.com/cleanlab/cleanlab/pull/138
* Utilities -> internal submodule refactor by jwmueller in https://github.com/cleanlab/cleanlab/pull/141
* Fix NumPy deprecation warning by anishathalye in https://github.com/cleanlab/cleanlab/pull/142
* Remove unnecessary print statement by anishathalye in https://github.com/cleanlab/cleanlab/pull/145
* Add explanation that estimators must be clonable by anishathalye in https://github.com/cleanlab/cleanlab/pull/146
* Update doc site quickstart page to reflect v2.0 API by weijinglok in https://github.com/cleanlab/cleanlab/pull/143
* Fix sklearn estimator cloning by anishathalye in https://github.com/cleanlab/cleanlab/pull/144
* Update default label quality scoring method to self_confidence by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/147
* Allow n-dim data in LearningWithNoisyLabels by anishathalye in https://github.com/cleanlab/cleanlab/pull/148
* Improve user-control by jwmueller in https://github.com/cleanlab/cleanlab/pull/149
* Enable use of find_label_issues_kwargs for hyper-parameter search by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/152
* Add fix and test for sklearn GridSearchCV with LearningWithNoisyLabels by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/153
* Add tutorial for tabular data classification by weijinglok in https://github.com/cleanlab/cleanlab/pull/151
* Minor tutorial edits by jwmueller in https://github.com/cleanlab/cleanlab/pull/155
* Add Python 3.10 to CI by anishathalye in https://github.com/cleanlab/cleanlab/pull/160
* Add development guide by anishathalye in https://github.com/cleanlab/cleanlab/pull/164
* Add text classification tutorial by weijinglok in https://github.com/cleanlab/cleanlab/pull/154
* Add audio tutorial to doc site by weijinglok in https://github.com/cleanlab/cleanlab/pull/165
* Add overview for computing out-of-sample predicted probabilities with cross-validation to doc site by weijinglok in https://github.com/cleanlab/cleanlab/pull/166
* Add CI check that .ipynb outputs are empty by anishathalye in https://github.com/cleanlab/cleanlab/pull/169
* Add CI check for trailing newlines in notebooks by anishathalye in https://github.com/cleanlab/cleanlab/pull/170
* Improve image tutorial accuracy and finding better label errors by weijinglok in https://github.com/cleanlab/cleanlab/pull/167
* Remove unnecessary version warning by anishathalye in https://github.com/cleanlab/cleanlab/pull/162
* Add test to check examples are found by cleanlab by weijinglok in https://github.com/cleanlab/cleanlab/pull/172
* Various tutorial improvements by jwmueller in https://github.com/cleanlab/cleanlab/pull/173
* Deploys docs only if triggered by master branch by weijinglok in https://github.com/cleanlab/cleanlab/pull/175
* added LearningWithNoisyLabels.find_label_issues instance method by jwmueller in https://github.com/cleanlab/cleanlab/pull/157
* Add note on EditorConfig to development guide by anishathalye in https://github.com/cleanlab/cleanlab/pull/176
* CleanLearning = Machine Learning with cleaned data by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/177
* Simple fix to Issue 158 (and potentially other issues) by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/178
* Update docs README by weijinglok in https://github.com/cleanlab/cleanlab/pull/180
* Polish the APIs and file-structure to prepare for 2.0 release by jwmueller in https://github.com/cleanlab/cleanlab/pull/181
* Make EditorConfig match Jupyter for notebooks by anishathalye in https://github.com/cleanlab/cleanlab/pull/179
* Add figure to out-of-sample pred proba via cv tutorial by weijinglok in https://github.com/cleanlab/cleanlab/pull/183
* Move rest of example_models/ -> experimental/ by jwmueller in https://github.com/cleanlab/cleanlab/pull/184
* Fix typo in test by anishathalye in https://github.com/cleanlab/cleanlab/pull/186
* Add more ergonomic method to skip notebooks by anishathalye in https://github.com/cleanlab/cleanlab/pull/185
* Add link checking for all Markdown files by anishathalye in https://github.com/cleanlab/cleanlab/pull/187
* Add link checking for compiled docs by anishathalye in https://github.com/cleanlab/cleanlab/pull/188
* Minor improvements to count docstring by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/190
* Add a GitHub icon and link below the docs' project title by weijinglok in https://github.com/cleanlab/cleanlab/pull/192
* Add self.labels as attribute of FastTextClassifier by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/194
* Move noise_generation into benchmarking module by anishathalye in https://github.com/cleanlab/cleanlab/pull/196
* Remove import of internal package by anishathalye in https://github.com/cleanlab/cleanlab/pull/195
* more specific filter-warning by jwmueller in https://github.com/cleanlab/cleanlab/pull/193
* Remove deprecated functions by anishathalye in https://github.com/cleanlab/cleanlab/pull/197
* Docs cleanup by anishathalye in https://github.com/cleanlab/cleanlab/pull/189
* Introducing the new Dataset Module for cleanlab 2.0 by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/182
* Readme reformat by jwmueller in https://github.com/cleanlab/cleanlab/pull/198
* Returns DataFrame type from CleanLearning functions by jwmueller in https://github.com/cleanlab/cleanlab/pull/199
* Migration guide for v2 by jwmueller in https://github.com/cleanlab/cleanlab/pull/200
* set verbose default false. fix order of printing by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/204
* Add from . import dataset to init by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/205
* Make minor doc tweaks by anishathalye in https://github.com/cleanlab/cleanlab/pull/203
* Fix error in docstring. missing item in tuple by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/206
* fix broken printing of matrices by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/207
* Add in-depth tutorial [WIP] by weijinglok in https://github.com/cleanlab/cleanlab/pull/208
* Revise migration guide by anishathalye in https://github.com/cleanlab/cleanlab/pull/209
* Tutorial header-levels decreased by jwmueller in https://github.com/cleanlab/cleanlab/pull/210
* Dark plots by jwmueller in https://github.com/cleanlab/cleanlab/pull/211
* bug fixes and jupyter notebook support added by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/212
* Add spaces at the end of doc side nav toc by weijinglok in https://github.com/cleanlab/cleanlab/pull/213
* Clarify and fix several docstrings. by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/214
* Configure display setting of Dataframe in 2.0 tutorial by JohnsonKuan in https://github.com/cleanlab/cleanlab/pull/215
* Formats dataset health tutorial by jwmueller in https://github.com/cleanlab/cleanlab/pull/216
* final tutorial edits. dataset docstrings imp by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/217
* Deploy docs when new release is tagged by weijinglok in https://github.com/cleanlab/cleanlab/pull/219
* Hardcode v1.0.1 hyperlink in doc site by weijinglok in https://github.com/cleanlab/cleanlab/pull/221
* Make dataset tutorial runnable on website docs, improve pulldown formatting by jwmueller in https://github.com/cleanlab/cleanlab/pull/220
* Final docs polishing patches by jwmueller in https://github.com/cleanlab/cleanlab/pull/223
* Bump version to 2.0.0 by jwmueller in https://github.com/cleanlab/cleanlab/pull/222
* fix black formatting compliance by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/224
* Update readme links for v2.0 by jwmueller in https://github.com/cleanlab/cleanlab/pull/225
* Remove unneeded alt text by anishathalye in https://github.com/cleanlab/cleanlab/pull/228
* Add tutorial toc page by weijinglok in https://github.com/cleanlab/cleanlab/pull/230
* Proofreading README.md by calebchiam in https://github.com/cleanlab/cleanlab/pull/226
* Generalize text tutorial to multiclass datasets by calebchiam in https://github.com/cleanlab/cleanlab/pull/229
* Make small fixes by anishathalye in https://github.com/cleanlab/cleanlab/pull/235
* Address tutorial points of confusion by jwmueller in https://github.com/cleanlab/cleanlab/pull/233

New Contributors
* JohnsonKuan made their first contribution in https://github.com/cleanlab/cleanlab/pull/131
* calebchiam made their first contribution in https://github.com/cleanlab/cleanlab/pull/226

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v1.0.1...v2.0.0

