Cleanlab

Latest version: v2.7.1


Page 2 of 4

2.6.2

Not secure

This release is non-breaking when upgrading from v2.6.1.

What's Changed
* Convert DataFrame features to numpy arrays in null value check by elisno in https://github.com/cleanlab/cleanlab/pull/1045

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.6.1...v2.6.2

2.6.1

Not secure

This release is non-breaking when upgrading from v2.6.0. Some noteworthy updates include:

1. The label quality score in the `cleanlab.regression` module is improved to be more human-readable.
- This only involves rescaling the scores to display a more human-interpretable range of scores, without affecting how your data points are ranked within a dataset according to these scores.
2. Better address some edge-cases in `Datalab.get_issues()`.
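A rescaling of this kind leaves rankings intact whenever it is strictly increasing. The sketch below illustrates that property with a hypothetical logistic rescaling; `rescale` and `ranking` are illustrative names, and the formula is an assumption, not the one used in `cleanlab.regression`.

```python
import math


def ranking(scores):
    """Return data point indices ordered from lowest to highest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i])


def rescale(scores):
    """Hypothetical rescaling: squash raw scores into (0, 1) with a logistic curve.

    Any strictly increasing function would preserve the ranking equally well.
    """
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


raw_scores = [-3.2, 0.4, 1.7, -0.9, 5.0]
rescaled_scores = rescale(raw_scores)

# The values change, but the order of the data points does not.
print(ranking(raw_scores) == ranking(rescaled_scores))
```

Because only the ordering matters when reviewing the lowest-quality data points first, a change like this should not alter which examples you inspect.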

What's Changed
* Readme updates by jwmueller in #1030, #1031, #1039; elisno in #1040
* Adjust the range of regression label quality scores by huiwengoh in https://github.com/cleanlab/cleanlab/pull/1032
* Misc fixes of get_issues method by elisno in #1025, #1026, #1028
* Support features as input for data valuation check in Datalab by elisno in https://github.com/cleanlab/cleanlab/pull/1023
* Fix/clarify docs by mturk24 in #1029; elisno in #1024, #1037
* CI/CD changes by elisno in https://github.com/cleanlab/cleanlab/pull/1036

New Contributors
* mturk24 made their first contribution in https://github.com/cleanlab/cleanlab/pull/1029

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.6.0...v2.6.1

2.6.0

Not secure

This release is non-breaking when upgrading from v2.5.0, continuing our commitment to maintaining backward compatibility while introducing new features and improvements.
However, this release drops support for Python 3.7 while adding support for Python 3.11.

Enhancements to Datalab

In this update, Datalab, our dataset analysis platform, enhances its ability to identify various types of issues within your datasets. With this release, Datalab now detects additional types of issues by default, offering users a more comprehensive analysis. Specifically, it can now:

- Identify `null` values in your dataset.
- Detect `class_imbalance`.
- Highlight an `underperforming_group`, which refers to a subset of data points where your model exhibits poorer performance compared to others. See our [FAQ](https://docs.cleanlab.ai/master/tutorials/faq.html#How-do-I-specify-pre-computed-data-slices/clusters-when-detecting-the-Underperforming-Group-Issue?) for more information on how to provide pre-defined groups for this issue type.
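To make the first two issue types concrete, here is a conceptual sketch of the quantities such checks look at: the share of missing values per data point and the relative frequency of each class. The helper names are hypothetical and the logic is a simplification for illustration only; it is not how Datalab computes its `null` or `class_imbalance` scores.

```python
import math
from collections import Counter


def null_fraction_per_row(rows):
    """Fraction of missing feature values (None or NaN) in each row."""

    def is_null(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    return [sum(is_null(v) for v in row) / len(row) for row in rows]


def class_frequencies(labels):
    """Relative frequency of each class label in the dataset."""
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


features = [
    [1.0, 2.0, None],
    [float("nan"), None, None],  # every value is missing
    [0.5, 0.1, 0.9],
    [0.3, 0.7, 0.2],
]
labels = ["cat", "cat", "cat", "dog"]

null_fractions = null_fraction_per_row(features)
frequencies = class_frequencies(labels)
rarest_class = min(frequencies, key=frequencies.get)

print(null_fractions)
print(rarest_class, frequencies[rarest_class])
```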

Additionally, Datalab can now optionally:

- Assess the value of data points in your dataset using KNN-Shapley scores as a measure of `data_valuation`.

If you have ideas for new features or notice any bugs, we encourage you to open an Issue or Pull Request on our GitHub repository!

Expanded Datalab Support for New ML Tasks

With cleanlab v2.6.0, Datalab extends its support to new machine-learning tasks and introduces enhancements across the board.
This release introduces the `task` parameter in Datalab's API, enabling users to specify the type of machine learning task they are working on.

```python
from cleanlab import Datalab

lab = Datalab(..., task="regression")
```

The `task`s currently supported are:

- **classification** (*default*): Includes all previously supported issue-checking capabilities based on `pred_probs`, `features`, or a `knn_graph`, and the new features introduced earlier.
- **regression** (*new*):
  - Run specialized label error detection algorithms on regression datasets. You can see this in action in our updated [regression tutorial](https://docs.cleanlab.ai/master/tutorials/regression.html#5.-Other-ways-to-find-noisy-labels-in-regression-datasets).
  - Find other issues utilizing `features` or a `knn_graph`.
- **multilabel** (*new*):
  - Detect label errors in multilabel classification datasets using `pred_probs` exclusively. Explore the updated capabilities in our [multilabel tutorial](https://docs.cleanlab.ai/master/tutorials/multilabel_classification.html).
  - Find various other types of issues based on `features` or a `knn_graph`.
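If your code selects the task dynamically, a small wrapper can reject unsupported values before a `Datalab` is built. This is a sketch under assumptions: `make_datalab` and `SUPPORTED_TASKS` are hypothetical names that are not part of cleanlab, and the tuple simply mirrors the three tasks listed above for v2.6.0.

```python
# Task names taken from the list above; this tuple is not a cleanlab constant.
SUPPORTED_TASKS = ("classification", "regression", "multilabel")


def make_datalab(data, label_name, task="classification"):
    """Hypothetical helper: validate `task`, then construct a Datalab."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(
            f"Unsupported task {task!r}; expected one of {SUPPORTED_TASKS}"
        )
    # Imported lazily so the validation above works even without cleanlab installed.
    from cleanlab import Datalab

    return Datalab(data=data, label_name=label_name, task=task)
```

With cleanlab v2.6.0 or later installed, calling `make_datalab(dataset, "column_name_for_labels", task="regression")` would be equivalent to constructing `Datalab` directly with the same arguments.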

Improved Object Detection Dataset Exploration

New functions have been introduced to enhance the exploration of object detection datasets, simplifying data comprehension and issue detection.
Learn how to leverage some of these functions in our [object detection tutorial](https://docs.cleanlab.ai/master/tutorials/object_detection.html#Exploratory-data-analysis).

Other Major Improvements

- Rescaled Near Duplicate and Outlier Scores:
  - Note that what matters for all cleanlab issue scores is not their absolute magnitudes but rather how these scores rank the data points from most to least severe instances of the issue. But based on user feedback, we have updated the near duplicate and outlier scores to display a more human-interpretable range of values. How these scores rank data points within a dataset remains unchanged.
- Consistency in counting label issues:
  - `cleanlab.dataset.health_summary()` now returns the same number of issues as `cleanlab.classification.find_label_issues()` and `cleanlab.count.num_label_issues()`.
- Improved handling of non-iid issues:
  - The non-iid issue check in Datalab now handles `pred_probs` as input.
- Better reporting in Datalab:
  - Simplified `Datalab.report()` now highlights only detected issue types. To view all checked issue types, use `Datalab.report(show_all_issues=True)`.
- Enhanced Handling of Binary Classification Tasks:
  - Examples with predicted probabilities close to 0.5 for both classes are no longer flagged as label errors, improving the handling of binary classification tasks.
- Experimental Functionality:
  - cleanlab now offers experimental functionality for detecting label issues in **span categorization** tasks with a single class, enhancing its applicability in natural language processing projects.

New Contributors

We're thrilled to welcome new contributors to the cleanlab community! Your contributions help us improve and grow cleanlab:

* smttsp made their first contribution in https://github.com/cleanlab/cleanlab/pull/867
* abhijitpal1247 made their first contribution in https://github.com/cleanlab/cleanlab/pull/856
* 01PrathamS made their first contribution in https://github.com/cleanlab/cleanlab/pull/893
* mglowacki100 made their first contribution in https://github.com/cleanlab/cleanlab/pull/796
* gibsonliketheguitar made their first contribution in https://github.com/cleanlab/cleanlab/pull/831
* kylegallatin made their first contribution in https://github.com/cleanlab/cleanlab/pull/885
* ryansingman made their first contribution in https://github.com/cleanlab/cleanlab/pull/919
* R-Peleg made their first contribution in https://github.com/cleanlab/cleanlab/pull/948

Thank you for your valuable contributions! If you're interested in contributing, check out our [contributing guide](https://github.com/cleanlab/cleanlab/blob/master/CONTRIBUTING.md) for ways to get involved.

Change Log

Significant changes in this release include:

* Update FAQ section in docs by tataganesh in https://github.com/cleanlab/cleanlab/pull/869; elisno in https://github.com/cleanlab/cleanlab/pull/913
* Improve Object Detection module by Steven-Yiran in https://github.com/cleanlab/cleanlab/pull/840, https://github.com/cleanlab/cleanlab/pull/877; aditya1503 in https://github.com/cleanlab/cleanlab/pull/883, https://github.com/cleanlab/cleanlab/pull/969, https://github.com/cleanlab/cleanlab/pull/968
* Clearer documentation/tutorials/readme by jwmueller in https://github.com/cleanlab/cleanlab/pull/851, https://github.com/cleanlab/cleanlab/pull/931, https://github.com/cleanlab/cleanlab/pull/981, https://github.com/cleanlab/cleanlab/pull/983, https://github.com/cleanlab/cleanlab/pull/1001, https://github.com/cleanlab/cleanlab/pull/978, https://github.com/cleanlab/cleanlab/pull/994, https://github.com/cleanlab/cleanlab/pull/1010; 01PrathamS in https://github.com/cleanlab/cleanlab/pull/893; elisno in https://github.com/cleanlab/cleanlab/pull/878, https://github.com/cleanlab/cleanlab/pull/1007, https://github.com/cleanlab/cleanlab/pull/992, https://github.com/cleanlab/cleanlab/pull/1015, https://github.com/cleanlab/cleanlab/pull/1016; huiwengoh in https://github.com/cleanlab/cleanlab/pull/984; sanjanag in https://github.com/cleanlab/cleanlab/pull/936; tataganesh in https://github.com/cleanlab/cleanlab/pull/916; ulya-tkch in https://github.com/cleanlab/cleanlab/pull/954;
* CI updates by aditya1503 in https://github.com/cleanlab/cleanlab/pull/864; elisno in https://github.com/cleanlab/cleanlab/pull/879, https://github.com/cleanlab/cleanlab/pull/961, https://github.com/cleanlab/cleanlab/pull/963, https://github.com/cleanlab/cleanlab/pull/965, https://github.com/cleanlab/cleanlab/pull/1008, https://github.com/cleanlab/cleanlab/pull/975, https://github.com/cleanlab/cleanlab/pull/1011, https://github.com/cleanlab/cleanlab/pull/1012, https://github.com/cleanlab/cleanlab/pull/1013, https://github.com/cleanlab/cleanlab/pull/1014; jwmueller in https://github.com/cleanlab/cleanlab/pull/852, https://github.com/cleanlab/cleanlab/pull/865; tataganesh in https://github.com/cleanlab/cleanlab/pull/900; anishathalye in https://github.com/cleanlab/cleanlab/pull/956; sanjanag in https://github.com/cleanlab/cleanlab/pull/1009
* Docs system updates by elisno in https://github.com/cleanlab/cleanlab/pull/880, https://github.com/cleanlab/cleanlab/pull/881, https://github.com/cleanlab/cleanlab/pull/958, https://github.com/cleanlab/cleanlab/pull/959, https://github.com/cleanlab/cleanlab/pull/960, https://github.com/cleanlab/cleanlab/pull/964
* Add Null Issue Manager by abhijitpal1247 in https://github.com/cleanlab/cleanlab/pull/856; tataganesh in https://github.com/cleanlab/cleanlab/pull/927, https://github.com/cleanlab/cleanlab/pull/917
* Add Data Valuation Issue Manager by coding-famer in https://github.com/cleanlab/cleanlab/pull/850, https://github.com/cleanlab/cleanlab/pull/925
* Extend non-iid issue check to run if only pred_probs are provided by abhijitpal1247 in https://github.com/cleanlab/cleanlab/pull/857; tataganesh in https://github.com/cleanlab/cleanlab/pull/896, https://github.com/cleanlab/cleanlab/pull/897
* Add Underperforming Group Issue Manager by tataganesh in https://github.com/cleanlab/cleanlab/pull/838, https://github.com/cleanlab/cleanlab/pull/907; elisno in https://github.com/cleanlab/cleanlab/pull/990
* Add Class Imbalance issue type to Datalab defaults by tataganesh in https://github.com/cleanlab/cleanlab/pull/912, https://github.com/cleanlab/cleanlab/pull/933; jwmueller in https://github.com/cleanlab/cleanlab/pull/924, https://github.com/cleanlab/cleanlab/pull/934; elisno in https://github.com/cleanlab/cleanlab/pull/940
* Add regression task to Datalab by mglowacki100 in https://github.com/cleanlab/cleanlab/pull/796; elisno in https://github.com/cleanlab/cleanlab/pull/902
* Add multilabel task to Datalab by tataganesh in https://github.com/cleanlab/cleanlab/pull/929
* #702 - Shorten Refs of classes and functions in Docs by gibsonliketheguitar in https://github.com/cleanlab/cleanlab/pull/831
* Update near duplicate issues and sets by ryansingman in https://github.com/cleanlab/cleanlab/pull/919; elisno in https://github.com/cleanlab/cleanlab/pull/895
* Rescale near duplicate scores by elisno in https://github.com/cleanlab/cleanlab/pull/943
* Rescale outlier scores by elisno in https://github.com/cleanlab/cleanlab/pull/953
* List comprehension to numpy ops for efficiency by tataganesh in https://github.com/cleanlab/cleanlab/pull/844
* Reduce memory usage of filter.find_label_issues() by kylegallatin in https://github.com/cleanlab/cleanlab/pull/885
* Updates to tests by aditya1503 in https://github.com/cleanlab/cleanlab/pull/945; elisno in https://github.com/cleanlab/cleanlab/pull/985, https://github.com/cleanlab/cleanlab/pull/998
* Refactor Datalab functionality by elisno in https://github.com/cleanlab/cleanlab/pull/971, https://github.com/cleanlab/cleanlab/pull/1006
* Minor fixes for Datalab by elisno in https://github.com/cleanlab/cleanlab/pull/997, https://github.com/cleanlab/cleanlab/pull/999, https://github.com/cleanlab/cleanlab/pull/1000, https://github.com/cleanlab/cleanlab/pull/1003, https://github.com/cleanlab/cleanlab/pull/1005, https://github.com/cleanlab/cleanlab/pull/979
* Drop Python 3.7 support and add Python 3.11 support by elisno in https://github.com/cleanlab/cleanlab/pull/980
* Add a `show_all_issues` optional argument to Datalab.report() by elisno in https://github.com/cleanlab/cleanlab/pull/970
* Single Class Span Classification Support by Steven-Yiran in https://github.com/cleanlab/cleanlab/pull/982
* ensure near-predicted labels are not flagged as label issues by aditya1503 in https://github.com/cleanlab/cleanlab/pull/950
* PR template added and gitignore improved by smttsp in https://github.com/cleanlab/cleanlab/pull/867
* Update label issue count in dataset.health_summary() by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/875
* Update segmentation.ipynb by R-Peleg in https://github.com/cleanlab/cleanlab/pull/948
* Refactor batching logic in cleanlab.segmentation.filter.find_label_issues by elisno in https://github.com/cleanlab/cleanlab/pull/918

For a full list of changes, enhancements, and fixes, please refer to the [Full Changelog](https://github.com/cleanlab/cleanlab/compare/v2.5.0...v2.6.0).

2.5.0

Not secure

This release is non-breaking when upgrading from v2.4.0 (except for certain methods in `cleanlab.experimental` that have been moved, especially utility methods related to Datalab).

New ML tasks supported

Cleanlab now supports all of the most common ML tasks! This newest release adds dedicated support for the following types of datasets:
- **regression** (finding errors in numeric data): see `cleanlab.regression` and the "noisy labels in regression" quickstart tutorial.
- **object detection**: see `cleanlab.object_detection` and the "Object Detection" quickstart tutorial.
- **image segmentation**: see `cleanlab.segmentation` and the "Semantic Segmentation" tutorial.

Cleanlab already supported: multi-class classification, multi-label classification (image/document tagging), token classification (entity recognition, sequence prediction).

If there is another ML task you'd like to see this package support, please let us know (or even better open a Pull Request)!

Supporting these ML tasks properly required significant research and novel algorithms developed by our scientists. We have published papers on these for transparency and scientific rigor. Check out the list in the README or learn more at:
https://cleanlab.ai/research/
https://cleanlab.ai/blog/

Improvements to Datalab

[Datalab](https://cleanlab.ai/blog/datalab/) is a general platform for detecting all sorts of common issues in real-world data, and the best place to get started for running this library on your datasets.

This release introduces major improvements and new functionalities in Datalab that include the ability to:

- Detect low-quality images in computer vision data (blurry, over/under-exposed, low-information, ...) via the integration of [CleanVision](https://cleanlab.ai/blog/cleanvision/).
- Detect label issues even without `pred_probs` from a ML model (you can instead just provide `features`).
- Flag rare classes in imbalanced classification datasets.
- Audit unlabeled datasets.

Other major improvements

- 50x speedup in the cleanlab.multiannotator code for analyzing data labeled by multiple annotators.
- Out-of-Distribution detection based on `pred_probs` via the [GEN algorithm](https://openaccess.thecvf.com/content/CVPR2023/papers/Liu_GEN_Pushing_the_Limits_of_Softmax-Based_Out-of-Distribution_Detection_CVPR_2023_paper.pdf) which is particularly effective for datasets with tons of classes.
- Many of the methods across the package to find label issues now support a `low_memory` option. When specified, it uses an approximate mini-batching algorithm that returns results much faster and requires much less RAM.
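For intuition about the GEN-based out-of-distribution score mentioned above, the following is a rough sketch of the generalized-entropy idea from the linked paper: sum `p**gamma * (1 - p)**gamma` over the largest predicted probabilities and negate the result, so that confident predictions score higher. The function name `gen_score` and the default values of `gamma` and `top_m` are assumptions made for this illustration; cleanlab's own implementation may normalize or rescale the score differently.

```python
def gen_score(pred_probs, gamma=0.1, top_m=100):
    """Negative generalized entropy over the top_m largest class probabilities.

    Higher (closer to 0) means the prediction looks more in-distribution.
    gamma and top_m are assumed defaults for this sketch.
    """
    top_probs = sorted(pred_probs, reverse=True)[:top_m]
    generalized_entropy = sum(
        (p ** gamma) * ((1.0 - p) ** gamma) for p in top_probs
    )
    return -generalized_entropy


confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]

# A confident prediction receives a higher score than a uniform one.
print(gen_score(confident) > gen_score(uncertain))
```

Only `pred_probs` are needed here, which is why this style of detection can run without access to feature embeddings.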

New Contributors

Transforming cleanlab into the first universal data-centric AI platform is a major effort and we need your help! Many easy ways to contribute are listed [on our github](https://github.com/cleanlab/cleanlab/wiki#ideas-for-contributing-to-cleanlab) or you can jump into the discussions on [Slack](https://cleanlab.ai/slack). We immensely appreciate all of the contributors who've helped build this package into what it is today, especially:

* gordon-lim made their first contribution in https://github.com/cleanlab/cleanlab/pull/746
* tataganesh made their first contribution in https://github.com/cleanlab/cleanlab/pull/751
* vdlad made their first contribution in https://github.com/cleanlab/cleanlab/pull/677
* axl1313 made their first contribution in https://github.com/cleanlab/cleanlab/pull/798
* coding-famer made their first contribution in https://github.com/cleanlab/cleanlab/pull/800

Change Log

* New feature: Label error detection in regression datasets by krmayankb in https://github.com/cleanlab/cleanlab/pull/572; by huiwengoh in https://github.com/cleanlab/cleanlab/pull/830

* New feature: ObjectLab for detecting mislabeled images in object detection datasets by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/676, https://github.com/cleanlab/cleanlab/pull/739, https://github.com/cleanlab/cleanlab/pull/745, https://github.com/cleanlab/cleanlab/pull/770, https://github.com/cleanlab/cleanlab/pull/779, https://github.com/cleanlab/cleanlab/pull/807, https://github.com/cleanlab/cleanlab/pull/833; by aditya1503 in https://github.com/cleanlab/cleanlab/pull/750, https://github.com/cleanlab/cleanlab/pull/804

* New feature: Label error detection in segmentation datasets by vdlad in https://github.com/cleanlab/cleanlab/pull/677; by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/754, https://github.com/cleanlab/cleanlab/pull/756, https://github.com/cleanlab/cleanlab/pull/759, https://github.com/cleanlab/cleanlab/pull/772; by elisno in https://github.com/cleanlab/cleanlab/pull/775

* New feature: CleanVision to detect low-quality images by sanjanag in https://github.com/cleanlab/cleanlab/pull/679, https://github.com/cleanlab/cleanlab/pull/797

* New image quickstart tutorial that uses Datalab by sanjanag in https://github.com/cleanlab/cleanlab/pull/795

* Datalab code refactoring by elisno in https://github.com/cleanlab/cleanlab/pull/803, https://github.com/cleanlab/cleanlab/pull/783, https://github.com/cleanlab/cleanlab/pull/793, https://github.com/cleanlab/cleanlab/pull/729
* Make labels optional in Datalab by elisno in https://github.com/cleanlab/cleanlab/pull/730
* Update near-duplicate sets in Datalab by elisno in https://github.com/cleanlab/cleanlab/pull/781
* Include non-IID detection in set of default Datalab issue types by elisno in https://github.com/cleanlab/cleanlab/pull/723
* Extend Datalab to be able to detect label issues based on features by Steven-Yiran in https://github.com/cleanlab/cleanlab/pull/760
* Add imbalance issue type to Datalab by tataganesh in https://github.com/cleanlab/cleanlab/pull/758, https://github.com/cleanlab/cleanlab/pull/828
* Catch specific exception for knn in Datalab issue managers by tataganesh in https://github.com/cleanlab/cleanlab/pull/825
* Make plots smaller for datalab tutorials by tataganesh in https://github.com/cleanlab/cleanlab/pull/751

* 50x speedup and other improvements in multiannotator module by huiwengoh in https://github.com/cleanlab/cleanlab/pull/821, https://github.com/cleanlab/cleanlab/pull/784; by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/827

* ENH: make clipping unnecessary for entropy by DerWeh in https://github.com/cleanlab/cleanlab/pull/703

* Extend default CleanLearning classifier to work for more datasets by Steven-Yiran in https://github.com/cleanlab/cleanlab/pull/749
* CleanLearning code improvements by huiwengoh in https://github.com/cleanlab/cleanlab/pull/724; by jwmueller in https://github.com/cleanlab/cleanlab/pull/744
* Change CleanLearning inspect.getfullargspec to signature for sklearn v1.3 compatibility by huiwengoh in https://github.com/cleanlab/cleanlab/pull/761

* Expose low memory option for finding label issues by tataganesh in https://github.com/cleanlab/cleanlab/pull/791, https://github.com/cleanlab/cleanlab/pull/822

* Add GEN OOD-detection algorithm by coding-famer in https://github.com/cleanlab/cleanlab/pull/800

* Unify softmax implementations throughout package by elisno in https://github.com/cleanlab/cleanlab/pull/826

* Better warning handling for off_calibrated_custom in confident joint by gordon-lim in https://github.com/cleanlab/cleanlab/pull/746

* Clearer explanations in documentation/tutorials/readme by cgnorthcutt in https://github.com/cleanlab/cleanlab/pull/725; by jwmueller in https://github.com/cleanlab/cleanlab/pull/726, https://github.com/cleanlab/cleanlab/pull/734, https://github.com/cleanlab/cleanlab/pull/741, https://github.com/cleanlab/cleanlab/pull/743, https://github.com/cleanlab/cleanlab/pull/766, https://github.com/cleanlab/cleanlab/pull/832, https://github.com/cleanlab/cleanlab/pull/799, https://github.com/cleanlab/cleanlab/pull/752, https://github.com/cleanlab/cleanlab/pull/841, https://github.com/cleanlab/cleanlab/pull/816, https://github.com/cleanlab/cleanlab/pull/755, https://github.com/cleanlab/cleanlab/pull/731, https://github.com/cleanlab/cleanlab/pull/753, https://github.com/cleanlab/cleanlab/pull/845, https://github.com/cleanlab/cleanlab/pull/835, https://github.com/cleanlab/cleanlab/pull/847

* CI and documentation system updates by anishathalye in https://github.com/cleanlab/cleanlab/pull/742, https://github.com/cleanlab/cleanlab/pull/768, https://github.com/cleanlab/cleanlab/pull/769; by jwmueller in https://github.com/cleanlab/cleanlab/pull/837; by huiwengoh in https://github.com/cleanlab/cleanlab/pull/788, https://github.com/cleanlab/cleanlab/pull/757, https://github.com/cleanlab/cleanlab/pull/738, https://github.com/cleanlab/cleanlab/pull/794; by sanjanag in https://github.com/cleanlab/cleanlab/pull/843; by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/777; by elisno in https://github.com/cleanlab/cleanlab/pull/802; by axl1313 in https://github.com/cleanlab/cleanlab/pull/798

* Improved tests by huiwengoh in https://github.com/cleanlab/cleanlab/pull/778, https://github.com/cleanlab/cleanlab/pull/763

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.4.0...v2.5.0

2.4.0

Not secure

Cleanlab has grown into a popular package used by thousands of data scientists to diagnose issues in diverse datasets and improve the data itself in order to fit more robust models. Many new methods/algorithms were added in recent months to increase the capabilities of this data-centric AI library.

Introducing Datalab

Now we've added a unified platform called `Datalab` for you to apply many of these capabilities in a single line of code!
To audit any classification dataset for issues, first use any trained ML model to produce `pred_probs` (predicted class probabilities) and/or `feature_embeddings` (numeric vector representations of each datapoint). Then, these few lines of code can detect many types of real-world issues in your dataset like label errors, outliers, near duplicates, etc:

```python
from cleanlab import Datalab

lab = Datalab(data=dataset, label_name="column_name_for_labels")
lab.find_issues(features=feature_embeddings, pred_probs=pred_probs)
lab.report()  # summarize the issues found, how severe they are, and other useful info about the dataset
```

Follow our [blog](https://cleanlab.ai/blog/) to better understand how this works internally. Many articles will be published there shortly!
A detailed description of each type of issue `Datalab` can detect is provided in [this guide](https://docs.cleanlab.ai/master/cleanlab/datalab/guide/issue_type_description.html), but we recommend first starting with the tutorials which show you how easy it is to run on your own dataset.

`Datalab` can be used to do things like find label issues with string class labels (whereas the prior `find_label_issues()` method required integer class indices). But you are still free to use all of the prior cleanlab methods you're used to! `Datalab` is also using these internally to detect data issues.
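If you keep using the prior methods with string labels, you need to convert them to integer class indices yourself. Below is a minimal sketch of such a conversion; `encode_labels` is a hypothetical helper, not a cleanlab function, and it assumes the columns of your `pred_probs` follow the same sorted class order.

```python
def encode_labels(string_labels):
    """Map string class labels to integer indices 0..K-1 in sorted class order."""
    classes = sorted(set(string_labels))
    class_to_index = {cls: index for index, cls in enumerate(classes)}
    return [class_to_index[label] for label in string_labels], classes


string_labels = ["dog", "cat", "bird", "cat"]
int_labels, classes = encode_labels(string_labels)

print(int_labels)
print(classes)
```

Keeping the `classes` list around lets you translate any flagged indices back into the original label names.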

Our goal is for `Datalab` to be an easy way to run a comprehensive suite of cleanlab capabilities on any dataset. This is an evolving paradigm, so be aware some `Datalab` APIs may change in subsequent package versions -- as noted in the documentation.
You can easily run the issue checks in `Datalab` together with a custom issue type you define outside of cleanlab. This customizability also makes it easy to contribute new data quality algorithms into `Datalab`. Help us build the best open-source platform for data-centric AI by adding your ideas or those from recent publications! Feel free to reach out via [Slack](https://cleanlab.ai/slack).

Revamped Tutorials

We've updated some of our existing tutorials with more interesting datasets and ML models. Regarding the basic tutorials on identifying label issues in classification data from various modalities (image, text, audio, tables), we have also created analogous versions to detect issues in these same datasets with `Datalab` instead (see `Datalab Tutorials`). This should help existing users quickly ramp up on using `Datalab` to see how much more powerful this comprehensive data audit can be.

Improvements for Multi-label Classification

To provide a better experience for users with multi-label classification datasets, we have explicitly separated the functionality to work with these into the `cleanlab.multilabel_classification` module. So please start there rather than specifying the `multi_label=True` flag in certain methods outside of this module, as that option will be deprecated in the future.

Particularly noteworthy are the new dataset-level issue summaries for multi-label classification datasets, available in the `cleanlab.multilabel_classification.dataset` module.

While moving methods to the `cleanlab.multilabel_classification` module, we noticed some bugs in existing methods. We got rid of these methods entirely (replacing them with new ones in the `cleanlab.multilabel_classification` module), so some changes may appear to be backwards incompatible, even though the original code didn't function as intended in the first place.

Backwards incompatible changes

Your existing code will break if you do not upgrade to the new versions of these methods (the existing cleanlab v2.3.1 code was probably producing bad results anyway based on some bugs that have been fixed). Here are changes you must make in your code for it to work with newer cleanlab versions:

1) `cleanlab.dataset.rank_classes_by_label_quality(..., multi_label=True)`
→
`cleanlab.multilabel_classification.dataset.rank_classes_by_label_quality(...)`

The `multi_label=False/True` argument will be removed in the future from the former method.

2) `cleanlab.dataset.find_overlapping_classes(..., multi_label=True)`
→
`cleanlab.multilabel_classification.dataset.common_multilabel_issues(...)`

The `multi_label=False/True` argument will be removed in the future from the former method. The returned DataFrame is slightly different, please refer to the new method's documentation.

3) `cleanlab.dataset.overall_label_health_score(..., multi_label=True)`
→
`cleanlab.multilabel_classification.dataset.overall_label_health_score(...)`

The `multi_label=False/True` argument will be removed in the future from the former method.

4) `cleanlab.dataset.health_summary(..., multi_label=True)`
→
`cleanlab.multilabel_classification.dataset.multilabel_health_summary(...)`

The `multi_label=False/True` argument will be removed in the future from the former method.

There are no other backwards incompatible changes in the package with this release.
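To locate affected calls in an existing codebase, the four replacements listed above can be encoded as a lookup table and matched against source lines. This is only a sketch: `MULTILABEL_MIGRATIONS` and `find_outdated_calls` are hypothetical names, and the plain substring matching is a deliberately simple heuristic that will miss calls spread over several lines.

```python
# Old function path -> new function path, copied from the list above.
MULTILABEL_MIGRATIONS = {
    "cleanlab.dataset.rank_classes_by_label_quality":
        "cleanlab.multilabel_classification.dataset.rank_classes_by_label_quality",
    "cleanlab.dataset.find_overlapping_classes":
        "cleanlab.multilabel_classification.dataset.common_multilabel_issues",
    "cleanlab.dataset.overall_label_health_score":
        "cleanlab.multilabel_classification.dataset.overall_label_health_score",
    "cleanlab.dataset.health_summary":
        "cleanlab.multilabel_classification.dataset.multilabel_health_summary",
}


def find_outdated_calls(source_lines):
    """Return (line_number, old_path, new_path) for calls that pass multi_label=True."""
    hits = []
    for number, line in enumerate(source_lines, start=1):
        if "multi_label=True" not in line:
            continue
        for old_path, new_path in MULTILABEL_MIGRATIONS.items():
            short_name = old_path.rsplit(".", 1)[-1]
            if short_name + "(" in line:
                hits.append((number, old_path, new_path))
    return hits


sample_source = [
    "summary = health_summary(labels, pred_probs, multi_label=True)",
    "summary = health_summary(labels, pred_probs)",
]
outdated = find_outdated_calls(sample_source)
print(outdated)
```

Each hit still needs manual review, since some of the new methods return slightly different results, as noted for `common_multilabel_issues`.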

Deprecated workflows

We recommend updating your existing code to the new versions of these methods (existing cleanlab v2.3.1 code will still work though, for now). Here are changes we recommend:

1) `cleanlab.filter.find_label_issues(..., multi_label=True)`
→
`cleanlab.multilabel_classification.filter.find_label_issues(...)`

The `multi_label=False/True` argument will be removed in the future from the former method.

2) `from cleanlab.multilabel_classification import get_label_quality_scores`
→
`from cleanlab.multilabel_classification.rank import get_label_quality_scores`

**Remember**: *All* of the code to work with multi-label data now lives in the `cleanlab.multilabel_classification` module.

Change Log

* readme updates by jwmueller in https://github.com/cleanlab/cleanlab/pull/659, https://github.com/cleanlab/cleanlab/pull/660, https://github.com/cleanlab/cleanlab/pull/713
* CI updates (by sanjanag in https://github.com/cleanlab/cleanlab/pull/701; by huiwengoh in https://github.com/cleanlab/cleanlab/pull/671; by elisno in https://github.com/cleanlab/cleanlab/pull/695, https://github.com/cleanlab/cleanlab/pull/706)
* Documentation updates (by jwmueller in https://github.com/cleanlab/cleanlab/pull/669, https://github.com/cleanlab/cleanlab/pull/710, https://github.com/cleanlab/cleanlab/pull/711, https://github.com/cleanlab/cleanlab/pull/716, https://github.com/cleanlab/cleanlab/pull/719, https://github.com/cleanlab/cleanlab/pull/720; by huiwengoh in https://github.com/cleanlab/cleanlab/pull/714, https://github.com/cleanlab/cleanlab/pull/717; by elisno in https://github.com/cleanlab/cleanlab/pull/678, https://github.com/cleanlab/cleanlab/pull/684)
* Documentation: use default rules for shorter, more readable links by DerWeh in https://github.com/cleanlab/cleanlab/pull/700
* Added installation instructions for package extras by sanjanag in https://github.com/cleanlab/cleanlab/pull/697
* Pass confident joint computed in CleanLearning to filter.find_label_issues by huiwengoh in https://github.com/cleanlab/cleanlab/pull/661
* Add Example codeblock to the docstrings of important functions in the dataset module by Steven-Yiran in https://github.com/cleanlab/cleanlab/pull/662, https://github.com/cleanlab/cleanlab/pull/663, https://github.com/cleanlab/cleanlab/pull/668
* Remove batch size check in label_issues_batched by huiwengoh in https://github.com/cleanlab/cleanlab/pull/665
* adding multilabel dataset issue summaries by aditya1503 in https://github.com/cleanlab/cleanlab/pull/657
* move int2onehot, onehot2int to top of multilabel tutorial by jwmueller in https://github.com/cleanlab/cleanlab/pull/666
* Update softmax to more stable variant by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/667
* Revamp text and tabular tutorial by huiwengoh in https://github.com/cleanlab/cleanlab/pull/673, https://github.com/cleanlab/cleanlab/pull/693
* allow for kwargs in token find_label_issues by jwmueller in https://github.com/cleanlab/cleanlab/pull/686
* Update numpy.typing import and annotations by elisno in https://github.com/cleanlab/cleanlab/pull/688
* Standardize documentation and simplify code for outliers by DerWeh in https://github.com/cleanlab/cleanlab/pull/689
* Extract function for computing OOD scores from distances by elisno in https://github.com/cleanlab/cleanlab/pull/664
* Introduce Datalab by elisno in https://github.com/cleanlab/cleanlab/pull/614
* Introduce NonIID issue type by jecummin in https://github.com/cleanlab/cleanlab/pull/614
* Further Datalab updates by elisno in https://github.com/cleanlab/cleanlab/pull/680, https://github.com/cleanlab/cleanlab/pull/683, https://github.com/cleanlab/cleanlab/pull/687, https://github.com/cleanlab/cleanlab/pull/690, https://github.com/cleanlab/cleanlab/pull/691, https://github.com/cleanlab/cleanlab/pull/699, https://github.com/cleanlab/cleanlab/pull/705, https://github.com/cleanlab/cleanlab/pull/709, https://github.com/cleanlab/cleanlab/pull/712
* Add descriptions of issues that Datalab can detect by elisno in https://github.com/cleanlab/cleanlab/pull/682
* Datalab IssueManager.get_summary() -> make_summary() in custom issue manager example by jwmueller in https://github.com/cleanlab/cleanlab/pull/692
* Improve NonIID issue checks by elisno in https://github.com/cleanlab/cleanlab/pull/694, https://github.com/cleanlab/cleanlab/pull/707

New Contributors
* Steven-Yiran made their first contribution in https://github.com/cleanlab/cleanlab/pull/662
* DerWeh made their first contribution in https://github.com/cleanlab/cleanlab/pull/689
* jecummin made their first contribution in https://github.com/cleanlab/cleanlab/pull/614

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.3.1...v2.4.0

2.3.1

This minor release primarily improves the user experience when encountering various edge cases in:
- find_label_issues method
- find_overlapping_classes method
- cleanlab.multiannotator module

This release is non-breaking when upgrading from v2.3.0. Two noteworthy updates in the `cleanlab.multiannotator` module are:
1. A better tie-breaking algorithm inside of `get_majority_vote_label()` to avoid diminishing the frequency of rarer classes (this only plays a role when `pred_probs` are not provided).
2. A better user experience for `get_active_learning_scores()` to support scoring only unlabeled data or only labeled data. More of the arguments can now be `None`.

What's Changed
* Readme updates by jwmueller in https://github.com/cleanlab/cleanlab/pull/645, https://github.com/cleanlab/cleanlab/pull/650, https://github.com/cleanlab/cleanlab/pull/656
* describe activelab in the documentation by jwmueller in https://github.com/cleanlab/cleanlab/pull/648
* Added clipping to address issue 639 by ulya-tkch in https://github.com/cleanlab/cleanlab/pull/647
* Fix for not specifying labels in find_overlapping_issues by huiwengoh in https://github.com/cleanlab/cleanlab/pull/652
* Bug fixes + improvements to multiannotator module by huiwengoh in https://github.com/cleanlab/cleanlab/pull/654
* FAQ question/answer on handling label errors in train vs test data by jwmueller in https://github.com/cleanlab/cleanlab/pull/655

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.3.0...v2.3.1
