Cleanlab

Latest version: v2.7.0


2.7.0

Not secure
This release introduces new features and improvements aimed at helping users detect complex dataset issues and improve their ML models' robustness. As always, we maintain backward compatibility, making this release non-breaking when upgrading from v2.6.6. We continue to support Python 3.8-3.11 in this version, but support for Python 3.8 will be dropped in a future minor release.


Introducing Spurious Correlation Detection in Datalab

With this release, `Datalab` now detects **spurious correlations** in image datasets by default, helping users identify potentially misleading patterns that may lead to overfitting or reduced model generalization.

Spurious correlations occur when models pick up on patterns in the data that are coincidental rather than meaningful. For example, a model might incorrectly associate the background color with a particular label, leading to poor generalization on new data. Identifying these correlations helps ensure more reliable models by minimizing the risk of learning from irrelevant or misleading features.
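To make the idea concrete, here is a toy sketch in plain Python. It is not how `Datalab` computes its spurious-correlation scores. It assumes you already have one scalar property per image (hypothetical brightness and sharpness values) and checks whether that property alone predicts the label better than always guessing the majority class. The helpers `property_label_accuracy` and `majority_baseline` are hypothetical names.

```python
from collections import Counter

def property_label_accuracy(values, labels):
    """Leave-one-out 1-nearest-neighbor accuracy when predicting labels from one scalar property."""
    n = len(values)
    correct = 0
    for i in range(n):
        # Find the closest *other* example by absolute difference in the property value.
        j = min((k for k in range(n) if k != i), key=lambda k: abs(values[k] - values[i]))
        correct += labels[j] == labels[i]
    return correct / n

def majority_baseline(labels):
    """Accuracy of always predicting the most common label."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

# Hypothetical per-image properties for six images.
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
brightness = [0.90, 0.85, 0.88, 0.10, 0.15, 0.12]  # tracks the label closely
sharpness = [0.50, 0.10, 0.90, 0.52, 0.12, 0.88]   # unrelated to the label

print(property_label_accuracy(brightness, labels), majority_baseline(labels))
print(property_label_accuracy(sharpness, labels))
```

In this toy data, brightness alone separates the classes perfectly while the baseline is only 0.5, which is the kind of shortcut a model could latch onto; sharpness carries no label signal.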

Detecting spurious correlations in image datasets is straightforward:

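```python
from cleanlab import Datalab

# image_dataset: an image dataset (e.g., a Hugging Face `datasets.Dataset`)
# with one column holding the images and one holding the class labels.
lab = Datalab(data=image_dataset, label_name="label_column", image_key="image_column")

# Run the default issue checks, which include spurious correlations for image data.
lab.find_issues()

# Print a summary of the detected issues.
lab.report()
```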


You can find a more detailed [workflow for finding spurious correlations](https://docs.cleanlab.ai/stable/tutorials/datalab/workflows.html#Identify-Spurious-Correlations-in-Image-Datasets) in our documentation.


This new issue type aims to give users deeper insights into their data, enabling more robust model development.


New Tutorial: Improving ML Performance with Train and Test Set Curation

We've introduced a new tutorial that demonstrates how to carefully use cleanlab (via `Datalab`) for both training and test data. This approach helps ensure reliable ML model training and evaluation, particularly for noisy datasets.

You can find this tutorial in our documentation: [Improving ML Performance via Data Curation with Train vs Test Splits](https://docs.cleanlab.ai/stable/tutorials/improving_ml_performance.html).
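When curating both splits, one check worth running is for examples that appear verbatim in both train and test, since such leakage inflates evaluation scores. The following plain-Python sketch is not taken from the tutorial or from cleanlab; it assumes each example is represented as a hashable tuple of feature values plus label, and the function name `cross_split_exact_duplicates` is hypothetical.

```python
def cross_split_exact_duplicates(train_rows, test_rows):
    """Return (train_index, test_index) pairs of rows that are identical across splits."""
    # Map each distinct training row to every position where it occurs.
    positions = {}
    for i, row in enumerate(train_rows):
        positions.setdefault(row, []).append(i)
    # Look up each test row in the training index.
    return [(i, j) for j, row in enumerate(test_rows) for i in positions.get(row, [])]

train = [(0.1, 2.0, "a"), (0.5, 1.0, "b"), (0.9, 3.0, "a")]
test = [(0.5, 1.0, "b"), (0.7, 2.5, "a")]
print(cross_split_exact_duplicates(train, test))  # [(1, 0)]
```

Exact matching misses near duplicates; those require a similarity measure over features or embeddings instead.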


Other Major Improvements

- **Optimized Internal Functions**: Several internal optimizations have been made, including updates to `clip_noise_rates`, `remove_noise_from_class`, and `clip_values` functions, improving the overall efficiency of cleanlab.
- **Improved Underperforming Group Detection**: Enhanced scoring for all underperforming groups, providing more accurate identification of problematic data subsets.
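As a minimal vectorized sketch of what a helper like `clip_values` does, assume it clips every entry to `[low, high]` and then rescales so the entries keep their original sum. The name `clip_and_rescale` is hypothetical and this is not cleanlab's internal code.

```python
import numpy as np

def clip_and_rescale(x, low=0.0, high=1.0):
    """Clip entries to [low, high], then rescale so the total equals the original sum."""
    x = np.asarray(x, dtype=float)
    original_sum = x.sum()
    clipped = np.clip(x, low, high)  # one vectorized pass instead of a Python loop
    clipped_sum = clipped.sum()
    if clipped_sum == 0:
        return clipped  # nothing to rescale without dividing by zero
    return clipped * (original_sum / clipped_sum)

result = clip_and_rescale([0.2, 0.9, -0.1])
print(result, result.sum())
```

Because of the rescaling step, entries can end up above `high` again (for example when large values were clipped down), so a caller that needs a hard bound would have to clip once more afterward.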

If you have ideas for new features or notice any bugs, we encourage you to open an Issue or Pull Request on our GitHub repository!



Change Log

Significant changes in this release include:

* Added Spurious Correlation feature by allincowell in https://github.com/cleanlab/cleanlab/pull/1140, https://github.com/cleanlab/cleanlab/pull/1171, https://github.com/cleanlab/cleanlab/pull/1181, https://github.com/cleanlab/cleanlab/pull/1194; elisno in https://github.com/cleanlab/cleanlab/pull/1170, https://github.com/cleanlab/cleanlab/pull/1192, https://github.com/cleanlab/cleanlab/pull/1193, https://github.com/cleanlab/cleanlab/pull/1201; jwmueller in https://github.com/cleanlab/cleanlab/pull/1195, https://github.com/cleanlab/cleanlab/pull/1196
* Added new CLOS train test split tutorial notebook by mturk24 in https://github.com/cleanlab/cleanlab/pull/1071; jwmueller in https://github.com/cleanlab/cleanlab/pull/1178
* Update links to Issue Type Guide in workflows tutorials by elisno in https://github.com/cleanlab/cleanlab/pull/1168
* Optimize internal clip_noise_rates and remove_noise_from_class functions by gogetron in https://github.com/cleanlab/cleanlab/pull/1105
* Optimize internal clip_values function by gogetron in https://github.com/cleanlab/cleanlab/pull/1104
* Move models.fasttext wrapper to examples repo by jwmueller in https://github.com/cleanlab/cleanlab/pull/1173
* Mypy fixes by elisno in https://github.com/cleanlab/cleanlab/pull/1174
* Improve tests in Datalab Quickstart tutorial by allincowell in https://github.com/cleanlab/cleanlab/pull/1166
* Improve docs by mturk24 in https://github.com/cleanlab/cleanlab/pull/1177; jwmueller in https://github.com/cleanlab/cleanlab/pull/1189; dduong1603 in https://github.com/cleanlab/cleanlab/pull/1197; elisno in https://github.com/cleanlab/cleanlab/pull/1204
* Update Studio References by nelsonauner in https://github.com/cleanlab/cleanlab/pull/1182
* Update README by nelsonauner in https://github.com/cleanlab/cleanlab/pull/1188
* Improve cluster score for all underperforming groups by tataganesh in https://github.com/cleanlab/cleanlab/pull/1180
* Improve CI test setup by dduong1603 in https://github.com/cleanlab/cleanlab/pull/1198
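The changelog does not spell out the new cluster score. As a rough illustration only, one simple way to score every group is to compare each cluster's mean per-example quality score with the dataset-wide mean, where a ratio below 1 marks a cluster that does worse than average. The function name `cluster_quality_ratios` and the ratio itself are assumptions for this sketch, not cleanlab's formula.

```python
from collections import defaultdict

def cluster_quality_ratios(cluster_ids, quality_scores):
    """Mean quality per cluster divided by the overall mean quality."""
    overall_mean = sum(quality_scores) / len(quality_scores)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cluster, score in zip(cluster_ids, quality_scores):
        totals[cluster] += score
        counts[cluster] += 1
    if overall_mean == 0:
        # With non-negative scores, all are zero, so no cluster stands out.
        return {cluster: 1.0 for cluster in totals}
    return {cluster: (totals[cluster] / counts[cluster]) / overall_mean for cluster in totals}

ratios = cluster_quality_ratios([0, 0, 1, 1, 2, 2], [0.9, 0.8, 0.3, 0.4, 0.85, 0.75])
worst = min(ratios, key=ratios.get)
print(ratios, worst)
```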


New Contributors

- dduong1603 made their first contribution in https://github.com/cleanlab/cleanlab/pull/1197

For a full list of changes, enhancements, and fixes, please refer to the [Full Changelog](https://github.com/cleanlab/cleanlab/compare/v2.6.6...v2.7.0).

2.6.6

Not secure
What's Changed

* Improvements in Issue Type guide by elisno in https://github.com/cleanlab/cleanlab/pull/1100; jwmueller in https://github.com/cleanlab/cleanlab/pull/1136
* Improve docstrings in token_classification/summary.py by gogetron in https://github.com/cleanlab/cleanlab/pull/1094
* Update dictionary for deciding on omitting underperforming_group_check by elisno in https://github.com/cleanlab/cleanlab/pull/1135
* Add notebook with miscellaneous Datalab workflows by elisno in https://github.com/cleanlab/cleanlab/pull/1125, https://github.com/cleanlab/cleanlab/pull/1137, https://github.com/cleanlab/cleanlab/pull/1138
* Update datalab report text by jwmueller in https://github.com/cleanlab/cleanlab/pull/1134; elisno in https://github.com/cleanlab/cleanlab/pull/1154
* Update FAQ sections by jwmueller in https://github.com/cleanlab/cleanlab/pull/1139; elisno in https://github.com/cleanlab/cleanlab/pull/1152
* Pin fasttext in CI by elisno in https://github.com/cleanlab/cleanlab/pull/1144
* Improve test setup by elisno in https://github.com/cleanlab/cleanlab/pull/1146
* Update quickstart links that were outdated by jwmueller in https://github.com/cleanlab/cleanlab/pull/1148
* Update KNN Shapley score computation by elisno in https://github.com/cleanlab/cleanlab/pull/1142
* Refactor KNN graph handling and outlier detection in issue managers by elisno in https://github.com/cleanlab/cleanlab/pull/1155, https://github.com/cleanlab/cleanlab/pull/1163
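For context, KNN-Shapley scores value each training example by its Shapley contribution to a k-nearest-neighbor classifier. The sketch below implements the exact recursion of Jia et al. (2019) for a single test point, assuming the training examples are already sorted nearest-first and only a 0/1 label-match indicator is kept for each. It is a textbook version with a hypothetical function name, not cleanlab's implementation, which may aggregate over neighbors differently.

```python
def knn_shapley_single_test(matches, k):
    """Exact KNN-Shapley values for one test point.

    matches[i] is 1 if the (i+1)-th nearest training example shares the
    test point's label, else 0. Values are returned in the same nearest-first order.
    """
    n = len(matches)
    if n == 0:
        return []
    values = [0.0] * n
    # The farthest example is the base case of the recursion.
    values[n - 1] = matches[n - 1] / n
    # Walk inward from the second-farthest to the nearest example.
    for i in range(n - 2, -1, -1):
        rank = i + 1  # 1-indexed distance rank
        values[i] = values[i + 1] + (matches[i] - matches[i + 1]) / k * min(k, rank) / rank
    return values

values = knn_shapley_single_test([1, 1, 0, 1], k=2)
print(values, sum(values))
```

The values sum to the utility of the full training set (the fraction of matching labels among the k nearest neighbors, here 1.0), as Shapley values should, and the mislabeled-looking third neighbor receives a negative value.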

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.6.5...v2.6.6

2.6.5

Not secure
What's Changed
* Add end-to-end tests at the end of Datalab quickstart tutorial by allincowell in https://github.com/cleanlab/cleanlab/pull/1118
* Centralize existing functionality for constructing and correcting knn graphs in a separate module by elisno in https://github.com/cleanlab/cleanlab/pull/1117, https://github.com/cleanlab/cleanlab/pull/1119, https://github.com/cleanlab/cleanlab/pull/1129
* Optimize multiannotator.py for performance by gogetron in https://github.com/cleanlab/cleanlab/pull/1077
* Optimize value_counts function for performance improvement with missing classes by gogetron in https://github.com/cleanlab/cleanlab/pull/1073
* Improve test coverage for setting confident joint in `CleanLearning` by elisno in https://github.com/cleanlab/cleanlab/pull/1123
* Switch from np.isnan to pd.isna for null value check by gogetron in https://github.com/cleanlab/cleanlab/pull/1096
* Update pip install instruction in object detection tutorial by elisno in https://github.com/cleanlab/cleanlab/pull/1126
* Refine handling of `underperforming_group` issue type by gogetron in https://github.com/cleanlab/cleanlab/pull/1099
* Improve compatibility with sklearn 1.5 by removing the deprecated `multi_class` argument in LogisticRegression by elisno in https://github.com/cleanlab/cleanlab/pull/1124
* Display exact duplicate sets dynamically in tabular tutorial by nelsonauner in https://github.com/cleanlab/cleanlab/pull/1128
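The changelog does not state the motivation for switching null checks, but one well-known difference is that `np.isnan` is only defined for numeric dtypes and raises `TypeError` on object arrays (which can arise from mixed-type DataFrame columns), whereas `pd.isna` treats both `None` and `NaN` as missing. A minimal demonstration:

```python
import numpy as np
import pandas as pd

# An object-dtype array mixing floats, None, and NaN.
values = np.array([1.0, None, np.nan, 3.0], dtype=object)

np_isnan_failed = False
try:
    np.isnan(values)
except TypeError:
    # np.isnan has no implementation for object dtype.
    np_isnan_failed = True

# pd.isna flags both None and NaN as missing.
missing = pd.isna(values)
print(np_isnan_failed, missing)
```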

New Contributors
* allincowell made their first contribution in https://github.com/cleanlab/cleanlab/pull/1118
* nelsonauner made their first contribution in https://github.com/cleanlab/cleanlab/pull/1128

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.6.4...v2.6.5

2.6.4

Not secure
What's Changed

* Various performance optimizations and test improvements by gogetron in https://github.com/cleanlab/cleanlab/pull/1064, https://github.com/cleanlab/cleanlab/pull/1067, https://github.com/cleanlab/cleanlab/pull/1079, https://github.com/cleanlab/cleanlab/pull/1087, https://github.com/cleanlab/cleanlab/pull/1095, https://github.com/cleanlab/cleanlab/pull/1106, https://github.com/cleanlab/cleanlab/pull/1107
* Restructured text and tabular classification tutorials into CleanLear… by mturk24 in https://github.com/cleanlab/cleanlab/pull/1066
* User-facing cleanlab.datavaluation module by coding-famer in https://github.com/cleanlab/cleanlab/pull/1050
* Fix typo in Datalab issue types by coding-famer in https://github.com/cleanlab/cleanlab/pull/1085
* Add kwargs to functions that call plt.show() by mturk24 in https://github.com/cleanlab/cleanlab/pull/1084; by jwmueller in https://github.com/cleanlab/cleanlab/pull/1088
* Update tutorials by jwmueller in https://github.com/cleanlab/cleanlab/pull/1089, https://github.com/cleanlab/cleanlab/pull/1090, https://github.com/cleanlab/cleanlab/pull/1091
* Refine type hints by desboisGIT in https://github.com/cleanlab/cleanlab/pull/1101; by elisno in https://github.com/cleanlab/cleanlab/pull/1086
* Updated Datalab issue type description for non-IID issue by mturk24 in https://github.com/cleanlab/cleanlab/pull/1102
* Remove unsqueeze call in image tutorial by elisno in https://github.com/cleanlab/cleanlab/pull/1108
* Temporarily Revert to macOS 12 in CI due to Incompatibility with Python 3.8 and 3.9 by elisno in https://github.com/cleanlab/cleanlab/pull/1110
* Fix numerical instability with Euclidean distance metric by elisno in https://github.com/cleanlab/cleanlab/pull/1113
* Avoid sensitive divisions by jwmueller in https://github.com/cleanlab/cleanlab/pull/1114; by elisno in https://github.com/cleanlab/cleanlab/pull/1116
* Add tests for datasets in which all examples are identical by elisno in https://github.com/cleanlab/cleanlab/pull/1115
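The changelog does not say what caused the Euclidean-distance instability or how it was fixed. As a generic illustration of this class of problem: the fast expansion ||x||^2 + ||y||^2 - 2 x.y (used, for example, by scikit-learn's `euclidean_distances`) suffers catastrophic cancellation for nearby points with large coordinates, while subtracting coordinates first does not. The function names below are hypothetical.

```python
import math

def euclidean_direct(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def euclidean_expanded(x, y):
    """Uses ||x||^2 + ||y||^2 - 2 x.y, which cancels badly for nearby points."""
    sq = sum(a * a for a in x) + sum(b * b for b in y) - 2 * sum(a * b for a, b in zip(x, y))
    return math.sqrt(max(sq, 0.0))  # clamp small negative rounding errors

# Two points exactly 1.0 apart, with large coordinates.
x = [1e8]
y = [1e8 + 1]
print(euclidean_direct(x, y), euclidean_expanded(x, y))
```

The true distance is 1.0, but in double precision the squared norms are rounded before subtraction, so the expanded form returns 0.0.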

New Contributors
* gogetron made their first contribution in https://github.com/cleanlab/cleanlab/pull/1064
* desboisGIT made their first contribution in https://github.com/cleanlab/cleanlab/pull/1101

**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.6.3...v2.6.4

2.6.3

Not secure
This release is non-breaking when upgrading from v2.6.2.

What's Changed

* Updated image_key documentation by sanjanag in https://github.com/cleanlab/cleanlab/pull/1048
* Refine Scoring and Enhance Stability for Datasets with Identical Examples by elisno in https://github.com/cleanlab/cleanlab/pull/1056
* Add warning message about TensorFlow compatibility to docs by elisno in https://github.com/cleanlab/cleanlab/pull/1057


**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.6.2...v2.6.3

2.6.2

Not secure
This release is non-breaking when upgrading from v2.6.1.

What's Changed
* Convert DataFrame features to numpy arrays in null value check by elisno in https://github.com/cleanlab/cleanlab/pull/1045


**Full Changelog**: https://github.com/cleanlab/cleanlab/compare/v2.6.1...v2.6.2


© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.