Redflag

Latest version: v0.5.0

0.5.0

- This release makes more changes to the tests and documentation in response to the review process for [the submission](https://joss.theoj.org/papers/e1ca575ec0c5344144f87176539ef547) to JOSS (see below).
- In particular, see the following issue: [#97](https://github.com/scienxlab/redflag/issues/97).
- Changed the method of handling dynamic versioning. For now the package `__version__` attribute is still defined, but it is deprecated and will be removed in `0.6.0`. Use `importlib.metadata.version('redflag')` to get the version information instead.
- Changed the default `get_outliers()` method from isolation forest (`'iso'`) to Mahalanobis (`'mah'`) to match other functions, e.g. `has_outliers()` and the `sklearn` pipeline object.
- Updated `actions/setup-python` to use v5.
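
As a minimal sketch of the versioning change above, the installed version can be read with the standard library instead of `__version__`. The helper name `get_version` is hypothetical and is not part of `redflag`; it simply returns `None` when the package is not installed.

```python
from importlib.metadata import PackageNotFoundError, version


def get_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


# Prints the installed redflag version, or None if redflag is absent.
print(get_version("redflag"))
```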

0.4.2

- This is a minor release making changes to the tests and documentation in response to the review process for [a submission](https://joss.theoj.org/papers/e1ca575ec0c5344144f87176539ef547) to [The Journal of Open Source Software](https://joss.theoj.org) (JOSS).
- See the following issues: [#89](https://github.com/scienxlab/redflag/issues/89), [#90](https://github.com/scienxlab/redflag/issues/90), [#91](https://github.com/scienxlab/redflag/issues/91), [#92](https://github.com/scienxlab/redflag/issues/92), [#93](https://github.com/scienxlab/redflag/issues/93), [#94](https://github.com/scienxlab/redflag/issues/94) and [#95](https://github.com/scienxlab/redflag/issues/95).
- Now building and testing on Windows and macOS as well as Linux.
- Python version `3.12` added to package classifiers.
- Python version `3.12` tested during CI.

0.4.1

- This is a minor release intended to preview new `pandas`-related features for version 0.5.0.
- Added another `pandas` Series accessor, `is_imbalanced()`.
- Added two `pandas` DataFrame accessors, `feature_importances()` and `correlation_detector()`. These are experimental features.

0.4.0

- `redflag` can now be installed by the `conda` package and environment manager. To do so, use `conda install -c conda-forge redflag`.
- All of the `sklearn` components can now be instantiated with `warn=False` in order to raise a `ValueError` instead of emitting a warning. This allows you to build pipelines that will break if a detector is triggered.
- Added `redflag.target.is_ordered()` to check if a single-label categorical target is ordered in some way. The test uses a Markov chain analysis, applying a chi-squared test to the transition matrix. In general, the Boolean result should only be used on targets with several classes, perhaps at least 10. Below that, it seems to give a lot of false positives.
- You can now pass `groups` to `redflag.distributions.is_multimodal()`. If present, the modality will be checked for each group, returning a Boolean array of values (one for each group). This allows you to check a feature partitioned by target class, for example.
- Added `redflag.sklearn.MultimodalityDetector` to provide a way to check for multimodal features. If `y` is passed and is categorical, it will be used to partition the data and modality will be checked for each class.
- Added `redflag.sklearn.InsufficientDataDetector` which checks that there are at least M<sup>2</sup> records (rows in `X`), where M is the number of features (i.e. columns) in `X`.
- Removed `RegressionMultimodalDetector`. Use `MultimodalityDetector` instead.
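
To illustrate the idea behind the Markov chain test mentioned for `is_ordered()`, here is a rough standard-library sketch that counts first-order transitions in a label sequence and computes a chi-squared statistic against the hypothesis that successive labels are independent. The function name `transition_chi2` is hypothetical, and this is not `redflag`'s implementation; it also stops short of computing a p-value, which would need a chi-squared distribution (for example from SciPy).

```python
from collections import Counter


def transition_chi2(sequence):
    """Return (chi-squared statistic, degrees of freedom) for the
    first-order transition counts of a sequence of labels."""
    states = sorted(set(sequence))
    # Observed counts of each (current, next) pair.
    pairs = Counter(zip(sequence[:-1], sequence[1:]))
    n = len(sequence) - 1  # Total number of transitions.
    row = {s: sum(pairs[(s, t)] for t in states) for s in states}
    col = {t: sum(pairs[(s, t)] for s in states) for t in states}
    chi2 = 0.0
    for s in states:
        for t in states:
            # Expected count if the next label ignored the current one.
            expected = row[s] * col[t] / n
            if expected > 0:
                chi2 += (pairs[(s, t)] - expected) ** 2 / expected
    dof = (len(states) - 1) ** 2
    return chi2, dof


# A strictly cyclic target (0, 1, 2, 0, 1, 2, ...) is highly ordered,
# so the statistic is large relative to its degrees of freedom.
chi2, dof = transition_chi2([0, 1, 2] * 20)
print(chi2, dof)
```

A large statistic relative to the degrees of freedom suggests the labels are not arranged at random. With only a few classes the transition table is small, which is in keeping with the caution above about using the result on targets with few classes.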

0.3.0

- Added accessors that expose `redflag` functions directly on `pandas.Series` objects. For example, for a Series `s`, one can call `minority_classes = s.redflag.minority_classes()` instead of `redflag.minority_classes(s)`. Other functions include `imbalance_degree()` and `dummy_scores()` (see below). Probably not very useful yet, but future releases will add some reporting functions that wrap multiple Redflag functions. **This is an experimental feature and subject to change.**
- Added a Series accessor `report()` to perform a range of tests and make a small text report suitable for printing. For a Series `s`, call `s.redflag.report()`. **This is an experimental feature and subject to change.**
- Added new documentation page for the Pandas accessor.
- Added `redflag.target.dummy_classification_scores()`, `redflag.target.dummy_regression_scores()`, which train a dummy (i.e. naive) model and compute various relevant scores (MSE and R2 for regression, F1 and ROC-AUC for classification tasks). Additionally, both `most_frequent` and `stratified` strategies are tested for classification tasks; only the `mean` strategy is employed for regression tasks. The helper function `redflag.target.dummy_scores()` tries to guess what kind of task suits the data and calls the appropriate function.
- Moved `redflag.target.update_p()` to `redflag.utils`.
- Added `is_imbalanced()` to return a Boolean depending on a threshold of imbalance degree. The default threshold is 0.5, but the best value is up for debate.
- Removed `utils.has_low_distance_stdev`.
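
The dummy regression scores described above can be pictured with a small standard-library sketch: a naive model that always predicts the mean of the target, scored with MSE and R2 on the same data. The function name `mean_baseline_scores` is hypothetical, and this is only an illustration of the `mean` strategy, not the code `redflag` uses.

```python
from statistics import fmean


def mean_baseline_scores(y):
    """Score a naive model that always predicts the mean of y."""
    prediction = fmean(y)
    ss_res = sum((value - prediction) ** 2 for value in y)
    mse = ss_res / len(y)
    # With the mean as the prediction, the residual sum of squares equals
    # the total sum of squares, so R2 is 0 whenever y is not constant.
    ss_tot = ss_res
    r2 = 1 - ss_res / ss_tot if ss_tot else 0.0
    return {"mse": mse, "r2": r2}


print(mean_baseline_scores([1.0, 2.0, 3.0, 4.0]))
```

Scores like these give a floor for comparison: a real model evaluated on the same target should beat them by a clear margin.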

0.2.0

- Moved to something more closely resembling semantic versioning, which is the main reason this is version 0.2.0.
- Builds and tests on Python 3.11 have been successful, so now supporting this version.
- Added custom 'alarm' `Detector`, which can be instantiated with a function and a warning to emit when the function returns True for a 1D array. You can easily write your own detectors with this class.
- Added `make_detector_pipeline()` which can take sequences of functions and warnings (or a mapping of functions to warnings) and returns a `sklearn.pipeline.Pipeline` containing a `Detector` for each function.
- Added `RegressionMultimodalDetector` to allow detection of non-unimodal distributions in features, when considered across the entire dataset. (Coming soon, a similar detector for classification tasks that will partition the data by class.)
- Redefined `is_standardized` (deprecated) as `is_standard_normal`, which implements the Kolmogorov–Smirnov test. It seems more reliable than assuming the data will have a mean of almost exactly 0 and standard deviation of exactly 1, when all we really care about is that the feature is roughly normal.
- Changed the wording slightly in the existing detector warning messages.
- No longer warning if `y` is `None` in, e.g., `ImportanceDetector`, since you most likely know this.
- Some changes to `ImportanceDetector`. It now uses KNN estimators instead of SVMs as the third measure of importance; the SVMs were too unstable, causing numerical issues. It also now requires that the number of important features is less than the total number of features to be triggered. So if you have 2 features and both are important, it does not trigger.
- Improved `is_continuous()` which was erroneously classifying integer arrays with many consecutive values as non-continuous.
- Note that `wasserstein` no longer checks that the data are standardized; this check will probably return in the future, however.
- Added a `Tutorial.ipynb` notebook to the docs.
- Added a **Copy** button to code blocks in the docs.
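
The custom 'alarm' detector idea above can be sketched without any dependencies: wrap a function and a message, apply the function to each feature as a 1D sequence, and warn if it returns True. The class `SimpleDetector` is hypothetical and only mimics the fit/transform shape of a `scikit-learn` transformer; the real `Detector` signature may differ.

```python
import warnings


class SimpleDetector:
    """Warn when func returns True for any column of X."""

    def __init__(self, func, message):
        self.func = func
        self.message = message

    def fit(self, X, y=None):
        # Nothing to learn; detectors only inspect the data.
        return self

    def transform(self, X, y=None):
        # X is a list of rows; zip(*X) yields one tuple per column.
        columns = list(zip(*X))
        flagged = [i for i, col in enumerate(columns) if self.func(col)]
        if flagged:
            warnings.warn(f"Features {flagged} {self.message}")
        # The data passes through unchanged.
        return X


# Example: flag any feature that contains a negative value.
detector = SimpleDetector(lambda col: min(col) < 0, "contain negative values.")
data = [[1.0, -2.0], [3.0, 4.0]]
detector.fit(data).transform(data)
```

Because the data passes through unchanged, a detector of this kind can sit anywhere in a processing chain without affecting downstream steps.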
