Sklearn2pmml

Latest version: v0.109.0


0.108.0

Breaking changes

None.

New features

* Added support for [`interpret.glassbox.ClassificationTree`](https://interpret.ml/docs/python/api/ClassificationTree.html) and [`interpret.glassbox.RegressionTree`](https://interpret.ml/docs/python/api/RegressionTree.html) classes.

* Added support for [`interpret.glassbox.LinearRegression`](https://interpret.ml/docs/python/api/LinearRegression.html) and [`interpret.glassbox.LogisticRegression`](https://interpret.ml/docs/python/api/LogisticRegression.html) classes.

* Added support for [`interpret.glassbox.ExplainableBoostingClassifier`](https://interpret.ml/docs/python/api/ExplainableBoostingClassifier.html) and [`interpret.glassbox.ExplainableBoostingRegressor`](https://interpret.ml/docs/python/api/ExplainableBoostingRegressor.html) classes.

See [InterpretML-536](https://github.com/interpretml/interpret/issues/536)

Minor improvements and fixes

* Ensured compatibility with Scikit-Learn 1.4.2.

0.107.1

Breaking changes

None.

New features

* Added support for [`H2OExtendedIsolationForestEstimator`](https://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html#h2oextendedisolationforestestimator) class.

This class implements the isolation forest algorithm using oblique tree models.
It is claimed to outperform the [`H2OIsolationForestEstimator`](https://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/modeling.html#h2oisolationforestestimator) class, which does the same using plain (ie. non-oblique) tree models.
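
To make the difference between plain and oblique tree models concrete, here is a minimal sketch of the two kinds of split test. It is plain Python for illustration only and is not taken from the H2O.ai code base; the function names, weights and thresholds are hypothetical.

```python
def axis_aligned_split(row, feature_index, threshold):
    """Plain split: compares a single feature against a threshold."""
    return row[feature_index] <= threshold

def oblique_split(row, weights, threshold):
    """Oblique split: compares a linear combination of features against a threshold."""
    projection = sum(w * x for w, x in zip(weights, row))
    return projection <= threshold

row = [2.0, 3.0]

# The plain split looks at the first feature only: 2.0 <= 2.5
goes_left_plain = axis_aligned_split(row, feature_index = 0, threshold = 2.5)

# The oblique split looks at both features at once: 1.0 * 2.0 + 1.0 * 3.0 = 5.0 > 4.0
goes_left_oblique = oblique_split(row, weights = [1.0, 1.0], threshold = 4.0)

print(goes_left_plain, goes_left_oblique)
```

A plain split partitions the feature space with axis-parallel cuts, whereas an oblique split can cut along an arbitrary hyperplane.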

* Made `lightgbm.Booster` class directly exportable to PMML.

The SkLearn2PMML package now supports both LightGBM [Training API](https://lightgbm.readthedocs.io/en/latest/Python-API.html#training-api) and [Scikit-Learn API](https://lightgbm.readthedocs.io/en/latest/Python-API.html#scikit-learn-api):

```python
from lightgbm import train, Dataset
from sklearn2pmml import sklearn2pmml

ds = Dataset(data = X, label = y)

booster = train(params = {...}, train_set = ds)

sklearn2pmml(booster, "LightGBM.pmml")
```


* Made `xgboost.Booster` class directly exportable to PMML.

The SkLearn2PMML package now supports both XGBoost [Learning API](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.training) and [Scikit-Learn API](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn):

```python
from xgboost import train, DMatrix
from sklearn2pmml import sklearn2pmml

dmatrix = DMatrix(data = X, label = y)

booster = train(params = {...}, dtrain = dmatrix)

sklearn2pmml(booster, "XGBoost.pmml")
```


* Added `xgboost.Booster.fmap` attribute.

This attribute allows overriding the embedded feature map with a user-defined feature map.

The main use case is refining the category levels of categorical features.

A suitable feature map object can be generated from the training dataset using the `sklearn2pmml.xgboost.make_feature_map(X)` utility function:

```python
from xgboost import train, DMatrix
from sklearn2pmml.xgboost import make_feature_map

# Enable categorical features
dmatrix = DMatrix(X, label = y, enable_categorical = True)

# Generate a feature map with detailed description of all continuous and categorical features in the dataset
fmap = make_feature_map(X)

booster = train(params = {...}, dtrain = dmatrix)
booster.fmap = fmap
```


* Added `input_float` conversion option for XGBoost models.

Minor improvements and fixes

None.

0.107.0

Breaking changes

None.

New features

* Added support for [`sktree.ensemble.ExtendedIsolationForest`](https://docs.neurodata.io/scikit-tree/dev/generated/sktree.ExtendedIsolationForest.html) class.

For example, training and exporting an `ExtendedIsolationForest` outlier detector into a PMML document:

```python
from sklearn.datasets import load_iris
from sktree.ensemble import ExtendedIsolationForest
from sklearn2pmml import sklearn2pmml

iris_X, iris_y = load_iris(return_X_y = True, as_frame = True)

eif = ExtendedIsolationForest(n_estimators = 13)
eif.fit(iris_X)

sklearn2pmml(eif, "ExtendedIsolationForestIris.pmml")
```


See [SKTree-255](https://github.com/neurodata/scikit-tree/issues/255)

* Added support for [`sktree.ensemble.ObliqueRandomForestClassifier`](https://docs.neurodata.io/scikit-tree/dev/generated/sktree.ObliqueRandomForestClassifier.html) and [`sktree.ensemble.ObliqueRandomForestRegressor`](https://docs.neurodata.io/scikit-tree/dev/generated/sktree.ObliqueRandomForestRegressor.html) classes.

* Added support for [`sktree.tree.ObliqueDecisionTreeClassifier`](https://docs.neurodata.io/scikit-tree/dev/generated/sktree.tree.ObliqueDecisionTreeClassifier.html) and [`sktree.tree.ObliqueDecisionTreeRegressor`](https://docs.neurodata.io/scikit-tree/dev/generated/sktree.tree.ObliqueDecisionTreeRegressor.html) classes.

Minor improvements and fixes

None.

0.106.0

Breaking changes

* Upgraded JPMML-SkLearn library from 1.7(.56) to 1.8(.0).

This is a major API upgrade.
The 1.8.X development branch is already source and binary incompatible with the earlier 1.5.X through 1.7.X development branches, with more breaking changes to follow.

Custom SkLearn2PMML plugins would need to be upgraded and rebuilt.

New features

None.

Minor improvements and fixes

* Ensured compatibility with Python 3.12.

* Ensured compatibility with Dill 0.3.8.

0.105.2

Breaking changes

None.

New features

None.

Minor improvements and fixes

* Improved support for categorical encoding over mixed datatype column sets.

Scikit-Learn transformers such as `OneHotEncoder`, `OrdinalEncoder` and `TargetEncoder` can be applied to several columns in one go.
Previously it was assumed that all columns shared the same data type. If that assumption was violated in practice, they were all force cast to the `string` data type.

The JPMML-SkLearn library now detects and maintains the data type on a per-column basis.
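
The difference between the two behaviours can be sketched in plain Python. This is only an illustration of the idea as described above, not the JPMML-SkLearn implementation; the helper names and the coarse type names are hypothetical.

```python
def infer_dtype(values):
    """Return a coarse data type name for one column of Python values."""
    kinds = {type(value) for value in values}
    if kinds <= {int}:
        return "integer"
    if kinds <= {int, float}:
        return "double"
    return "string"

def shared_dtype(columns):
    """Old behaviour: one data type for all columns, falling back to string on disagreement."""
    dtypes = {infer_dtype(values) for values in columns.values()}
    shared = dtypes.pop() if len(dtypes) == 1 else "string"
    return {name: shared for name in columns}

def per_column_dtype(columns):
    """New behaviour: every column keeps its own data type."""
    return {name: infer_dtype(values) for name, values in columns.items()}

columns = {
    "color": ["red", "green", "blue"],
    "doors": [2, 4, 2],
}

forced = shared_dtype(columns)
detected = per_column_dtype(columns)

print(forced)
print(detected)
```

In this sketch the integer-valued `doors` column is treated as a string column under the shared approach, but keeps its integer type under the per-column approach.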

* Made Category-Encoders classes directly exportable to PMML.

For example, training and exporting a `BaseNEncoder` transformer into a PMML document for manual analysis and interpretation purposes:

```python
from category_encoders import BaseNEncoder
from sklearn2pmml import sklearn2pmml

transformer = BaseNEncoder(base = 3)
transformer.fit(X, y = None)

sklearn2pmml(transformer, "Base3Encoder.pmml")
```


* Fixed support for `(category_encoders.utils.)BaseEncoder.feature_names_in_` attribute.

According to [SLEP007](https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep007/proposal.html), the value of a `feature_names_in_` attribute should be an array of strings.

Category-Encoders transformers are using a list of strings instead.

* Refactored `ExpressionClassifier` and `ExpressionRegressor` constructors.

The evaluatable object can now also be a string literal.

0.105.1

Breaking changes

None.

New features

* Added support for [`sklearn.preprocessing.TargetEncoder`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.TargetEncoder.html) class.

* Added support for [`sklearn.preprocessing.SplineTransformer`](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.SplineTransformer.html) class.

The `SplineTransformer` class computes a B-spline for a feature, which is then used to expand the feature into new features that correspond to B-spline basis elements.

This class is not suitable for simple feature and prediction scaling purposes (eg. calibration of computed probabilities).
Consider using the `sklearn2pmml.preprocessing.BSplineTransformer` class in such a situation.
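
The basis expansion described above can be sketched with the Cox-de Boor recursion. This is a plain-Python illustration under assumptions, not Scikit-Learn's implementation: the uniform knot vector, the cubic degree and the helper names `bspline_basis` and `expand_feature` are all chosen here for the example.

```python
def bspline_basis(i, degree, knots, x):
    """Value at x of the i-th B-spline basis function of the given degree (Cox-de Boor recursion)."""
    if degree == 0:
        return 1.0 if knots[i] <= x < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + degree] != knots[i]:
        left = (x - knots[i]) / (knots[i + degree] - knots[i]) * bspline_basis(i, degree - 1, knots, x)
    right = 0.0
    if knots[i + degree + 1] != knots[i + 1]:
        right = (knots[i + degree + 1] - x) / (knots[i + degree + 1] - knots[i + 1]) * bspline_basis(i + 1, degree - 1, knots, x)
    return left + right

def expand_feature(x, knots, degree):
    """Expand a single feature value into one new feature per basis element."""
    n_basis = len(knots) - degree - 1
    return [bspline_basis(i, degree, knots, x) for i in range(n_basis)]

# Hypothetical setup: eight uniform knots and cubic splines give four basis elements
knots = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
features = expand_feature(3.5, knots, degree = 3)

print(features)
print(sum(features))
```

One input value becomes four output values, which inside the interval spanned by the two middle knots sum to one. This is why the class suits feature expansion rather than one-to-one rescaling of a value.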

* Added support for [`statsmodels.api.QuantReg`](https://www.statsmodels.org/dev/generated/statsmodels.regression.quantile_regression.QuantReg.html) class.

* Added `input_float` conversion option.

Scikit-Learn tree and tree ensemble models prepare their inputs by first casting them to `(numpy.)float32`, and then to `(numpy.)float64` (exactly so, even if the input value already happened to be of `(numpy.)float64` data type).

PMML does not provide effective means for implementing "chained casts"; the chain must be broken down into elementary cast operations, each of which is represented using a standalone `DerivedField` element.
For example, preparing the "Sepal.Length" field of the iris dataset:

```xml
<PMML>
  <DataDictionary>
    <DataField name="Sepal.Length" optype="continuous" dataType="double">
      <Interval closure="closedClosed" leftMargin="4.3" rightMargin="7.9"/>
    </DataField>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="float(Sepal.Length)" optype="continuous" dataType="float">
      <FieldRef field="Sepal.Length"/>
    </DerivedField>
    <DerivedField name="double(float(Sepal.Length))" optype="continuous" dataType="double">
      <FieldRef field="float(Sepal.Length)"/>
    </DerivedField>
  </TransformationDictionary>
</PMML>
```


Activating the `input_float` conversion option:

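```python
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# iris_X and iris_y hold the iris dataset features and target
pipeline = PMMLPipeline([
  ("classifier", DecisionTreeClassifier())
])
pipeline.fit(iris_X, iris_y)

# Default mode
pipeline.configure(input_float = False)
sklearn2pmml(pipeline, "DecisionTree-default.pmml")

# "Input float" mode
pipeline.configure(input_float = True)
sklearn2pmml(pipeline, "DecisionTree-input_float.pmml")
```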


This conversion option updates the data type of the "Sepal.Length" data field from `double` to `float`, thereby eliminating the need for the first `DerivedField` element of the two:

```xml
<PMML>
  <DataDictionary>
    <DataField name="Sepal.Length" optype="continuous" dataType="float">
      <Interval closure="closedClosed" leftMargin="4.300000190734863" rightMargin="7.900000095367432"/>
    </DataField>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="double(Sepal.Length)" optype="continuous" dataType="double">
      <FieldRef field="Sepal.Length"/>
    </DerivedField>
  </TransformationDictionary>
</PMML>
```


Changing the data type of a field may have side effects if the field contributes to more than one feature.
The effectiveness and safety of configuration options should be verified by integration testing.
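
The effect of the `float32` round trip on individual values can be reproduced with the Python standard library alone. The sketch below is an illustration only; the `to_float32` helper is hypothetical and is not part of SkLearn2PMML.

```python
import struct

def to_float32(value):
    """Round a 64-bit Python float to the nearest 32-bit float, then widen it back to 64 bits."""
    return struct.unpack("f", struct.pack("f", value))[0]

# The interval margins of the "Sepal.Length" field
left_margin = to_float32(4.3)
right_margin = to_float32(7.9)

print(left_margin)   # 4.300000190734863
print(right_margin)  # 7.900000095367432

# A comparison written against the original double value no longer holds after the round trip
threshold = 4.3
holds_as_double = 4.3 <= threshold
holds_as_float = left_margin <= threshold

print(holds_as_double, holds_as_float)
```

The printed margins match the `leftMargin` and `rightMargin` values of the "input float" PMML document above. The last comparison shows the kind of side effect to watch for when the same field also feeds a feature that expects the original `double` value.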

* Added `H2OEstimator.pmml_classes_` attribute.

This attribute allows customizing target category levels.
It comes in handy when working with ordinal targets, where the H2O.ai framework requires that target category levels are encoded from their original representation to integer index representation.

A fitted H2O.ai ordinal classifier predicts integer indices, which must be manually decoded in the application layer.
The JPMML-SkLearn library is able to "erase" this encode-decode helper step from the workflow, resulting in a clean and efficient PMML document:

```python
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
from sklearn2pmml import sklearn2pmml

ordinal_classifier = H2OGeneralizedLinearEstimator(family = "ordinal")
ordinal_classifier.fit(...)

# Customize target category levels
# Note that the default lexicographic ordering of labels is different from their intended ordering
ordinal_classifier.pmml_classes_ = ["bad", "poor", "fair", "good", "excellent"]

sklearn2pmml(ordinal_classifier, "OrdinalClassifier.pmml")
```
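
For reference, the manual encode-decode helper step that would otherwise live in the application layer might look like the following. This is a plain-Python sketch with hypothetical variable names; it does not call H2O.ai or SkLearn2PMML.

```python
# Target category levels in their intended order
labels = ["bad", "poor", "fair", "good", "excellent"]

# Encode: original label -> integer index (done before training)
label_to_index = {label: index for index, label in enumerate(labels)}
y = ["good", "bad", "excellent"]
y_encoded = [label_to_index[label] for label in y]

# Decode: integer index -> original label (done after prediction)
y_decoded = [labels[index] for index in y_encoded]

# The default lexicographic ordering would put "excellent" right after "bad"
lexicographic = sorted(labels)

print(y_encoded)
print(y_decoded)
print(lexicographic)
```

Setting the `pmml_classes_` attribute moves this mapping into the PMML document, so the application layer receives the original labels directly.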


Minor improvements and fixes

* Fixed the categorical encoding of missing values.

This bug manifested itself when the input column was mixing different data type values.
For example, a sparse string column, where non-missing values are strings, and missing values are floating-point `numpy.NaN` values.

Scikit-Learn documentation warns against mixing string and numeric values within a single column, but it can happen inadvertently when reading a sparse dataset into a Pandas DataFrame using standard library functions (eg. the [`pandas.read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function).
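
The mixed-type situation can be illustrated without any third-party library. In the sketch below the column is written by hand to mimic a sparse string column whose missing cells are floating-point NaN values; the cleanup step is one possible user-side precaution and is not a description of the library fix.

```python
import math

# Hand-written stand-in for a sparse string column: missing cells are float NaN values
column = ["red", float("nan"), "green", float("nan")]

# The column mixes two Python types
value_types = {type(value).__name__ for value in column}

def is_missing(value):
    """True for floating-point NaN markers, False for everything else."""
    return isinstance(value, float) and math.isnan(value)

# Replace NaN markers with None, so that all non-missing values are strings
cleaned = [None if is_missing(value) else value for value in column]

# Category levels are collected from non-missing values only
categories = sorted({value for value in cleaned if value is not None})

print(value_types)
print(cleaned)
print(categories)
```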

* Added Pandas to package dependencies.

See [SkLearn2PMML-418](https://github.com/jpmml/sklearn2pmml/issues/418)

* Ensured compatibility with H2O.ai 3.46.0.1.

* Ensured compatibility with BorutaPy 0.3.post0 (92e4b4e).
