Rulekit

Latest version: v2.1.24.0


2.1.22

Restored support of the old conclusion syntax (`={NaN}`) in expert regression rules.

2.1.21

* Bugfix in parsing expert survival rules when survival status attribute is nominal.
* The unsightly conclusion of survival rules (`survival_status = {NaN}`) is no longer printed in the log nor required in the expert knowledge.

2.1.21.0

1. Ability to use user-defined quality measures during rule induction, pruning, and voting phases.

Users can now define custom quality measure functions and use them for growing, pruning, and voting. Defining a quality measure function is easy and straightforward; see the example below.

```python
from rulekit.classification import RuleClassifier

def my_induction_measure(p: float, n: float, P: float, N: float) -> float:
    # do anything you want here and return a single float...
    return (p + n) / (P + N)

def my_pruning_measure(p: float, n: float, P: float, N: float) -> float:
    return p - n

def my_voting_measure(p: float, n: float, P: float, N: float) -> float:
    return (p + 1) / (p + n + 2)

python_clf = RuleClassifier(
    induction_measure=my_induction_measure,
    pruning_measure=my_pruning_measure,
    voting_measure=my_voting_measure,
)
```

This functionality has long been available in the original Java library, but technical problems prevented its implementation in this package. Now, with the release of RuleKit v2, it is finally available.

> ⚠️ Using this feature comes at a price. The original set of quality measures from `rulekit.params.Measures` provides an optimized and much faster Java implementation of these quality functions, while a custom Python function **will certainly slow down the model learning process**. For example, learning rules on the Iris dataset using the FullCoverage measure went from 1.8 seconds to 10.9 seconds after switching to the Python implementation of the same measure.
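Since the custom measures are ordinary Python callables operating on the 2x2 coverage counts, they can be sanity-checked in isolation before being handed to `RuleClassifier`. A small sketch (the counts below are made up for illustration):

```python
# p/n = positive/negative examples covered by the rule,
# P/N = total positives/negatives in the training set.
def my_voting_measure(p: float, n: float, P: float, N: float) -> float:
    # Laplace-style precision estimate, as in the example above
    return (p + 1) / (p + n + 2)

# A rule covering 30 positives and 5 negatives:
score = my_voting_measure(30, 5, 50, 100)
print(round(score, 4))  # 0.8378, i.e. 31/37
```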


2. Reading ARFF files from URLs via HTTP/HTTPS.

In the previous version of the package, a new function for reading ARFF files was added. It accepted a file path or a file-like object as an argument. As of this version, the function also accepts URLs, making it possible to read an ARFF dataset directly from a server via HTTP/HTTPS.

```python
import pandas as pd
from rulekit.arff import read_arff

df: pd.DataFrame = read_arff(
    'https://raw.githubusercontent.com/'
    'adaa-polsl/RuleKit/refs/heads/master/data/seismic-bumps/'
    'seismic-bumps.arff'
)
```


3. Improved rules API.

Access to some basic rule information was often quite cumbersome in earlier versions of this package. For example, there was no easy way to access information about the decision class of a classification rule.

In this version, rule classes and rule sets have been refactored and improved. Below is a list of some operations that are now much easier.

2.1.20

* Nominal attribute values not present in the data are no longer considered during induction.
* Tokens from ARFF file metadata are trimmed before parsing.
* Contrast set fix: group labels are written in the log instead of numerical identifiers.

2.1.18

Conditions are now printed in the order in which they were added to the rule.

2.1.18.0

This release mainly focuses on fixing various inconsistencies between this package and the original Java RuleKit v2 library.

1. Add utility function for reading .arff files.

The ARFF file format was originally created by the Machine Learning Project at the University of Waikato's Department of Computer Science for use with Weka machine learning software. This format, once popular, has now become rather niche. However, some older but popular public benchmark datasets are still available as arff files.

Modern Python, however, lacks a good package for reading such files. Most existing examples on the internet use the `scipy.io.arff` package. However, this package has some drawbacks that can be problematic (they certainly were in our own experiments). First of all, it does not read the data as a pandas DataFrame. Although the returned data can easily be converted into a DataFrame, string columns are not properly decoded and remain as bytes. We also encountered problems parsing empty values, especially in numeric columns.
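The string-as-bytes issue is easy to reproduce. A minimal pandas sketch with made-up toy data, showing the decode step that `scipy.io.arff` users typically have to add by hand:

```python
import pandas as pd

# scipy.io.arff leaves nominal columns as raw bytes; a DataFrame built
# from its output looks roughly like this (toy data, not a real dataset):
df = pd.DataFrame({
    "class": [b"positive", b"negative", b"positive"],
    "value": [1.0, float("nan"), 3.5],
})

# Decode every object (bytes) column into proper strings
for col in df.select_dtypes(include=object).columns:
    df[col] = df[col].str.decode("utf-8")

print(df["class"].tolist())  # ['positive', 'negative', 'positive']
```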

After encountering all these problems and drinking considerable amounts of coffee ☕ to fix all sorts of strange bugs they caused, we decided to add a custom function for reading arff files to this package. It is not a completely new implementation and uses `scipy.io.arff`. It fixes the previously mentioned problems, and also returns a ready-to-use pandas DataFrame compatible with the models available in this package. Example below.

```python
import pandas as pd
from rulekit.arff import read_arff

df: pd.DataFrame = read_arff('./tests/additional_resources/cholesterol.arff')
```

2. Add ability to write verbose rule induction process logs to a file.

The original RuleKit provides detailed logs of the entire rule induction process. Such logs may not be of interest to the average user, but may be of value to others. They can also be helpful in the debugging process (they certainly were for us).

To configure such logs, you can use the `RuleKit` class:
```python
from rulekit import RuleKit

RuleKit.configure_java_logger(
    log_file_path='./java.logs',
    verbosity_level=1
)
# train your model later
```


3. Add validation of the model parameters configuration.

This package acts as a wrapper for the original RuleKit library written in Java, offering an analogous but more Pythonic API. However, this architecture has led to many bugs in the past, most of them caused by differences between the parameter values of models configured in Python and the values actually set in Java. In this version, we have added automatic validation that compares the parameter values configured by the user with those configured in Java and raises a `rulekit.exceptions.RuleKitMisconfigurationException` on any mismatch. This exception **should not** occur during normal use of this package; it was added mainly to make debugging easier and to prevent such bugs in the future.
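Conceptually, the check simply compares the two parameter dictionaries and raises on any disagreement. A hypothetical pure-Python sketch of the idea (the actual implementation inside the package differs, and the parameter name below is illustrative):

```python
class RuleKitMisconfigurationException(Exception):
    """Python-side and Java-side parameter values disagree (illustrative)."""

def validate_params(python_params: dict, java_params: dict) -> None:
    # Collect every parameter whose Java-side value differs from the
    # value the user configured on the Python side.
    mismatched = {
        name: (py_value, java_params.get(name))
        for name, py_value in python_params.items()
        if java_params.get(name) != py_value
    }
    if mismatched:
        raise RuleKitMisconfigurationException(
            f"Parameter mismatch (python, java): {mismatched}"
        )

validate_params({"minsupp": 5.0}, {"minsupp": 5.0})  # consistent: no error
```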

Fixed issues
* Inconsistent results of induction for survival [22](https://github.com/adaa-polsl/RuleKit-python/issues/22)
* Fixed numerous inconsistencies between this package and the original Java RuleKit v2 library.

