1. Ability to use user-defined quality measures during rule induction, pruning, and voting phases.
Users can now define custom quality measure functions and use them for growing, pruning, and voting. Defining a quality measure function is straightforward; see the example below.
```python
from rulekit.classification import RuleClassifier


# Each measure receives p and n, the numbers of positive and negative
# examples covered by a rule, and P and N, the total numbers of positive
# and negative examples, and must return a single float.
def my_induction_measure(p: float, n: float, P: float, N: float) -> float:
    # Do anything you want here and return a single float...
    return (p + n) / (P + N)


def my_pruning_measure(p: float, n: float, P: float, N: float) -> float:
    return p - n


def my_voting_measure(p: float, n: float, P: float, N: float) -> float:
    return (p + 1) / (p + n + 2)


python_clf = RuleClassifier(
    induction_measure=my_induction_measure,
    pruning_measure=my_pruning_measure,
    voting_measure=my_voting_measure,
)
```
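Because each measure is an ordinary Python callable, it can be sanity-checked on hand-picked counts before it is passed to `RuleClassifier`, without training a model. The sketch below repeats two of the measures so that it runs standalone and assumes the usual reading of the arguments (`p`/`n` covered positives/negatives, `P`/`N` dataset totals). The helper `check_measure` and the two candidate rules are hypothetical, chosen only for illustration.

```python
from typing import Callable

# Assumed measure signature: (p, n, P, N) -> float, where p and n are the
# positive and negative examples covered by a rule and P and N are the
# totals in the training set.
Measure = Callable[[float, float, float, float], float]


def my_induction_measure(p: float, n: float, P: float, N: float) -> float:
    # Coverage: fraction of all training examples covered by the rule.
    return (p + n) / (P + N)


def my_voting_measure(p: float, n: float, P: float, N: float) -> float:
    # Laplace estimate of rule precision for a two-class problem.
    return (p + 1) / (p + n + 2)


def check_measure(measure: Measure, p: float, n: float, P: float, N: float) -> float:
    """Hypothetical helper: validate the counts, then evaluate a measure."""
    if not (0 <= p <= P and 0 <= n <= N):
        raise ValueError("covered counts must lie between 0 and the totals")
    return float(measure(p, n, P, N))


if __name__ == "__main__":
    P, N = 50.0, 100.0
    # Two hypothetical candidate rules: (covered positives, covered negatives).
    rules = {"broad": (40.0, 5.0), "narrow": (10.0, 0.0)}
    for name, (p, n) in rules.items():
        coverage = check_measure(my_induction_measure, p, n, P, N)
        laplace = check_measure(my_voting_measure, p, n, P, N)
        print(f"{name}: coverage={coverage:.3f}, laplace={laplace:.3f}")
```

With these counts the broad rule scores higher on coverage (0.300 vs 0.067) while the narrow rule scores higher on the Laplace estimate (0.917 vs 0.872), which is one reason it can make sense to use different measures for different phases.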
This feature has long been available in the original Java library, but technical problems prevented its implementation in this Python package. Now, with the release of RuleKit v2, it is finally available here as well.
> ⚠️ This feature comes at a price. The built-in quality measures from `rulekit.params.Measures` have optimized, much faster Java implementations, so using a custom Python function **will certainly slow down the model learning process**. For example, learning rules on the Iris dataset with the FullCoverage measure took 10.9 seconds with a Python implementation of that measure, compared with 1.8 seconds with the built-in one.
2. Reading ARFF files from URLs via HTTP/HTTPS.
The previous version of the package added a new function for reading ARFF files, `read_arff`, which accepts either a file path or a file-like object. As of this version, it also accepts URLs, so an ARFF dataset can be read directly from a server over HTTP/HTTPS.
```python
import pandas as pd
from rulekit.arff import read_arff

df: pd.DataFrame = read_arff(
    'https://raw.githubusercontent.com/'
    'adaa-polsl/RuleKit/refs/heads/master/data/seismic-bumps/'
    'seismic-bumps.arff'
)
```
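Conceptually, URL support only changes how the source is opened: a string with an `http` or `https` scheme is fetched over the network, and anything else is treated as a local path. The sketch below illustrates that dispatch with the standard library. `open_arff_source` is a hypothetical helper, not what `read_arff` does internally, and it also accepts `file://` URLs so it can be tried offline.

```python
import io
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen

# Schemes treated as URLs; "file" is included so the helper can be
# exercised without network access.
URL_SCHEMES = {"http", "https", "file"}


def open_arff_source(
    source: str, encoding: str = "utf-8", timeout: float = 30.0
) -> io.StringIO:
    """Hypothetical helper: return the text of an ARFF file from a URL or path."""
    scheme = urlparse(source).scheme.lower()
    if scheme in URL_SCHEMES:
        # Fetch the whole response body and decode it as text.
        with urlopen(source, timeout=timeout) as response:
            text = response.read().decode(encoding)
    else:
        # Anything without a recognized scheme is read as a local path.
        text = Path(source).read_text(encoding=encoding)
    return io.StringIO(text)
```

The returned `io.StringIO` is a file-like object, so, assuming `read_arff` accepts text streams, it could be passed to that function as before; with URL support built in, such a manual step is only worth it when you need control over details like the timeout or encoding.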
3. Improved rules API
Accessing some basic rule information was often cumbersome in earlier versions of this package. For example, there was no easy way to get the decision class of a classification rule.
In this version, rule classes and rule sets have been refactored and improved. Below is a list of some operations that are now much easier.