More changes have been made for survival rules.
First, there is a new class, `rulekit.kaplan_meier.KaplanMeierEstimator`, which represents the Kaplan-Meier estimator of a rule. In the future, prediction arrays for survival problems will probably change from arrays of dictionaries to arrays of such objects, but unfortunately this would be a breaking change.
In addition, one can now easily access the Kaplan-Meier curve of the entire training dataset using the `rulekit.survival.SurvivalRules.get_train_set_kaplan_meier` method.
Such curves can be easily plotted using the charting package of your choice.
```python
import pandas as pd
import matplotlib.pyplot as plt
from rulekit.arff import read_arff
from rulekit.survival import SurvivalRules
from rulekit.rules import RuleSet, SurvivalRule
from rulekit.kaplan_meier import KaplanMeierEstimator  # this is a new class

DATASET_URL: str = (
    'https://raw.githubusercontent.com/'
    'adaa-polsl/RuleKit/master/data/bmt/'
    'bmt.arff'
)
df: pd.DataFrame = read_arff(DATASET_URL)
X, y = df.drop('survival_status', axis=1), df['survival_status']

surv = SurvivalRules(survival_time_attr='survival_time')
surv.fit(X, y)

ruleset: RuleSet[SurvivalRule] = surv.model
rule: SurvivalRule = ruleset.rules[0]

# you can now easily access the Kaplan-Meier estimator of a rule
rule_estimator: KaplanMeierEstimator = rule.kaplan_meier_estimator
plt.step(
    rule_estimator.times,
    rule_estimator.probabilities,
    label='First rule'
)

# you can also easily access the Kaplan-Meier estimator of the training dataset
train_dataset_estimator: KaplanMeierEstimator = surv.get_train_set_kaplan_meier()
plt.step(
    train_dataset_estimator.times,
    train_dataset_estimator.probabilities,
    label='Training dataset'
)
plt.legend(title='Kaplan-Meier curves:')
```
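For readers unfamiliar with what the `times` and `probabilities` arrays mean, here is a minimal pure-Python sketch of the standard Kaplan-Meier (product-limit) formula. It only illustrates the general technique: the `kaplan_meier` function and the toy data are hypothetical and are not part of RuleKit, and the exact contents of the arrays returned by `KaplanMeierEstimator` may differ (for example in how censored observations or time zero are reported).

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float],
    events: Sequence[bool],
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: product-limit estimate of the survival function.

    times  - observed time for each example (event or censoring time)
    events - True if the event was observed, False if the example was censored
    """
    # The estimate only changes at times where at least one event occurred.
    event_times = sorted({t for t, e in zip(times, events) if e})
    probabilities: List[float] = []
    survival = 1.0
    for t in event_times:
        # Examples still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Examples whose event happened exactly at time t.
        died = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - died / at_risk
        probabilities.append(survival)
    return event_times, probabilities


# Toy data: five examples, two of them censored.
km_times, km_probabilities = kaplan_meier(
    times=[2, 3, 3, 5, 8],
    events=[True, True, False, True, False],
)
```

On this toy data the survival probability drops to roughly 0.8, 0.6 and 0.3 at times 2, 3 and 5. Arrays of this shape are what `plt.step` draws as a Kaplan-Meier curve.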
4. Changes in expert rules induction for regression and survival `❗BREAKING CHANGES`
> Note that these changes will likely be reverted in the next version. They work around a known bug in the original RuleKit library; fixing it is beyond the scope of this package, which is merely a wrapper around that library.
Starting with this version, expert rules and conditions for regression and survival problems must be specified differently. All you have to do is remove the conclusion part of those rules (everything after **THEN**).
Expert rules before:
```python
expert_rules = [
    (
        'rule-0',
        'IF [[CD34kgx10d6 = (-inf, 10.0)]] AND [[extcGvHD = {0}]] THEN survival_status = {NaN}'
    )
]
expert_preferred_conditions = [
    (
        'attr-preferred-0',
        'inf: IF [CD34kgx10d6 = Any] THEN survival_status = {NaN}'
    )
]
expert_forbidden_conditions = [
    ('attr-forbidden-0', 'IF [ANCrecovery = Any] THEN survival_status = {NaN}')
]
```
And now:
```python
expert_rules = [
    (
        'rule-0',
        'IF [[CD34kgx10d6 = (-inf, 10.0)]] AND [[extcGvHD = {0}]] THEN'
    )
]
expert_preferred_conditions = [
    (
        'attr-preferred-0',
        'inf: IF [CD34kgx10d6 = Any] THEN'
    )
]
expert_forbidden_conditions = [
    ('attr-forbidden-0', 'IF [ANCrecovery = Any] THEN')
]
```
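If you have many existing expert rules, the conclusion part can also be removed programmatically. The sketch below is one possible approach, not something provided by the package: `strip_conclusion` and `migrate_expert_rules` are hypothetical helpers, and they assume that the first standalone `THEN` keyword in a rule string marks the start of its conclusion.

```python
import re
from typing import List, Tuple

# Matches the standalone THEN keyword that separates premise and conclusion.
_THEN = re.compile(r'\bTHEN\b')


def strip_conclusion(rule: str) -> str:
    """Hypothetical helper: drop everything after the first THEN keyword."""
    match = _THEN.search(rule)
    if match is None:
        # No THEN keyword found, so leave the string untouched.
        return rule
    return rule[:match.end()]


def migrate_expert_rules(
    rules: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Apply strip_conclusion to each (name, rule) pair."""
    return [(name, strip_conclusion(text)) for name, text in rules]


old_forbidden_conditions = [
    ('attr-forbidden-0', 'IF [ANCrecovery = Any] THEN survival_status = {NaN}')
]
new_forbidden_conditions = migrate_expert_rules(old_forbidden_conditions)
```

Rules that are already in the new format pass through unchanged, so it is safe to run the helper more than once on the same list.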
Other changes
* Fixed expert rules parsing.
* Conditions are now printed in the order in which they were added to the rule.
* Fixed a bug that occurred when using the `sklearn.base.clone` function with RuleKit model classes.
* Updated tutorials in the documentation.