mlrl-boomer Changelog

0.10.1

A bugfix release that comes with the following changes.

- If the sparse value of a feature matrix is provided to the Python API, it is now properly taken into account when converting into a dense matrix.
- The C++ code is now checked for common issues by applying `cpplint` via Continuous Integration.
- The styling of YAML files is now verified by applying `yamlfix` via Continuous Integration.
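
To illustrate the first item, the following minimal sketch shows what taking the sparse value into account means when a sparse matrix is converted into a dense one. It is written in plain Python and does not use the project's API. The function `to_dense` and its parameters are hypothetical names chosen for this illustration.

```python
def to_dense(num_rows, num_cols, entries, sparse_value=0.0):
    """Converts a sparse matrix into a dense list of lists.

    `entries` maps (row, column) tuples to the explicitly stored values.
    All remaining elements are filled with `sparse_value` rather than
    being assumed to be zero.
    """
    dense = [[sparse_value] * num_cols for _ in range(num_rows)]
    for (row, col), value in entries.items():
        dense[row][col] = value
    return dense


# Only two elements are stored explicitly, all others take the sparse value
entries = {(0, 1): 2.5, (1, 0): 4.0}
dense = to_dense(2, 3, entries, sparse_value=-1.0)
print(dense)  # [[-1.0, 2.5, -1.0], [4.0, -1.0, -1.0]]
```

If the sparse value were ignored during the conversion, the unset elements would be filled with zeros instead of `-1.0`.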

0.10.0

A major update to the BOOMER algorithm that introduces the following changes.

{warning}
This release comes with several API changes. For an updated overview of the available parameters and command line arguments, please refer to the [documentation](https://mlrl-boomer.readthedocs.io/en/0.10.0/).

Algorithmic Enhancements

- **The project now provides a Separate-and-Conquer (SeCo) algorithm** based on traditional rule learning techniques that are particularly well-suited for learning interpretable models.
- **Space-efficient data structures are now used for storing feature values**, depending on whether the feature is numerical, ordinal, nominal, or binary. This also enables the use of optimized code paths for dealing with these different types of features.
- **The implementation of feature binning has been reworked** in a way that avoids redundant code and results in a reduction of training times due to the use of the data structures mentioned above.
- **The value to be used for sparse elements of a feature matrix** can now be specified via the C++ or Python API.
- **Nominal and ordinal feature values are now represented as integers** to avoid issues due to limited floating point precision.
- **Safe comparisons of floating point values** are now used to avoid issues due to limited floating point precision.
- **Fundamental data structures for vectors and matrices have been reworked** to ease the reuse of existing functionality and to avoid redundant code.
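
The motivation behind safe floating point comparisons can be sketched in a few lines of Python. The helper `safe_equal` and its tolerances are assumptions made for this example. They do not mirror the project's C++ implementation.

```python
import math


def safe_equal(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Returns True if two floats are equal up to a small tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)


total = 0.1 + 0.2
print(total == 0.3)            # False, the exact comparison fails due to rounding
print(safe_equal(total, 0.3))  # True, the tolerant comparison succeeds
```

Integers, in contrast, can always be compared exactly, which is one reason why representing nominal and ordinal values as integers sidesteps such issues.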

Additions to the Command Line API

- **Information about the program can now be printed** via the argument `-v` or `--version`.
- **Data characteristics now include the number of ordinal features** when printed on the console or written to a file via the command line argument `--print-data-characteristics` or `--store-data-characteristics`.

Bugfixes

- An issue has been fixed that caused the number of numerical and nominal features to be swapped when using the command line arguments `--print-data-characteristics` or `--store-data-characteristics`.
- The correct directory is now used for loading and saving parameter settings when using the command line arguments `--parameter-dir` and `--store-parameters`.

API Changes

- The option `num_threads` of the parameters `--parallel-rule-refinement`, `--parallel-statistic-update` and `--parallel-prediction` has been renamed to `num_preferred_threads`.
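
Scripts that still pass the old option name need to be updated. The sketch below rewrites such argument values in Python. It assumes that options are passed in a bracket notation such as `true{num_threads=4}`, and the helper `migrate_argument_value` is a hypothetical name that is not part of the project.

```python
import re

# Matches the old option name only as a whole word, so that longer
# identifiers that merely contain it are left untouched
_OLD_OPTION = re.compile(r"\bnum_threads\b")


def migrate_argument_value(value):
    """Renames the option `num_threads` to `num_preferred_threads`."""
    return _OLD_OPTION.sub("num_preferred_threads", value)


old_value = "true{num_threads=4}"  # assumed notation for passing options
print(migrate_argument_value(old_value))  # true{num_preferred_threads=4}
```

Applying the function to a value that has already been migrated leaves it unchanged.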

Quality-of-Life Improvements

- The documentation has been updated to a more modern theme supporting light and dark theme variants.
- A build option that allows disabling multi-threading support via OpenMP at compile-time has been added.
- The groundwork for GPU support has been laid. It can be disabled at compile-time via a build option.
- Added support for unit testing the project's C++ code. Compilation of the tests can be disabled via a build option.
- The Python code is now checked for common issues by applying `pylint` via Continuous Integration.
- The Makefile has been replaced with wrapper scripts triggering a [SCons](https://scons.org/) build.
- Development versions of wheel packages are now regularly built via Continuous Integration, uploaded as artifacts, and published on [Test-PyPI](https://test.pypi.org/).
- Continuous Integration is now used to maintain separate branches for major, feature, and bugfix releases and keep them up-to-date.
- The runtime of Continuous Integration jobs has been optimized by running individual steps only if necessary, caching files across subsequent runs, and making use of parallelization.
- When tests are run via Continuous Integration, a summary of the test results is now added to pull requests and GitHub workflows.
- Markdown files are now used for writing the documentation.
- A consistent style is now enforced for Markdown files by applying the tool `mdformat` via Continuous Integration.
- C++ 17 or newer is now required for compiling the project.

0.9.0

A major update to the BOOMER algorithm that introduces the following changes.

{warning}
This release comes with several API changes. For an updated overview of the available parameters and command line arguments, please refer to the [documentation](https://mlrl-boomer.readthedocs.io/en/0.9.0/).

Algorithmic Enhancements

- **Sparse matrices can now be used to store gradients and Hessians** if supported by the loss function. The desired behavior can be specified via a new parameter `--statistic-format`.
- **Rules with partial heads can now be learned** by setting the parameter `--head-type` to the value `partial-fixed`, if the number of predicted labels should be predefined, or `partial-dynamic`, if the subset of predicted labels should be determined dynamically.
- **A beam search can now be used** for the induction of individual rules by setting the parameter `--rule-induction` to the value `top-down-beam-search`.
- **Variants of the squared error loss and squared hinge loss**, which take all labels of an example into account at the same time, can now be used by setting the parameter `--loss` to the value `squared-error-example-wise` or `squared-hinge-example-wise`.
- **Probability estimates can be obtained for each label independently or via marginalization** over the label vectors encountered in the training data by setting the new parameter `--probability-predictor` to the value `label-wise` or `marginalized`.
- **Predictions that maximize the example-wise F1-measure can now be obtained** by setting the parameter `--classification-predictor` to the value `gfm`.
- **Binary predictions can now be derived from probability estimates** by specifying the new option `based_on_probabilities`.
- **Isotonic regression models can now be used** to calibrate marginal and joint probabilities predicted by a model via the new parameters `--marginal-probability-calibration` and `--joint-probability-calibration`.
- **The rules in a previously learned model can now be post-optimized** by reconstructing each one of them in the context of the other rules via the new parameter `--sequential-post-optimization`.
- **Early stopping or post-pruning can now be used** by setting the new parameter `--global-pruning` to the value `pre-pruning` or `post-pruning`.
- **Single labels can now be sampled in a round-robin fashion** by setting the parameter `--label-sampling` to the new value `round-robin`.
- **A fixed number of trailing features can now be retained** when the parameter `--feature-sampling` is set to the value `without-replacement` by specifying the option `num_retained`.
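
The idea behind calibrating probabilities via isotonic regression, as mentioned above, can be sketched with the pool adjacent violators algorithm. This is a minimal, self-contained illustration in Python and not the project's implementation. The function name `isotonic_fit` and the toy data are assumptions.

```python
def isotonic_fit(values):
    """Fits a non-decreasing sequence to `values`.

    `values` must already be ordered by the model's predicted probability.
    Adjacent values that violate monotonicity are pooled and replaced by
    their mean (pool adjacent violators algorithm).
    """
    blocks = []  # each block stores [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge the last two blocks while their means are out of order
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Ground-truth labels of six examples, sorted by increasing predicted probability
labels = [0, 1, 0, 0, 1, 1]
print(isotonic_fit(labels))
```

In this toy example, the second to fourth examples are pooled into a single block with a calibrated probability of one third, whereas the first and the last two examples keep their values.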

Additions to the Command Line API

- **Data sets in the MEKA format are now supported.**
- **Certain characteristics of binary predictions can be printed or written to output files** via the new arguments `--print-prediction-characteristics` and `--store-prediction-characteristics`.
- **Unique label vectors contained in the training data can be printed or written to output files** via the new arguments `--print-label-vectors` and `--store-label-vectors`.
- **Models for the calibration of marginal or joint probabilities can be printed or written to output files** via the new arguments `--print-marginal-probability-calibration-model`, `--store-marginal-probability-calibration-model`, `--print-joint-probability-calibration-model` and `--store-joint-probability-calibration-model`.
- **Models can now be evaluated repeatedly, using a subset of their rules with increasing size,** by specifying the argument `--incremental-prediction`.
- **More control of how data is split into training and test sets** is now provided by the argument `--data-split` that replaces the arguments `--folds` and `--current-fold`.
- **Binary labels, scores, or probabilities can now be predicted,** depending on the value of the new argument `--prediction-type`, which can be set to the values `binary`, `scores`, or `probabilities`.
- **Individual evaluation measures can now be enabled or disabled** via additional options that have been added to the arguments `--print-evaluation` and `--store-evaluation`.
- **The presentation of values printed on the console has been vastly improved.** In addition, options for controlling the presentation of values to be printed or written to output files have been added to various command line arguments.
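
What evaluating a model repeatedly with a growing subset of its rules means can be sketched as follows. The representation of rules as Python functions, the function `incremental_accuracy`, and its parameter `step_size` are assumptions made for illustration. They do not reflect the project's data structures or the options of the command line argument.

```python
def incremental_accuracy(rules, examples, labels, step_size=1):
    """Evaluates the first k rules for k = step_size, 2 * step_size, ...

    Each rule maps an example to a score. The scores of all rules in the
    subset are summed up, and an example is predicted as relevant if the
    aggregated score is positive.
    """
    results = []
    for size in range(step_size, len(rules) + 1, step_size):
        subset = rules[:size]
        predictions = [sum(rule(x) for rule in subset) > 0 for x in examples]
        correct = sum(p == y for p, y in zip(predictions, labels))
        results.append((size, correct / len(labels)))
    return results


# A toy model that consists of three rules
rules = [
    lambda x: 1.0 if x[0] > 0.5 else -1.0,
    lambda x: 1.5 if x[1] > 0.5 else 0.0,
    lambda x: -2.0 if x[0] < 0.2 else 0.0,
]
examples = [(0.9, 0.1), (0.4, 0.9), (0.1, 0.8), (0.3, 0.2)]
labels = [True, True, False, False]

for size, accuracy in incremental_accuracy(rules, examples, labels):
    print(size, accuracy)
```

For this toy model, the accuracy is 0.75 when using one or two rules and reaches 1.0 once all three rules are used.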

Bugfixes

- The behavior of the parameter `--label-format` has been fixed when set to the value `auto`.
- The behavior of the parameters `--holdout` and `--instance-sampling` has been fixed when set to the value `stratified-label-wise`.
- The behavior of the parameter `--binary-predictor` has been fixed when set to the value `example-wise` and using a model that has been loaded from disk.
- Rules are now guaranteed to not cover fewer examples than specified via the option `min_coverage`. The option is now also taken into account when using feature binning. Alternatively, the minimum coverage of rules can now also be specified as a fraction via the option `min_support`.

API Changes

- The parameter `--early-stopping` has been replaced with a new parameter `--global-pruning`.
- The parameter `--pruning` has been renamed to `--rule-pruning`.
- The parameter `--classification-predictor` has been renamed to `--binary-predictor`.
- The command line argument `--predict-probabilities` has been replaced with a new argument `--prediction-type`.
- The command line argument `--predicted-label-format` has been renamed to `--prediction-format`.

Quality-of-Life Improvements

- Continuous Integration is now used to test the most common functionalities of the BOOMER algorithm and the corresponding command line API.
- Successful generation of the documentation is now tested via Continuous Integration.
- Style definitions for Python and C++ code are now enforced by applying the tools `clang-format`, `yapf`, and `isort` via Continuous Integration.

0.8.2

A bugfix release that solves the following issues:

- Fixed prebuilt packages available at [PyPI](https://pypi.org/project/mlrl-boomer/).
- Fixed output of nominal values when using the option `--print-rules true`.

0.8.1

A bugfix release that solves the following issues:

- Missing feature values are now dealt with correctly when using feature binning.
- A rare issue that may cause segmentation faults when using instance sampling has been fixed.

0.8.0

A major update to the BOOMER algorithm that introduces the following changes.

{warning}
This release comes with changes to the command line API. For an updated overview of the available parameters, please refer to the [documentation](https://mlrl-boomer.readthedocs.io/en/0.8.0/).

- The programmatic C++ API has been redesigned for a more convenient configuration of algorithms. This also drastically reduces the amount of wrapper code that is necessary to access the API from other programming languages and therefore facilitates the support of additional languages in the future.
- An issue that may cause segmentation faults when using stratified sampling methods for the creation of holdout sets has been fixed.
- Pre-built packages for Windows systems are now available at [PyPI](https://pypi.org/project/mlrl-boomer/).
- Pre-built packages for Linux ARM64 systems are now provided.
