Added
* Added support for randomized parameter search to model exploration. [PR 168][pr168]
* Created an `hlink.linking.core.model_metrics` module with functions for computing
metrics on model confusion matrices. Added the F-measure model metric to model
exploration. [PR 180][pr180]
* Added this changelog! [PR 189][pr189]
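
For readers unfamiliar with the F-measure, the sketch below shows how it can be computed from confusion matrix counts. It is an illustration only: the function name `f_measure` and its signature are assumptions made for this example and are not taken from `hlink.linking.core.model_metrics`.

```python
import math


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the harmonic mean of precision and recall.

    Hypothetical helper for illustration; not hlink's API.
    Returns NaN when the metric is undefined (no positives at all).
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        return math.nan
    return 2 * true_pos / denominator


# 8 true positives, 2 false positives, 2 false negatives:
# precision = recall = 0.8, so the F-measure is also 0.8.
score = f_measure(true_pos=8, false_pos=2, false_neg=2)
```

True negatives do not appear in the formula, which is why the F-measure is often preferred when non-matches vastly outnumber matches.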
Changed
* Overhauled the model exploration task to use a nested cross-validation approach.
[PR 169][pr169]
* Changed `hlink.linking.core.classifier` functions so that they no longer interact
with `threshold` and `threshold_ratio`. Please ensure that the parameter dictionaries
passed to these functions contain only parameters for the chosen model. [PR 175][pr175]
* Simplified the parameters required for `hlink.linking.core.threshold.predict_using_thresholds`.
Instead of passing the entire `training` configuration section to this function,
you now need only pass `training.decision`. [PR 175][pr175]
* Added a new required `checkpoint_dir` argument to `SparkConnection`, which lets hlink
use separate tmp and checkpoint directories. [PR 182][pr182]
* Switched to `tomli` as the default TOML parser. This should fix several issues
with how hlink parses TOML files. `load_conf_file()` provides the `use_legacy_toml_parser`
argument for backwards compatibility if necessary. [PR 185][pr185]
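
If your code previously passed `threshold` and `threshold_ratio` in the same dictionary as the model parameters, one way to adapt to the classifier change is to separate them before calling into hlink. The helper below is a minimal sketch under that assumption; `split_threshold_params` is a hypothetical name and is not part of hlink.

```python
# Keys that the classifier functions no longer handle (per the entry above).
THRESHOLD_KEYS = ("threshold", "threshold_ratio")


def split_threshold_params(params: dict) -> tuple[dict, dict]:
    """Split a mixed parameter dict into (model_params, threshold_params).

    Hypothetical helper for illustration; the input dict is not modified.
    """
    model_params = {k: v for k, v in params.items() if k not in THRESHOLD_KEYS}
    threshold_params = {k: v for k, v in params.items() if k in THRESHOLD_KEYS}
    return model_params, threshold_params


# Example values are illustrative only.
mixed = {"maxDepth": 5, "numTrees": 100, "threshold": 0.8, "threshold_ratio": 1.2}
model_params, threshold_params = split_threshold_params(mixed)
```

Only `model_params` would then be passed to the classifier functions, with the threshold values handled separately.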
Deprecated
* Deprecated the `training.param_grid` attribute in favor of the new, more flexible
`training.model_parameter_search` table. This is part of supporting the new randomized
parameter search. [PR 168][pr168]
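
As background on what a randomized parameter search does, the sketch below draws a fixed number of parameter combinations at random instead of testing every combination in a grid. This is a generic illustration, not hlink's implementation: the function `sample_parameter_sets`, its arguments, and the example parameter values are all assumptions made for this example.

```python
import random


def sample_parameter_sets(candidates: dict, num_samples: int, seed=None) -> list[dict]:
    """Draw num_samples random parameter combinations.

    Hypothetical helper for illustration. Each parameter value is chosen
    independently and uniformly from its list of candidates.
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    return [
        {name: rng.choice(values) for name, values in candidates.items()}
        for _ in range(num_samples)
    ]


# A full grid over these candidates would test 3 * 3 = 9 combinations;
# the randomized search below tests only 4 of them.
candidates = {"maxDepth": [3, 5, 7], "numTrees": [50, 100, 200]}
samples = sample_parameter_sets(candidates, num_samples=4, seed=42)
```

Because each draw is independent, the same combination can be sampled more than once; a real search may want to deduplicate or sample from continuous distributions instead.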
Removed
* Removed functionality for outputting "suspicious" training data from model exploration.
We determined that this is out of the scope of model exploration step 2. This change
greatly simplifies the model exploration code. [PR 178][pr178]
* Removed the deprecated `hlink.linking.transformers.interaction_transformer` module.
This module was deprecated in v3.5.0. Please use
[`pyspark.ml.feature.Interaction`][pyspark-interaction-docs] instead. [PR 184][pr184]
* Removed some alternate configuration syntax which has been deprecated since
v3.0.0. [PR 184][pr184]
* Removed `hlink.scripts.main.load_conf` in favor of a much simpler approach to
finding the configuration file and configuring Spark. Please call
`hlink.configs.load_config.load_conf_file` directly instead. `load_conf_file` now
returns both the path to the configuration file and its contents as a mapping. [PR 182][pr182]