Hlink

Latest version: v3.8.0


Page 1 of 4

4.0.0

Added

* Added support for randomized parameter search to model exploration. [PR 168][pr168]
* Created an `hlink.linking.core.model_metrics` module with functions for computing
metrics on model confusion matrices. Added the F-measure model metric to model
exploration. [PR 180][pr180]
* Added this changelog! [PR 189][pr189]
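
As a rough illustration of the F-measure metric mentioned above, the following sketch computes it from confusion-matrix counts. The function name and signature are hypothetical and are not taken from `hlink.linking.core.model_metrics`; the sketch only shows the standard formula, F = 2·TP / (2·TP + FP + FN).

```python
import math


def f_measure(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, from confusion-matrix counts.

    Hypothetical helper for illustration; not hlink's own API.
    """
    denominator = 2 * true_pos + false_pos + false_neg
    if denominator == 0:
        # No positive predictions and no positive labels: the metric is undefined.
        return math.nan
    return 2 * true_pos / denominator


# 8 true positives, 2 false positives, 2 false negatives:
# precision = recall = 0.8, so the F-measure is also 0.8.
score = f_measure(true_pos=8, false_pos=2, false_neg=2)
print(score)
```

Returning NaN for the undefined case is one possible design choice; a real implementation might prefer to raise an error or return 0 instead.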

Changed

* Overhauled the model exploration task to use a nested cross-validation approach.
[PR 169][pr169]
* Changed `hlink.linking.core.classifier` functions to not interact with `threshold`
and `threshold_ratio`. Please ensure that the parameter dictionaries passed to these
functions only contain parameters for the chosen model. [PR 175][pr175]
* Simplified the parameters required for `hlink.linking.core.threshold.predict_using_thresholds`.
Instead of passing the entire `training` configuration section to this function,
you now need only pass `training.decision`. [PR 175][pr175]
* Added a new required `checkpoint_dir` argument to `SparkConnection`, which lets hlink
set the tmp and checkpoint directories to different locations. [PR 182][pr182]
* Swapped to using `tomli` as the default TOML parser. This should fix several issues
with how hlink parses TOML files. `load_conf_file()` provides the `use_legacy_toml_parser`
argument for backwards compatibility if necessary. [PR 185][pr185]
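
For readers unfamiliar with `tomli`, here is a minimal sketch of parsing TOML text with it. The configuration keys are hypothetical and chosen only for illustration. The sketch falls back between `tomllib` (the standard-library version of `tomli` in Python 3.11+) and the third-party `tomli` package, which share the same interface. It does not show how hlink itself calls the parser.

```python
try:
    import tomllib  # Python 3.11+: standard-library version of tomli
except ModuleNotFoundError:
    import tomli as tomllib  # third-party package with the same interface

# Hypothetical configuration text; these keys are illustrative only.
CONFIG_TEXT = """
title = "example"

[section]
value = 3
"""

# loads() parses a string and returns a plain dict.
config = tomllib.loads(CONFIG_TEXT)
print(config["section"]["value"])
```

When reading from a file rather than a string, `tomllib.load()` expects the file to be opened in binary mode (`"rb"`).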

Deprecated

* Deprecated the `training.param_grid` attribute in favor of the new, more flexible
`training.model_parameter_search` table. This is part of supporting the new randomized
parameter search. [PR 168][pr168]
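
To make the difference between an exhaustive parameter grid and a randomized search concrete, here is a small standalone sketch. The search space, parameter names, and helper functions are all hypothetical. It does not reproduce the syntax of `training.param_grid` or `training.model_parameter_search`; it only illustrates the two general strategies.

```python
import itertools
import random

# Hypothetical search space; parameter names are illustrative only.
search_space = {
    "maxDepth": [3, 5, 7],
    "numTrees": [50, 100, 200],
}


def grid_search_candidates(space):
    """Return every combination of the listed values (exhaustive grid)."""
    names = list(space)
    return [
        dict(zip(names, values))
        for values in itertools.product(*space.values())
    ]


def random_search_candidates(space, num_samples, seed=None):
    """Sample one value per parameter, independently, num_samples times."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(num_samples)
    ]


grid = grid_search_candidates(search_space)  # 3 * 3 = 9 combinations
sampled = random_search_candidates(search_space, num_samples=4, seed=0)
print(len(grid), len(sampled))
```

A randomized search evaluates a fixed number of candidates regardless of how many parameters are searched, which is one common reason to prefer it when the full grid would be too large.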

Removed

* Removed functionality for outputting "suspicious" training data from model exploration.
We determined that this is out of the scope of model exploration step 2. This change
greatly simplifies the model exploration code. [PR 178][pr178]
* Removed the deprecated `hlink.linking.transformers.interaction_transformer` module.
This module was deprecated in v3.5.0. Please use
[`pyspark.ml.feature.Interaction`][pyspark-interaction-docs] instead. [PR 184][pr184]
* Removed some alternate configuration syntax which has been deprecated since
v3.0.0. [PR 184][pr184]
* Removed `hlink.scripts.main.load_conf` in favor of a much simpler approach to
finding the configuration file and configuring Spark. Please call
`hlink.configs.load_config.load_conf_file` directly instead. `load_conf_file` now
returns both the path to the configuration file and its contents as a mapping. [PR 182][pr182]

3.8.0

Added

* Added optional support for the XGBoost and LightGBM gradient boosting
machine learning libraries. You can find documentation on how to use these libraries
[here][gradient-descent-ml-docs]. [PR 165][pr165]
* Added a new `hlink.linking.transformers.RenameVectorAttributes` transformer which
can rename the attributes or "slots" of Spark vector columns. [PR 165][pr165]

Fixed

* Corrected misleading documentation for comparisons, which are not the same thing
as comparison features. You can find the new documentation [here][comparison-docs]. [PR 159][pr159]
* Corrected the documentation for substitution files, which had the meaning of the
columns backwards. [PR 166][pr166]

3.7.0

Added

* Added an optional argument to `SparkConnection` to allow setting a custom Spark
app name. The default is still to set the app name to "linking". [PR 156][pr156]

Changed

* Improved model exploration step 2's terminal output, logging, and documentation
to make the step easier to work with. [PR 155][pr155]

Fixed

* Updated all modules to log to module-level loggers instead of the root logger. This gives
users of the library more control over filtering logs from hlink. [PR 152][pr152]
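
As a sketch of what module-level loggers make possible, the snippet below uses only the standard `logging` module to quiet hlink's messages without touching other loggers. It assumes that hlink's loggers are named after their modules and therefore sit under the `"hlink"` namespace; that naming is an assumption here, not something this changelog states.

```python
import logging

# Configure the application's own logging as usual.
logging.basicConfig(level=logging.INFO)

# Assumption: hlink's module-level loggers live under the "hlink" namespace.
# Raising the level on the parent logger affects every child logger that
# has not set its own level.
hlink_logger = logging.getLogger("hlink")
hlink_logger.setLevel(logging.WARNING)

# A child logger such as this hypothetical one inherits the WARNING level.
child_logger = logging.getLogger("hlink.linking")
print(child_logger.getEffectiveLevel() == logging.WARNING)
```

With logging to the root logger, this kind of per-library filtering would not be possible by logger name alone.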

3.6.1

Fixed

* Fixed a crash in matching step 0 triggered when there were multiple exploded columns
in the blocking section. Multiple exploded columns are now supported. [PR 143][pr143]

3.6.0

Added

* Added OR conditions in blocking. This new feature supports connecting some or
all blocking conditions together with ORs instead of ANDs. Note that using many ORs
in blocking may hurt performance on large datasets, since it increases the size of
the blocks and makes each block more expensive to compute. You can find documentation
on OR blocking conditions under the `or_group` bullet point [here][or-groups-docs].
[PR 138][pr138]
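
The effect of OR conditions on blocking can be sketched in plain Python. This is not hlink's implementation. It assumes that conditions sharing a group label are joined with OR and that separate groups are joined with AND. The record fields, group labels, and the `passes_blocking` helper are hypothetical.

```python
from collections import defaultdict


def passes_blocking(rec_a, rec_b, conditions):
    """Check whether two records fall into the same block.

    conditions is a list of (column, group_label) pairs. Conditions that
    share a label are OR'd together; distinct groups are AND'ed together.
    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for column, group_label in conditions:
        groups[group_label].append(rec_a[column] == rec_b[column])
    return all(any(matches) for matches in groups.values())


rec_a = {"birthyr": 1900, "namefrst": "ann", "namelast": "lee"}
rec_b = {"birthyr": 1900, "namefrst": "anne", "namelast": "lee"}

# Every condition in its own group: all three columns must match.
and_only = [("birthyr", "g1"), ("namefrst", "g2"), ("namelast", "g3")]
# The two name conditions share a group: either name may match.
with_or = [("birthyr", "g1"), ("namefrst", "names"), ("namelast", "names")]

strict_result = passes_blocking(rec_a, rec_b, and_only)  # False
relaxed_result = passes_blocking(rec_a, rec_b, with_or)  # True
print(strict_result, relaxed_result)
```

The relaxed condition admits a pair that the strict one rejects, which illustrates why OR conditions enlarge blocks and increase the number of candidate pairs to compare.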

3.5.5

Added

* Added support for a variable number of columns in the array feature selection
transform, instead of forcing it to use exactly 2 columns. [PR 135][pr135]
