Hlink

Latest version: v3.8.0

4.0.0

This pre-release previews the changes planned for version 4 of hlink. Because it includes breaking changes and an overhaul of the model exploration task, we'd like to test it before creating a full release. Documentation and code cleanup are still in progress, so the documentation for these changes and new features is incomplete. Here are the version 4 highlights so far:

* Completely overhauled the model exploration task, switching to a nested cross-validation algorithm.
* Added support for a third strategy for generating models to test in model exploration. Along with "explicit" (take exactly what's in `training.model_parameters`) and grid search, there is now randomized search. Randomized search takes a certain number of samples from a distribution defined in `training.model_parameters`.
* Added the F-measure metric to the model exploration output, and simplified the output so that it always has the same columns.
* Removed the `training.output_suspicious_TD` configuration option because it was rarely used and presented code and performance issues. Removing `output_suspicious_TD` makes the model exploration code more maintainable and helps it run more quickly.
* Disentangled two core modules (`classifier` and `pipeline`) from the configuration format by changing the arguments to a couple of functions. This separates those concerns more cleanly and should make any future changes to the configuration format easier.
* Changed `SparkConnection` to require a `checkpoint_dir` argument, which fixes a bug related to Spark configuration.
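
To make the difference between the three model-generation strategies concrete, here is a minimal, library-independent sketch. It is not hlink's implementation or configuration format: the function names, the `(low, high)` tuple convention for a uniform range, and the parameter names in the usage note are all assumptions made for illustration.

```python
import itertools
import random


def explicit(params):
    """Explicit strategy: test exactly the one parameter set given."""
    return [dict(params)]


def grid_search(param_grid):
    """Grid search: test every combination of the listed values."""
    names = sorted(param_grid)
    combos = itertools.product(*(param_grid[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def randomized_search(param_distributions, num_samples, seed=None):
    """Randomized search: draw num_samples parameter sets at random.

    Assumed convention for this sketch only: a list means "pick one of
    these values", and a (low, high) tuple means "draw uniformly from
    this range".
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        sample = {}
        for name, spec in sorted(param_distributions.items()):
            if isinstance(spec, list):
                sample[name] = rng.choice(spec)
            else:
                low, high = spec
                sample[name] = rng.uniform(low, high)
        samples.append(sample)
    return samples
```

For example, `grid_search({"maxDepth": [3, 5], "numTrees": [10, 20, 30]})` produces all six combinations, while `randomized_search` with `num_samples=4` produces four parameter sets however many combinations the grid would contain. That is why randomized search can be cheaper when the search space is large.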

4.0.0a1

3.8.0

What's Changed

* Added optional support for two new gradient boosting ML libraries: XGBoost and LightGBM. You can read more about these libraries and how to install them with their dependencies in the docs [here](https://hlink.docs.ipums.org/models.html). PR https://github.com/ipums/hlink/pull/165
* Added a new `hlink.linking.transformers.RenameVectorAttributes` transformer which can rename the attributes or "slots" of Spark vector columns. Hlink uses this to support LightGBM, which disallows certain characters in its feature names. PR https://github.com/ipums/hlink/pull/165
* Documented comparisons, which are not the same as comparison features. Previously the documentation was misleading and seemed to indicate that these were the same thing. PR https://github.com/ipums/hlink/pull/159
* Fixed a bug in the substitution file documentation. The documentation had the meaning of the substitution file columns flip-flopped, which was confusing. PR https://github.com/ipums/hlink/pull/166


Developer-Facing Changes

* Updated Sphinx to 8.1.3 and fixed two Sphinx build warnings. PR https://github.com/ipums/hlink/pull/159
* Updated CI/CD to automatically run only on PRs and on pushes to main. You can also now manually trigger a CI/CD run from the Actions tab in GitHub. Also removed the custom "quickcheck" pytest marker in favor of using `pytest -k` and removed flake8 from CI/CD because it kept causing more trouble than it was worth. PR https://github.com/ipums/hlink/pull/164

**Full Changelog**: https://github.com/ipums/hlink/compare/v3.7.0...v3.8.0

3.7.0

What's Changed
* Add tests to cover several untested sections of code by riley-harper in https://github.com/ipums/hlink/pull/147
* Refactor core.transforms.generate_transforms() for readability and maintainability; improve documentation and type hints by riley-harper in https://github.com/ipums/hlink/pull/148
* Fix tests for Python 3.12 and clarify Python 3.12 support and dependence on PySpark by riley-harper in https://github.com/ipums/hlink/pull/151
* Improve logging by writing to module-level loggers instead of the root logger by riley-harper in https://github.com/ipums/hlink/pull/152
* Support setting the app name via an optional argument in SparkConnection. The default behavior of setting the app name to "linking" is unchanged. By riley-harper in https://github.com/ipums/hlink/pull/156
* Improve model_exploration step 2 terminal output, logging, and documentation to make the step more understandable by riley-harper in https://github.com/ipums/hlink/pull/155
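
The module-level logger pattern mentioned above is the standard Python `logging` idiom. The sketch below shows the general technique, not hlink's code: the package name `example_pkg` and the function `run_step` are hypothetical.

```python
import logging

# Inside a library module this would normally be logging.getLogger(__name__),
# which yields a dotted name such as "example_pkg.linking". The literal name
# is used here only so the sketch runs as a standalone script.
logger = logging.getLogger("example_pkg.linking")


def run_step(step_name):
    """Log through the module's own logger rather than the root logger."""
    logger.info("running step %s", step_name)


# Application side: adjust verbosity for the whole package through the
# parent logger, without touching the root logger or other libraries.
package_logger = logging.getLogger("example_pkg")
package_logger.setLevel(logging.DEBUG)
```

Because logger names form a hierarchy, an application can set the level or attach handlers for one package alone, which is not possible when a library writes directly to the root logger.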


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.6.1...v3.7.0

3.6.1

What's Changed
* Support blocking sections with multiple exploded columns by riley-harper in https://github.com/ipums/hlink/pull/143. This fixes a bug that caused a crash in Matching step 0 - explode.


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.6.0...v3.6.1

3.6.0

What's Changed
* Support OR conditions in blocking by riley-harper in https://github.com/ipums/hlink/pull/138. This new feature supports connecting some or all blocking conditions together with ORs instead of with ANDs. You can read more documentation about it under the "or_group" bullet point [here](https://hlink.docs.ipums.org/config.html#blocking).
* Unskip several skipped tests by riley-harper in https://github.com/ipums/hlink/pull/139. This is a development change that should not affect users.
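
As a rough illustration of mixing ORs and ANDs in blocking, the sketch below combines condition strings by group. It assumes that conditions sharing a group label are joined with OR, that separate groups are joined with AND, and that a condition with no label forms its own group. The function name, the input format, and the column names are hypothetical; this is not hlink's implementation.

```python
def combine_blocking_conditions(conditions):
    """Combine (expression, or_group) pairs into one condition string.

    Expressions with the same or_group are ORed together; the resulting
    groups are ANDed. An or_group of None puts the expression in a group
    of its own.
    """
    groups = {}
    for index, (expression, or_group) in enumerate(conditions):
        key = ("solo", index) if or_group is None else ("group", or_group)
        groups.setdefault(key, []).append(expression)

    clauses = []
    for expressions in groups.values():
        if len(expressions) == 1:
            clauses.append(expressions[0])
        else:
            clauses.append("(" + " OR ".join(expressions) + ")")
    return " AND ".join(clauses)


condition = combine_blocking_conditions(
    [
        ("a.age = b.age", None),
        ("a.bpl = b.bpl", "place"),
        ("a.county = b.county", "place"),
    ]
)
# condition == "a.age = b.age AND (a.bpl = b.bpl OR a.county = b.county)"
```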


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.5.5...v3.6.0
