Hlink

Latest version: v3.7.0

3.7.0

What's Changed
* Add tests to cover several untested sections of code by riley-harper in https://github.com/ipums/hlink/pull/147
* Refactor core.transforms.generate_transforms() for readability and maintainability; improve documentation and type hints by riley-harper in https://github.com/ipums/hlink/pull/148
* Fix tests for Python 3.12 and clarify Python 3.12 support and dependence on PySpark by riley-harper in https://github.com/ipums/hlink/pull/151
* Improve logging by writing to module-level loggers instead of the root logger by riley-harper in https://github.com/ipums/hlink/pull/152
* Support setting the app name via an optional argument in SparkConnection by riley-harper in https://github.com/ipums/hlink/pull/156. The default behavior of setting the app name to "linking" is unchanged.
* Improve model_exploration step 2 terminal output, logging, and documentation to make the step more understandable by riley-harper in https://github.com/ipums/hlink/pull/155
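
The logging change above (pull/152) follows a common Python pattern. The block below is a minimal sketch of that pattern, not hlink's actual code: the logger name `hlink.example` and the function `run_step` are hypothetical and used only for illustration.

```python
import logging

# Module-level logger. In a real module this would usually be
# logging.getLogger(__name__); "hlink.example" is a hypothetical name.
logger = logging.getLogger("hlink.example")


def run_step(name: str) -> str:
    """Hypothetical step function that logs through the module-level logger."""
    message = f"Running step {name}"
    # Log via the module's own logger rather than logging.info(),
    # which would write to the root logger.
    logger.info(message)
    return message
```

With this pattern, an application can adjust the level or handlers of one package's loggers (for example with `logging.getLogger("hlink").setLevel(logging.WARNING)`, assuming the loggers are named under an `hlink` prefix) without changing the root logger's configuration.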


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.6.1...v3.7.0

3.6.1

What's Changed
* Support blocking sections with multiple exploded columns by riley-harper in https://github.com/ipums/hlink/pull/143. This fixes a bug that caused a crash in Matching step 0 - explode.


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.6.0...v3.6.1

3.6.0

What's Changed
* Support OR conditions in blocking by riley-harper in https://github.com/ipums/hlink/pull/138. This new feature supports connecting some or all blocking conditions together with ORs instead of with ANDs. You can read more documentation about it under the "or_group" bullet point [here](https://hlink.docs.ipums.org/config.html#blocking).
* Unskip several skipped tests by riley-harper in https://github.com/ipums/hlink/pull/139. This is a development change that should not affect users.
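
To illustrate the OR-condition feature from pull/138, here is a rough sketch of how blocking conditions might combine. It is an illustration of the semantics described above, not hlink's implementation: the function `blocking_predicate`, the `column_name` key, the column names, and the rule that conditions sharing an `or_group` value are ORed together while separate groups are ANDed are all assumptions. The linked documentation is the authoritative description.

```python
def blocking_predicate(conditions):
    """Build a SQL-like predicate from a list of blocking conditions.

    Conditions that share an "or_group" value are joined with OR.
    The resulting groups, plus any conditions with no "or_group",
    are joined with AND.
    """
    groups = {}
    for index, condition in enumerate(conditions):
        if "or_group" in condition:
            key = ("group", condition["or_group"])
        else:
            # A condition without an or_group forms its own group.
            key = ("ungrouped", index)
        column = condition["column_name"]
        groups.setdefault(key, []).append(f"a.{column} = b.{column}")
    return " AND ".join(
        "(" + " OR ".join(terms) + ")" for terms in groups.values()
    )


# Hypothetical blocking configuration, written as Python dicts.
example = [
    {"column_name": "bpl"},
    {"column_name": "namefrst", "or_group": "name"},
    {"column_name": "namelast", "or_group": "name"},
]
predicate = blocking_predicate(example)
# predicate == "(a.bpl = b.bpl) AND (a.namefrst = b.namefrst OR a.namelast = b.namelast)"
```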


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.5.5...v3.6.0

3.5.5

What's Changed
* Support a variable number of columns in the array feature selection transform by riley-harper in https://github.com/ipums/hlink/pull/135


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.5.4...v3.5.5

3.5.4

What's Changed
* Document column_mappings transform concat_two_cols by riley-harper in https://github.com/ipums/hlink/pull/126. These new docs are here: https://hlink.docs.ipums.org/column_mappings.html#concat-two-cols.
* Document column mapping overrides by riley-harper in https://github.com/ipums/hlink/pull/129. These let you read two columns with different names from the two input files into a single hlink column. See the documentation at https://hlink.docs.ipums.org/column_mappings.html#advanced-usage and the sections that follow it.
* Fix a bug with the override_column_X attributes in conf_validations.py by riley-harper in https://github.com/ipums/hlink/pull/131. Previously, config validation raised spurious errors because it didn't take override_column_a and override_column_b into account.


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.5.3...v3.5.4

3.5.3

Highlights

In this release we start supporting Python 3.12 and remove the ceiling on most of our dependency versions to support this. We also fix a bug with one-hot encoding and add an additional check to config validation that looks for duplicate output columns for some config sections.

What's Changed
* Refactor to use colorama in a simpler way by jrbalch543 in https://github.com/ipums/hlink/pull/115. User-facing functionality should be unchanged.
* Add checks for duplicated comparison features, feature selection, and column mappings by jrbalch543 in https://github.com/ipums/hlink/pull/113. This will cause a validation error when there are duplicated aliases or output columns for these sections.
* Clean up a couple of core modules by jrbalch543 in https://github.com/ipums/hlink/pull/117. These changes are internal refactoring and don't affect functionality.
* Upgrade dependencies by pinning them more loosely and support Python 3.12 by riley-harper in https://github.com/ipums/hlink/pull/119. This removes the upper limit on almost all of our dependencies so that users can more freely pick versions for themselves. Our tests run on the most recent available versions for each particular version of Python. Unpinning these dependencies lets us easily support Python 3.12, which we now run in CI/CD.
* Update the docs to include Python 3.12 by riley-harper in https://github.com/ipums/hlink/pull/120
* Revert to handleInvalid = "keep" for OneHotEncoder by riley-harper in https://github.com/ipums/hlink/pull/121. This fixes a bug that we introduced in the last release. Although it's not common, it does sometimes happen that our training data doesn't cover every category present in the matching data. We would rather silently continue and ignore these cases by giving them a coefficient of 0 than error out on them.
* Put the config file name in the script prompt by riley-harper in https://github.com/ipums/hlink/pull/123. This is a small quality of life feature that makes it easier to remember which config file you're running during long hlink runs.
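
The effect of the handleInvalid = "keep" change (pull/121) can be illustrated without Spark. The block below is a plain-Python sketch of the idea only. It is not PySpark's OneHotEncoder and not hlink's code, and `one_hot`, `score`, the categories, and the weights are all hypothetical. It mirrors the description above: a category that never appeared in training contributes 0 to the model instead of raising an error.

```python
def one_hot(value, known_categories):
    """Encode value as a one-hot list; unseen values become all zeros."""
    return [1 if value == category else 0 for category in known_categories]


def score(vector, weights):
    """Dot product of an encoded vector with per-category coefficients."""
    return sum(weight * x for weight, x in zip(weights, vector))


# Hypothetical categories seen in training, with made-up coefficients.
categories = ["a", "b", "c"]
weights = [0.5, -1.0, 2.0]

seen = one_hot("b", categories)    # [0, 1, 0]
unseen = one_hot("z", categories)  # [0, 0, 0]: not in the training data

seen_score = score(seen, weights)      # -1.0
unseen_score = score(unseen, weights)  # 0.0, no error raised
```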


**Full Changelog**: https://github.com/ipums/hlink/compare/v3.5.2...v3.5.3
