Overview
This release adds several new features, including a separate log file for each run and logging of user input, and it loosens production dependency requirements. It also fixes an important bug in Jaro-Winkler scores for blank names and includes many smaller enhancements.
Changes
- Started writing to a unique log file for each hlink script run. The name of the log file is `"{config_name}-{session_id}.log"`, where `session_id` is a UUID uniquely generated for the particular run of the script.
- Started logging user input in the main loop. This helps give more context to errors and other logging information.
- Loosened production dependency requirements so that they are not pinned to particular patch versions which may quickly become out of date. Adjusted some development dependency requirements.
- Fixed a bug where the Scala `jw` user-defined function returned a similarity of 1.0 for two empty strings. It now returns 0.0.
- Added syntax highlighting to the TOML example config file in the README (thanks bollwyvl).
- Documented some previously undocumented comparison types: `not_zero_and_not_equals`, `present_and_matching_categorical`, `caution_comp_3_012`, `caution_comp_4_012`, `sql_condition`, `present_and_equal_categorical_in_universe`.
- Updated documentation for a few more comparison types: `caution_comp_3`, `caution_comp_4`, `not_zero_and_not_equals`.
- Updated the Introduction and Installation documentation pages to make them more reader friendly and helpful.
- Updated the tutorial in examples/tutorial and added some small datasets so that it can be run for real. It can now be run with the commands `cd examples/tutorial` followed by `python tutorial.py`.
- Updated and added type hints for the following classes and modules: `Table`, `LinkRun`, `LinkTask`, `LinkStep`, `linking.util`, `configs.load_config`.
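As a rough illustration of the per-run log file naming described in the list above, the following sketch builds a file name of the form `"{config_name}-{session_id}.log"`. The helper name `build_log_file_name`, the example config name `my_config`, and the use of `uuid.uuid4()` are assumptions made for this example; they are not taken from the hlink source.

```python
import uuid


def build_log_file_name(config_name: str, session_id: uuid.UUID) -> str:
    """Return a log file name of the form "{config_name}-{session_id}.log".

    Hypothetical helper for illustration only; hlink's actual code may differ.
    """
    return f"{config_name}-{session_id}.log"


# Assumption: a random (version 4) UUID identifies each run of the script.
session_id = uuid.uuid4()
log_file_name = build_log_file_name("my_config", session_id)
print(log_file_name)
```

Because a fresh UUID is generated on each run, two runs with the same config name write to different log files.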
Developer-Facing Changes
- Updated developer instructions for generating the Sphinx docs, adding some more context and tips.
- Renamed some private functions and methods to use a single leading underscore instead of two leading underscores. This should complete the transition from two leading underscores to one leading underscore.
- Allowed the Dockerfile to pull the most recent patch version of Python 3.10 for CI instead of pinning to a particular patch version.
- Moved from setup.py and setup.cfg to pyproject.toml for specifying package metadata. Added and tweaked package metadata for installation and PyPI.
- Started using the `build` package for creating an sdist and wheel. Added a step to the CI to run `python -m build` to generate the sdist and wheel.
- Moved the declaration of `pytest_plugins` to a top-level conftest.py file to allow for running tests with just the command `pytest`. Updated CI and the docs from `pytest hlink/tests` to just `pytest`.