Bdi-kit

Latest version: v0.3.0

Safety actively analyzes 666166 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

2.0.0

List all tags to verify the tag was created
git tag -l


9. Push the tag to the remote repository. This will trigger the CI tests, and
build and publish the package to `https://pypi.org/project/bdi-kit/`.
bash
git push --tags


10. Head to GitHub Actions page (https://github.com/VIDA-NYU/bdi-kit/actions), locate the workflow run ("Publish to PyPI 🐍") triggered for the new release tag and approve its execution. When the workflow finishes, the library should be available at https://pypi.org/project/bdi-kit/.

11. Head to https://github.com/VIDA-NYU/bdi-kit/releases/ and update the release
with the CHANGELOG for the released version.

12. Switch to `devel` branch and merge the release (to make sure `devel` is always on top of `main`). If you didn't have to make any changes to `main` this will do nothing.
bash
git checkout devel
git merge main

13. Change the version in `bdikit/__init__.py` appending `.dev0` to the future version, e.g. `2.1.0.dev0`. Add a new empty version on top of `CHANGELOG.md`, e.g. `2.1.0.dev0 (yyyy-mm-dd)`.

14. Commit with message `Bump version for development`.
bash
git add CHANGELOG.md bdikit/__init__.py
git commit -m "Bump version for development"
git push


Change Log
==========

0.4.0.dev0

------------------------

0.3.0

------------------
Below is a detailed list of important changes included in this release.

New Features and Improvements:
- feat: Consistently use np.nan for missing values
- feat: Add view_value_matches() method
- feat: Add new schema matcher based on value matching
- feat: Support passing method arguments to top_matches()
- feat: Support euclidean distance in CLTopkColumnMatcher

Documentation:
- docs: Add documentation for schema mapping methods
- docs: Add example for view_value_matches() method

0.2.0

------------------

We are pleased to announce the release of bdikit version 0.2.0.
This version includes several new features, including a new API,
a new column-matching model based on contrastive learning,
several new matching algorithm implementations, new visualizations
to help with column-matching, and a new website with our API
documentation at https://bdi-kit.readthedocs.io/. Additionally,
we added lots of improvements to our development infrastructure,
including several unit tests, automated tests in our CI (continuous
integration) server, and automated release publication to PyPI.

Below is a detailed list of important changes included in this release.

New Features and Improvements:
- feat: Add automatic model download (34)
- feat: Adding AutoFuzzyJoin Algorithm for value mapping
- test: Add tests to TFIDF and EditDistance-based value matching
- feat: Format output text for unmatched values
- feat: Adding TwoPhase ColumnMatch algorithm based on CTLearning
- feat: Change column matching API to be stateless
- feat(api): Add new function bdi.match_columns()
- feat(api): Add new function bdi.top_matches()
- feat(api): Added bdi.materialize_mapping() and basic value mappers
- feat(api): add match_values() and preview_value_mappings()
- test(api): Add end-to-end API integration test
- feat(api): Add bdi.preview_domains()
- feat(api): Support an object as a method in match_columns()
- feat(api): Make API inputs more compatible
- feat: Change default algorithm for column mapping
- feat: Make EditAlgorithm to return values between 0 and 1
- feat: Release new constrastive learning model 'bdi-cl-v0.2' and make it default
- feat: Create a mapper from the output of preview_value_mappings
- feat: fit schema matching visualization tool to the new API
- feat: Update preview_domain() to include value names and descriptions (68)
- feat: Allow configuring device (gpu/cpu) via env vars
- feat: Allow sending arguments for the matching algorithms (75)
- feat: Adding embedding cache for gdc case (76)

Infrastructure:
- Setup CI infrastructure for automated tests
- Add Python 3.11 to CI build script
- Setup GH action to check code format using black formatter
- Automatic code formatting using black
- Add 'format' and 'lint' targets to the Makefile
- infra: Add linter (pyright) config file
- feat: Setup PyPI release using GitHub Actions
- chore: Remove modules, transformation, data_ingestion, mapping_recommendation
- chore: Remove scope_reducing module
- chore: Remove empty module directories

Bug Fixes:
- Fix setup.py to read version correctly
- Temporarily use scipy<1.13 to avoid import error of triu package
- Temporarily use matplotlib<3.9 to avoid PolyFuzz error
- Fix conflicts and relax dependency versions
- TFIDFAlgorithm doesn't work for short values (57)
- do not allow panel version 1.4.3 (fixes issue 40)
- Support repeated source columns in match_values()

Visualizations:
- Add accept and reject button and interaction (39)
- Remove plot_reduce_scope from APIManager's reduce_scope
- Add ScopeReducerExplorer to column_and_value_mapping notebook
- Remove tabulate
- Use display() to show the visualization

Refactoring:
- Refactoring of column mappers to remove code duplication
- Moving model_name,top_k to construtor parameters for better class reuse
- Extract TopkColumnMatcher from ContrastiveLearningAPI
- Rename match_columns() to match_schema() (66)
- Renamed value matcher classes to use suffix 'ValueMatcher'
- Merge preview_value_mappings() and match_values()
- Make match_values() return a list of DataFrames
- Reorder API functions
- Rename update_mappings() to merge_mappings()
- Rename variables for better API naming consistency

Documentation:
- Add readthedocs configurations
- Update installation instructions
- Add API Reference documentation page for bdikit module functions
- Fix API documentation
- Updated README.md with code formatting instructions
- Rename notebook to doc_gdc_harmonization.ipynb
- Fix heading levels in getting-started.ipynb
- Fix typos and minor improvement in getting-started.ipynb
- Reorganize examples directory for documentation (71)
- Minor formatting changes
- Improve API documentation and fix typos
- Improve documentation and minor typos
- Clarify description of MappingSpecLike

0.1.0

-------------------

* Add a scope reducer for GDC.
* Add initial column mapping algorithms.
* Add initial value mapping algorithms.

Links

Releases

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.