Hypex

Latest version: v0.1.8

Page 1 of 2

0.1.5

We are excited to introduce new features in HypEx v0.1.5 that enhance both matching performance and A/B testing capabilities.

New Features

Algorithm Selection for Matching

- **Choose Your Algorithm**:
  - Users can now select the matching algorithm by passing the desired option at initialization.
  - To prioritize speed over accuracy, set `algo='fast'` when initializing the `Matcher`:

    ```python
    from hypex import Matcher

    # `...` stands for the other Matcher arguments, which these notes omit
    matcher = Matcher(..., algo='fast')
    ```

  - This option lets users tune the balance between speed and precision to their specific needs.

Enhanced A/A Testing with Group Splitting

- **Custom Group Ratios**:
  - We've added the capability to split participants into multiple groups with custom ratios, enabling more flexible and precise A/A testing scenarios.
  - This feature provides greater control over test design, allowing for diverse experimental setups.
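
As a rough illustration of splitting by custom ratios, the sketch below shuffles participant ids and cuts the shuffled list at cumulative ratio boundaries. The function name `split_into_groups` and its arguments are assumptions made for this example; they are not part of the HypEx API, whose actual interface is covered in the tutorials.

```python
import random

def split_into_groups(ids, ratios, seed=0):
    """Randomly assign each id to exactly one group; group sizes follow `ratios`."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    groups, start, cumulative = [], 0, 0.0
    for ratio in ratios:
        cumulative += ratio
        # Cut at the cumulative boundary so rounding errors do not accumulate
        end = round(cumulative * len(shuffled))
        groups.append(shuffled[start:end])
        start = end
    return groups

groups = split_into_groups(range(1000), ratios=[0.5, 0.3, 0.2])
print([len(g) for g in groups])  # [500, 300, 200]
```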

Documentation and Tutorials

- For detailed instructions and examples on how to use these new features, please refer to the [tutorials](https://github.com/sb-ai-lab/HypEx/tree/master/examples/tutorials) section in our documentation.

0.1.4

We are thrilled to announce the release of **HypEx v0.1.4**! This update brings significant improvements in matching speed, especially when dealing with large datasets.

Key Features and Improvements

- **Enhanced Matching Performance**:
  - We have optimized the matching process, achieving a performance boost by an order of magnitude. This enhancement will be particularly beneficial for datasets exceeding one million entries.

- **Introduction of FAISS Algorithm**:
  - For datasets larger than one million entries, we've implemented the [FAISS](https://github.com/facebookresearch/faiss) algorithm. FAISS (Facebook AI Similarity Search) provides a substantial increase in speed by using efficient nearest neighbor search techniques.

- **Trade-offs**:
  - **Speed vs. Accuracy**: The new algorithm improves speed but slightly compromises on accuracy and reproducibility. This is because FAISS first performs random clustering before searching for nearest neighbors, which leads to faster results compared to the exhaustive search used in previous versions.
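
To see why clustering before the search can trade accuracy and reproducibility for speed, consider the toy sketch below. It is plain Python, not FAISS and not HypEx code; the helper names and the random choice of centroids are assumptions made purely for illustration.

```python
import math
import random

def exact_nearest(query, points):
    """Exhaustive search: compare the query with every point."""
    return min(points, key=lambda p: math.dist(query, p))

def clustered_nearest(query, points, n_clusters=4, seed=None):
    """Approximate search: pick random centroids, bucket the points by their
    closest centroid, then search only the bucket closest to the query."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    buckets = {c: [] for c in centroids}
    for p in points:
        buckets[min(centroids, key=lambda c: math.dist(p, c))].append(p)
    home = min(centroids, key=lambda c: math.dist(query, c))
    return min(buckets[home], key=lambda p: math.dist(query, p))

rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
query = (0.5, 0.5)

exact = exact_nearest(query, points)
approx = [clustered_nearest(query, points, seed=s) for s in range(20)]

# Distances found by the approximate search across 20 different seeds
exact_dist = math.dist(query, exact)
print(exact_dist, sorted({round(math.dist(query, a), 4) for a in approx}))
```

Because the approximate search inspects only one bucket, it can never beat the exhaustive result, it may return a slightly more distant neighbor, and its answer can change with the random seed.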

Benefits

- **Faster Processing**: Users with large-scale data can now experience significantly reduced computation times, enabling faster data processing and analysis.

- **Scalability**: The adoption of FAISS makes HypEx more scalable, allowing users to work with larger datasets efficiently.

Potential Considerations

- **Accuracy**: While the speed improvements are substantial, users should be aware of the trade-off in accuracy. In scenarios where precise matching is crucial, users might want to evaluate the results to ensure they meet their needs.

- **Reproducibility**: Due to the nature of the FAISS algorithm, results may vary slightly between runs. Users should consider this when reproducibility is critical.

0.1.3

We are excited to announce the release of HypEx version 0.1.3! This update brings several new features and improvements that enhance the functionality and usability of the library. Below is a detailed list of the changes in this release:

New Features

Chi-Squared Test for Categorical Variables in AA Tests
We have added support for the chi-squared test for categorical variables in AA tests. This addition allows for more comprehensive analysis of categorical data, improving the robustness of your causal inference results.
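
For readers unfamiliar with the test, here is a minimal sketch of the Pearson chi-squared statistic for one categorical column split across two groups. It is a textbook computation in plain Python, not HypEx's implementation; the function name and the sample data are hypothetical.

```python
from collections import Counter

def chi_squared_statistic(group_a, group_b):
    """Pearson chi-squared statistic and degrees of freedom for a 2 x k table."""
    counts_a, counts_b = Counter(group_a), Counter(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = n_a + n_b
    categories = set(counts_a) | set(counts_b)
    stat = 0.0
    for category in categories:
        column_total = counts_a[category] + counts_b[category]
        for observed, n in ((counts_a[category], n_a), (counts_b[category], n_b)):
            expected = n * column_total / total  # count expected if groups are alike
            stat += (observed - expected) ** 2 / expected
    return stat, len(categories) - 1

group_a = ["ios"] * 50 + ["android"] * 50
group_b = ["ios"] * 60 + ["android"] * 40
stat, dof = chi_squared_statistic(group_a, group_b)
print(round(stat, 2), dof)  # 2.02 1
```

With one degree of freedom the 5% critical value is about 3.84, so a statistic of roughly 2.02 gives no evidence that the two groups differ in platform mix, which is the outcome one hopes for in an AA test.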

Permutation Test for Matching
Introducing the permutation test for matching problems. This feature enables you to perform non-parametric hypothesis tests, enhancing the accuracy and reliability of your matching analyses.
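
The idea behind a permutation test can be sketched in a few lines: shuffle the group labels many times and count how often the shuffled difference is at least as large as the observed one. The code below is a generic two-sided test on a difference in means, written under the assumption that this is the statistic of interest; it is not the HypEx implementation, and the function name and data are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(treated, control, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(treated)]) - mean(pooled[len(treated):]))
        hits += diff >= observed
    # The +1 terms count the observed labelling itself, so the p-value is never 0
    return (hits + 1) / (n_permutations + 1)

treated = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4]
control = [10.2, 10.9, 11.1, 10.5, 11.4, 10.8]
p_value = permutation_p_value(treated, control)
print(p_value)
```

No distributional assumption is needed: the reference distribution comes entirely from the reshuffled data.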

AB Testing on Unbalanced Samples
Our new functionality allows you to conduct AB tests on unbalanced samples. This update ensures that you can handle real-world data more effectively, providing more reliable insights even when your sample groups are not perfectly balanced.

Matching Without Replacement
We have added the option to perform matching without replacement. This method reduces bias and variance in your matched samples, leading to more accurate causal inference results.
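
The difference from matching with replacement is that each control unit can be paired at most once. A minimal greedy sketch on a single scalar score follows; the function name, the greedy order, and the scores are assumptions for illustration and do not reflect how HypEx performs the matching internally.

```python
def match_without_replacement(treated, control):
    """Greedy 1:1 matching on a scalar score; each control is used at most once."""
    available = dict(enumerate(control))  # control index -> score
    pairs = []
    for t_idx, t_score in enumerate(treated):
        if not available:
            break  # more treated units than controls
        c_idx = min(available, key=lambda i: abs(available[i] - t_score))
        pairs.append((t_idx, c_idx))
        del available[c_idx]  # this control can no longer be matched
    return pairs

treated = [0.50, 0.52, 0.90]
control = [0.51, 0.10, 0.88, 0.60]
pairs = match_without_replacement(treated, control)
print(pairs)  # [(0, 0), (1, 3), (2, 2)]
```

With replacement, the second treated unit (0.52) would also have been paired with control 0 (0.51); without replacement it falls back to the next closest remaining control.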

Conclusion
We believe these updates will significantly improve your experience with HypEx, providing more powerful and flexible tools for causal inference. We look forward to your feedback and are committed to continuously enhancing the capabilities of HypEx.

---

Thank you for your continued support and contributions!

The HypEx Team

What's Changed
* [HEI-253] Feature/permutationg test validation by tikhomirovd in https://github.com/sb-ai-lab/HypEx/pull/91
* [HEI-252] Feature/unbalanced ab by tikhomirovd in https://github.com/sb-ai-lab/HypEx/pull/90
* Feature/release 013 by tikhomirovd in https://github.com/sb-ai-lab/HypEx/pull/92


**Full Changelog**: https://github.com/sb-ai-lab/HypEx/compare/v0.1.2...v0.1.3

0.1.2

We are thrilled to announce the release of HypEx 0.1.2. This release introduces significant statistical and data generation functionalities to enhance your analysis capabilities. Below are the major updates included in this release:

New Features

Statistical Analysis Enhancements
- **Chi-Squared Test in AATest:** We have added a chi-squared test to the `AATest` class, enhancing our toolkit's capabilities to handle categorical data analysis.
- **Unbalanced Groups Analysis:**
  - **Non-Binomial Data Handling:** Introduced the `__mde_unbalanced_non_binomial` method, which calculates the Minimum Detectable Effect (MDE) for non-binomial data based on standard deviation and group sizes.
  - **Binomial Data Handling:** Added the `__mde_unbalanced_binomial` method, enabling MDE computation for binomial data using Cohen's d and observed conversion rates.
  - **Public Interface for MDE Calculation:** Launched `calc_mde_unbalanced_group`, a comprehensive method that determines MDE based on data type (binomial or non-binomial), providing a straightforward interface for users.
  - **Single Entry Point:** All of these calculations are combined in one function, `calc_mde`.
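
As a point of reference for the non-binomial case, the standard normal-approximation formula for the MDE with unequal group sizes is sketched below. It uses only the standard deviation and the two group sizes, as described above, but it is a textbook formula and may differ from what `calc_mde` actually computes; the function name `mde_unbalanced` and its defaults are assumptions.

```python
from math import sqrt
from statistics import NormalDist

def mde_unbalanced(std, n_control, n_test, alpha=0.05, power=0.8):
    """Smallest detectable difference in means for a two-sided z-test
    with a common standard deviation and unequal group sizes."""
    z = NormalDist()
    quantiles = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return quantiles * std * sqrt(1 / n_control + 1 / n_test)

balanced = mde_unbalanced(std=10, n_control=5000, n_test=5000)
unbalanced = mde_unbalanced(std=10, n_control=9000, n_test=1000)
print(round(balanced, 3), round(unbalanced, 3))  # 0.56 0.934
```

For the same total of 10,000 participants, the 90/10 split can only detect an effect about two thirds larger than the 50/50 split can.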

Data Generation for Simulation and Testing
- **Enhanced Dataset Generation in `dataset.py`:**
  - **Medicine Data Simulation:** `gen_special_medicine_df` function now allows for the generation of synthetic datasets that simulate medical treatment scenarios.
  - **Oracle and Control Variates Datasets:** Introduced `gen_oracle_df` and `gen_control_variates_df` methods to create complex synthetic datasets. These functions support extensive scenarios for testing and developing data analysis methods within the HypEx framework.

Documentation and Development
- Integrated detailed docstrings for all new methods, improving code maintainability and easing future enhancements.

This release marks a significant improvement in HypEx's capabilities, providing robust tools for both analytical and simulation tasks. These enhancements not only broaden the scope of applicable use cases but also improve the ease and accuracy of statistical tests and data handling within the HypEx environment.

For more details on using these new features, please refer to the updated documentation and example notebooks included in the release.

We thank our community for their continued support and feedback, which are invaluable to the ongoing development and refinement of HypEx.

Happy analyzing!

0.1.1

HypEx Release 0.1.1 Summary

We're excited to announce the release of HypEx 0.1.1, which includes a range of updates aimed at improving functionality, enhancing usability, and fixing known issues. Here's what's new:

New Features and Enhancements

- **Added Support for Python 3.11 and 3.12:** Ensuring HypEx remains compatible with the latest Python versions, we've tested and adjusted HypEx to work seamlessly with Python 3.11 and 3.12.

- **Enhancements to `group_col` Handling:** Improved the flexibility and accuracy of `group_col` parameter handling within HypEx, allowing for more robust operation in sorting, null-value handling, and group concatenation.

- **Introduction of `fill_gaps` Parameter:** A new feature in the Matcher class that automatically fills NaN values in categorical columns used for grouping, streamlining data preparation.

- **New `max_categories` Parameter:** This update introduces a limit to the number of categories a column can have before being excluded from conversion into dummy variables, preventing memory issues with high-cardinality columns.

- **Performance Optimization in `abn_test`:** Addressed performance issues and corrected a bug in the formula used for determining hypotheses outcomes, enhancing execution efficiency and accuracy.

- **Documentation Update - Code of Conduct:** Added a Code of Conduct to our repository to outline expectations for behavior and provide a process for handling misconduct, fostering a more inclusive and respectful community.

- **Moved** `limit_distribution` to `abn_test`.
- **Removed** multitarget matching and `validate_result` from `Matcher` for mathematical reasons; both are planned to return in a later release.
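
The effect of a limit like `max_categories` can be sketched as a simple cardinality check run before dummy encoding. The helper below is hypothetical and written in plain Python; whether HypEx treats the limit as inclusive, and how it handles the excluded columns, are assumptions not stated in these notes.

```python
def columns_to_encode(rows, categorical_columns, max_categories):
    """Return the categorical columns whose number of distinct values does not
    exceed `max_categories`; the remaining columns would be skipped."""
    keep = []
    for column in categorical_columns:
        n_unique = len({row[column] for row in rows})
        if n_unique <= max_categories:
            keep.append(column)
    return keep

# "city" has 100 distinct values, "plan" has only two
rows = [{"city": f"city_{i}", "plan": "pro" if i % 2 else "free"} for i in range(100)]
encodable = columns_to_encode(rows, ["city", "plan"], max_categories=10)
print(encodable)  # ['plan']
```

Skipping `city` avoids creating 100 dummy columns, which is the kind of memory blow-up the parameter is meant to prevent.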

Bug Fixes

- **Fixed `group_col` List Handling:** Addressed an issue where `group_col` as a list was not functioning correctly, ensuring proper operation across various use cases.

- **Speed and Hypothesis Selection in `limit_distribution`:** Optimized the function to reduce execution time and fixed a bug in hypothesis selection, ensuring reliable outcomes.

Documentation and Community

- **Enhanced Documentation:** Updated documentation to reflect new features, parameter introductions, and usage examples, making it easier for users to get started and utilize HypEx effectively.

- **Community Engagement:** Encouraged community feedback and contributions by clarifying contribution guidelines and promoting an open, collaborative environment.

This release represents a significant effort from the HypEx team to address user feedback, improve the library's functionality, and ensure it meets the community's needs. We thank our contributors for their invaluable input and look forward to continuing to develop HypEx together.

0.1.0

HypEx Release Summary

This release of HypEx introduces a range of new features, significant enhancements, and critical bug fixes aimed at improving the usability, functionality, and reliability of the framework. Below is an overview of the major changes:

New Features and Enhancements

1. **Feature Selection Integration**: Added feature selection based on CatBoost and LightGBM feature importances, which feeds directly into the matching tasks within HypEx.

2. **AATest Class Enhancements**: Moved critical methods into the `AATest` class, which simplifies usage and adds support for direct DataFrame inputs for a more intuitive experience.

3. **TQDM Import Compatibility**: Addressed compatibility issues with older versions of tqdm, ensuring smoother operations across different environments.

4. **Validate Group Col Functionality**: Expanded the validate_group_col function to accept both strings and lists, enabling more complex data validation scenarios.

5. **Enhancements to Group Matching**: Introduced a mechanism to bypass categories preventing successful Cholesky decomposition, thereby enhancing the stability of the matching process.

6. **Automated Outlier Handling**: Automated detection and management of extreme outliers, with customization options for handling, thus improving data analysis reliability.

7. **Delta_t Attribute for Bias Quantification**: New attribute to quantify the bias contribution to ATT, offering insights into the impact of bias on the analysis results.

8. **Imbalanced Sample Size Calculation**: Added a new function for calculating necessary sample sizes for control and test groups in studies with imbalanced group sizes, supporting both binary and continuous outcomes.
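
For item 8, the continuous-outcome case can be illustrated with the standard normal-approximation formula for unequal allocation. This is a textbook sketch, not the HypEx function: the name `sample_sizes`, the `test_to_control_ratio` argument, and the defaults are assumptions, and the binary-outcome case is not covered.

```python
from math import ceil
from statistics import NormalDist

def sample_sizes(std, effect, test_to_control_ratio=1.0, alpha=0.05, power=0.8):
    """Control and test group sizes needed to detect `effect` with a two-sided
    z-test, where n_test = test_to_control_ratio * n_control."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Var(diff) = std^2 * (1/n_c + 1/n_t) = std^2 / n_c * (1 + 1/ratio)
    n_control = (1 + 1 / test_to_control_ratio) * (z_total * std / effect) ** 2
    return ceil(n_control), ceil(n_control * test_to_control_ratio)

balanced = sample_sizes(std=10, effect=1)
skewed = sample_sizes(std=10, effect=1, test_to_control_ratio=0.25)
print(balanced, skewed)
```

The balanced design needs 1570 participants per group here; shrinking the test group to a quarter of the control group raises the total required sample.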

Bug Fixes

1. **Fixed Cholesky Decomposition Issue**: Adjusted data processing to ensure Cholesky decomposition can always be performed, addressing failures due to non-positive definite matrices.

2. **Resolved Dataset Creation Bug**: Corrected an issue affecting the number of users in dataset creation, ensuring accuracy and reliability.

Documentation and Warnings

- Added warnings and guidelines for alpha-stage functionality such as multitarget matching, feature selection in matching, and validation in matching.
- Enhanced documentation to include warnings about feature selection, providing a more comprehensive understanding of potential pitfalls and considerations.

Methodological Innovations

- Developed a method based on the theory of limit distributions, aimed at maintaining strict adherence to the predefined probabilities of Type I and Type II errors, thus improving the decision-making process in multiple hypothesis testing.

This release represents a significant step forward for the HypEx project, delivering robust solutions to complex data analysis challenges while maintaining high standards of accuracy and reliability.
