HypEx Release Summary
This release of HypEx introduces new features, enhancements, and bug fixes that improve the usability, functionality, and reliability of the framework. The major changes are summarized below:
New Features and Enhancements
1. **Feature Selection Integration**: Integrated feature importance algorithms based on CatBoost and LightGBM for feature selection, which directly affects the matching tasks within HypEx.
2. **AATest Class Enhancements**: Refactored key methods into the AATest class, simplifying user interaction. The class now accepts DataFrame inputs directly for a more intuitive experience.
3. **TQDM Import Compatibility**: Resolved import compatibility issues with older versions of tqdm, so that HypEx runs consistently across different environments.
4. **Validate Group Col Functionality**: Expanded the validate_group_col function to accept both strings and lists, enabling more complex data validation scenarios.
5. **Enhancements to Group Matching**: Introduced a mechanism that skips categories for which the Cholesky decomposition fails, making the matching process more stable.
6. **Automated Outlier Handling**: Added automatic detection and handling of extreme outliers, with options to customize how they are treated, improving the reliability of data analysis.
7. **Delta_t Attribute for Bias Quantification**: Added a new attribute that quantifies the contribution of bias to the ATT (average treatment effect on the treated), showing how much bias affects the analysis results.
8. **Imbalanced Sample Size Calculation**: Added a new function for calculating necessary sample sizes for control and test groups in studies with imbalanced group sizes, supporting both binary and continuous outcomes.
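The imbalanced sample size calculation in item 8 can be illustrated with a minimal sketch. It uses the textbook normal-approximation formulas for a two-sided test, with the allocation ratio defined as test group size divided by control group size. The function names, parameters, and defaults below are hypothetical and are not the HypEx API; the HypEx function may use a different formula.

```python
import math
from statistics import NormalDist


def _z_sum(alpha: float, power: float) -> float:
    """Sum of the normal quantiles for a two-sided test at level alpha and the given power."""
    z = NormalDist().inv_cdf
    return z(1 - alpha / 2) + z(power)


def sample_size_continuous(delta, sigma, ratio=1.0, alpha=0.05, power=0.8):
    """Return (n_control, n_test) needed to detect a mean difference `delta`.

    `sigma` is the common standard deviation; `ratio` is n_test / n_control.
    (Hypothetical helper, not part of HypEx.)
    """
    n_control = (1 + 1 / ratio) * (sigma * _z_sum(alpha, power) / delta) ** 2
    n_control = math.ceil(n_control)
    return n_control, math.ceil(ratio * n_control)


def sample_size_binary(p_control, p_test, ratio=1.0, alpha=0.05, power=0.8):
    """Return (n_control, n_test) needed to detect a difference in proportions.

    Uses the unpooled variance p(1 - p) of each group.
    (Hypothetical helper, not part of HypEx.)
    """
    variance = p_control * (1 - p_control) + p_test * (1 - p_test) / ratio
    n_control = _z_sum(alpha, power) ** 2 * variance / (p_control - p_test) ** 2
    n_control = math.ceil(n_control)
    return n_control, math.ceil(ratio * n_control)


# Test group twice as large as the control group.
n_control, n_test = sample_size_continuous(delta=0.5, sigma=1.0, ratio=2.0)
```

With a ratio above 1, the control group can be smaller than in a balanced design, but the total number of observations grows, which is the usual trade-off in imbalanced designs.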
Bug Fixes
1. **Fixed Cholesky Decomposition Issue**: Adjusted data processing to ensure Cholesky decomposition can always be performed, addressing failures due to non-positive definite matrices.
2. **Resolved Dataset Creation Bug**: Corrected an issue affecting the number of users in dataset creation, ensuring accuracy and reliability.
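To make the Cholesky decomposition fix in item 1 more concrete, the sketch below shows one way the decomposition can fail and be recovered. It assumes the failure comes from a feature that is constant within a category, which makes the covariance matrix singular. All function names and data are hypothetical, and this is an illustration of the failure mode, not the HypEx implementation.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def cholesky(a, tol=1e-12):
    """Return lower-triangular L with L @ L.T == a.

    Raises ValueError if `a` is not positive definite.
    """
    p = len(a)
    low = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def drop_constant_columns(rows):
    """Remove columns that hold a single distinct value (zero variance)."""
    keep = [j for j in range(len(rows[0])) if len({r[j] for r in rows}) > 1]
    return [[r[j] for j in keep] for r in rows]


# Hypothetical data for one category: the third feature is constant.
rows = [[1.0, 2.0, 5.0], [2.0, 1.0, 5.0], [3.0, 4.0, 5.0], [4.0, 3.0, 5.0]]

try:
    cholesky(covariance(rows))
    failed = False
except ValueError:
    failed = True  # the constant column makes the covariance matrix singular

# After removing the constant column, the decomposition succeeds.
factor = cholesky(covariance(drop_constant_columns(rows)))
```

Removing zero-variance columns is only one possible adjustment. Skipping the affected category entirely, as described under group matching, is another.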
Documentation and Warnings
- Added warnings and guidelines for functionality that is still in alpha, such as multitarget matching, feature selection in matching, and validation in matching.
- Expanded the documentation with warnings about feature selection, covering potential pitfalls and considerations.
Methodological Innovations
- Developed a method, based on the theory of limit distributions, that keeps the probabilities of Type I and Type II errors at their predefined levels in multiple hypothesis testing, which improves decision-making.
This release represents a significant step forward for the HypEx project, delivering robust solutions to complex data analysis challenges while maintaining high standards of accuracy and reliability.