Scikit-longitudinal

Latest version: v0.0.7


0.0.7

New Features

- **Migration to `uv`**: Successfully transitioned from PDM to `uv` for package management, enhancing workflow efficiency and build reliability. Enhanced documentation to assist users with installation and setup using `uv`.
- **Refactored Visualisations and Tutorials**: Updated tutorials and visualisations to align with the migration to `uv`. Improved the **Quick Start Guide** by providing clearer instructions and optimising the layout to enhance user experience.
- **Enhanced Estimators and Pipelines**:
  - Refactored **Lexico Gradient Boosting** for full compliance with Scikit-Learn, eliminating the previous dependency on StarBoost.
  - Improved preprocessing pipelines for ARFF file management using the **liac-arff** library.
- **CI/CD Updates**: Implemented enhancements to the continuous integration and deployment pipeline, resulting in more efficient builds and improved compatibility with GitHub Actions.

Enhanced

- **Documentation**: Fixed typos and inconsistencies in installation guides and tutorials to enhance clarity and improve user experience.
- **Usage Examples**: Enhanced examples through various methodologies to effectively illustrate the application of longitudinal machine learning techniques.
- **Compliance and Maintainability**: Implemented key enhancements to boost compliance and maintainability of tools and documentation.

Resolved

- Resolved minor issues in installation and documentation setups, improving usability and reliability.

<details>
<summary>Previously in <code>v0.0.4</code></summary>

This release includes a number of important changes intended to improve the library's overall usability, maintainability, and compliance. We added new estimators and enhanced certain strategies for data preparation and preprocessing. In addition, we set up PyPI publishing and GitHub CI to ensure the long-term viability of Scikit-Longitudinal.

🫵

0.0.4

Added

- **Documentation**: Comprehensive new documentation built with Material for MkDocs. This includes a detailed tutorial on understanding vectors of waves in longitudinal datasets, a contribution guide, an FAQ section, and complete API references for all estimators, preprocessors, data preparations, and the pipeline manager.
- **Docker Installation**: Added new Docker installation process.
- **Windows Support**: Windows is now supported via Docker.
- **New Classifiers/Regressors**: Introduced Lexico Deep Forest, Lexico Gradient Boosting, and Lexico Decision Tree Regressor.
- **PyPI Availability**: Scikit-Longitudinal is now available on PyPI.
- **Continuous Integration**: Integrated unit testing, documentation, and PyPI publishing within the CI pipeline.

Improved

- **PDM Setup and Installation**: Enhanced setup and installation processes using PDM.
- **Testing Coverage**: Improved testing coverage, ensuring that nearly 90% of the library is tested.
- **Scikit-Lexicographical-Trees**: Extracted the lexicographical scikit-learn tree node splitting function into its own repository and published it to PyPI as Scikit-Lexicographical-Trees. This is now leveraged by our lexico-based estimators.
- **.env Management**: Improved management of environment variables.
- **Lexicographical Enhancements**: Integrated lexicographical enhancements of the waves vector within the variant of scikit-learn, scikit-lexicographical-trees, improving memory and time efficiency by handling algorithmic temporality directly in C++.
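
As a rough illustration of what a lexicographic node-splitting rule can look like, the sketch below ranks candidate splits first by gain and then, among candidates whose gain is close enough to the best one, prefers the most recent wave. This is a simplified pure-Python picture under stated assumptions, not the C++ implementation in Scikit-Lexicographical-Trees; the `Candidate` type, the `lexicographic_best` function, and the `threshold` value are all hypothetical.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate split: feature name, wave number, and split gain."""
    feature: str
    wave: int
    gain: float


def lexicographic_best(candidates: List[Candidate], threshold: float = 0.0015) -> Candidate:
    """Pick a split by gain first, then by wave recency among near-ties.

    Candidates whose gain is within `threshold` of the best gain are treated
    as equivalent; among those, the most recent wave wins.
    """
    best_gain = max(c.gain for c in candidates)
    near_best = [c for c in candidates if best_gain - c.gain <= threshold]
    # Secondary objective: prefer the latest wave, breaking ties by gain.
    return max(near_best, key=lambda c: (c.wave, c.gain))


candidates = [
    Candidate("bmi", wave=1, gain=0.300),
    Candidate("bmi", wave=3, gain=0.299),
    Candidate("smoke", wave=3, gain=0.250),
]
chosen = lexicographic_best(candidates)
```

With these made-up numbers, the wave-3 `bmi` split is chosen even though its gain is marginally lower, because the difference falls inside the tolerance.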

To-Do

- **Docstrings Alignment**: Ensure that docstrings in the codebase align with the official documentation to avoid confusion.
- **Native Windows Compatibility**: Achieve Windows compatibility without relying on Docker (requires access to a Windows machine).
- **Future Enhancements**: Ongoing improvements and new features as they are identified.
- **Documentation examples**: Add examples to the documentation to help users understand how to use the library with Jupyter notebooks.

0.0.3

This release introduces a number of significant modifications aimed at enhancing the library's overall usability, maintainability, and compliance. We have addressed everything from features group management and AutoLD compliance to the switch from Poetry to PDM for package management and cross-compatibility!

Key Features 🫶

- **Features Group Missing Waves Handling**: We've introduced mechanisms to gracefully handle missing waves in features groups.
- **Readiness Descriptions**: New readiness indicators are available, providing detailed descriptions of how temporal information is handled across the library.
- **Compliance with AutoLD**: The library is now compliant with AutoLD standards, extending its interoperability.
- **Package Management Transition**: We've migrated from Poetry to PDM, enhancing our package and dependency management.
- **Docker Support**: A Linux-based Docker environment has been set up to streamline installation and deployment.
- **Platform Testing**: The library is tested on both Mac and Linux. Windows support is nearing completion.
- **Documentation**: A comprehensive version 0.0.1 of the documentation is now available on GitHub Pages.
- **Pipeline Manager**: The pipeline has been refactored into a more maintainable and flexible pipeline manager.
- **CFS Classes Refactoring**: The CFS and CFS Per Group algorithms have been separated into distinct classes for better management.
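
The missing-waves handling mentioned in the list above can be pictured with a small sketch. It assumes that a features group is a list of column indices ordered by wave, and that a placeholder index marks a wave in which the feature was not collected. The names `MISSING_WAVE`, `build_features_group`, and `observed_columns`, as well as the placeholder value `-1`, are hypothetical and chosen for illustration only; they are not taken from the library's API.

```python
from typing import Dict, List

MISSING_WAVE = -1  # hypothetical placeholder for a wave with no recorded column


def build_features_group(wave_to_column: Dict[int, int], n_waves: int) -> List[int]:
    """Lay out one feature's column indices by wave, marking gaps explicitly."""
    return [wave_to_column.get(wave, MISSING_WAVE) for wave in range(1, n_waves + 1)]


def observed_columns(group: List[int]) -> List[int]:
    """Return only the column indices that actually exist in the dataset."""
    return [idx for idx in group if idx != MISSING_WAVE]


# A feature measured in waves 1 and 3, but skipped in wave 2.
group = build_features_group({1: 3, 3: 4}, n_waves=3)
usable = observed_columns(group)
```

Keeping the gap explicit preserves each column's wave position, so downstream steps can still reason about temporal order while skipping the placeholder.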

Removed or Moved 🧹

- **Irrelevant Scripts**: Scripts related to visualisations have been removed as they were not directly relevant to the library's core functionality.
- **Experiments Branch**: All experiment-related code has been moved to a dedicated branch, `Experiments`.

0.0.2

This release introduces several key improvements and features, including the implementation and validation of three algorithms ('CFS Per Group', 'Nested Tree', and 'LexicoRF'), parallelization where possible, and a longitudinal dataset handler. Additionally, the codebase is thoroughly documented and more than 95% of it is tested.

Key Features:

- **CFS Per Group Nested Tree and LexicoRF**: Implemented and validated these algorithms.
- **Parallelization**: Applied parallelization for performance improvement wherever possible.
- **Longitudinal Dataset Handler**: Introduced a handler for easy access to non-longitudinal features, longitudinal features group, etc.
- **Longitudinal Pipeline**: Developed a pipeline specifically for longitudinal-based algorithms, allowing feature groups to pass onto each step of the pipeline.
- **Highly Documented Code**: Ensured the codebase is well-documented to facilitate understanding and maintenance.
- **Extensive Testing**: More than 95% of the codebase is tested.
- **Hooks and More Tools**: Added hooks and other tools for long-term project usage.
- **Improved CFS Per Group Algorithm**: Introduced version two of the algorithm, based on the concepts described in the paper.
- **Updated README**: The README has been updated with new information.
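
To make the idea of separating non-longitudinal features from longitudinal features groups concrete, here is a minimal sketch. It assumes that columns measured at several waves share a base name followed by a `_w<N>` suffix; that naming scheme, the `WAVE_SUFFIX` pattern, and the `split_columns` helper are assumptions made for this example and do not describe the handler's actual interface.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed naming scheme: "<feature>_w<wave number>", e.g. "smoke_w2".
WAVE_SUFFIX = re.compile(r"^(?P<name>.+)_w(?P<wave>\d+)$")


def split_columns(columns: List[str]) -> Tuple[List[List[int]], List[int]]:
    """Return (features groups, non-longitudinal column indices)."""
    by_feature: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    non_longitudinal: List[int] = []
    for idx, column in enumerate(columns):
        match = WAVE_SUFFIX.match(column)
        if match:
            by_feature[match["name"]].append((int(match["wave"]), idx))
        else:
            non_longitudinal.append(idx)
    # Order each group from the oldest wave to the most recent one.
    groups = [[idx for _, idx in sorted(pairs)] for pairs in by_feature.values()]
    return groups, non_longitudinal


columns = ["age", "smoke_w1", "bmi_w1", "smoke_w2", "bmi_w2", "sex"]
features_groups, static_columns = split_columns(columns)
```

Each inner list holds the column indices of one feature across waves, which is the kind of structure a longitudinal pipeline can pass from step to step.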

0.0.1

This release marks the initial setup of Scikit-Longitudinal as a Poetry-based Python project with a first estimator, featuring robust type-checking and an array of linting tools, including pylint, flake8, pre-commit, black, and isort.

Key Features:

- **Project setup with a first estimator**
- **Highly typed Python code** to ensure code quality and maintainability.
- **Comprehensive linting tools** (pylint, flake8, pre-commit, black, isort) integrated into the project to enforce coding standards and consistency.

Estimators:

- **Correlation-based Feature Selection (CFS) algorithm**: This release includes a [refined version of an open-source CFS algorithm](https://github.com/ZixiaoShen/Correlation-based-Feature-Selection/tree/45d27e8b7f1c6c5661fc0fe134faa02ee1c642a4), featuring improved typing, testing, runtime optimisation and brand new search algorithms.
- **CFS per Group for Longitudinal Data**: This release also introduces a Python implementation of a [previously Java-based open-source CFS per Group algorithm](https://github.com/mastervii/CSF_2-phase-variant), tailored for longitudinal data. The Python implementation adds parallelism for better performance, along with tests and thorough type annotations.

We hope you enjoy using this first release and look forward to your feedback and contributions! Cheers!
