Fairreckitlib

Latest version: v1.0.0


1.0.0

Library

This is a high-level summary of this library’s implemented functionality.
For actual code documentation, check out our [API Pages](https://theminefreak23.github.io/fairreckitlib).
For details on how to extend this library, see our [wiki guides](https://github.com/TheMinefreak23/fairreckitlib/wiki).

Recommender System
Top-level API intended for use by applications. The system owns several (grouped) factories for the available datasets, data modifiers, algorithms and metrics, all of which are modular and extendable. Computations run multithreaded, and the system supports the following (a usage sketch follows this list):
- Aborting an active computation
- Running a new experiment, both from code and from a configuration file
- Validating an existing experiment for an additional number of runs
- JSON compatible availability of the factories
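
Application code could drive this along the following lines. This is a hypothetical sketch: the import path, constructor arguments and method names are assumptions made for illustration, not the documented interface; see the API pages linked above for the actual signatures.

```python
# Hypothetical usage sketch; the import path, constructor arguments and
# method names are assumptions, see the API pages for the actual interface.
from fairreckitlib.recommender_system import RecommenderSystem

system = RecommenderSystem(data_dir='datasets', result_dir='results')

# Run a new experiment from a configuration file; the computation is threaded.
system.run_experiment_from_yml('experiment.yml', num_threads=4)

# The factories are available in a JSON-compatible form, e.g. for a GUI.
available_options = {
    'datasets': system.get_available_datasets(),
    'algorithms': system.get_available_algorithms(),
    'metrics': system.get_available_metrics(),
}

# Abort the active computation when requested by the user.
system.abort_computation('experiment')
```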

Events
Centralised event dispatcher supporting multiple event listeners. Numerous events are available and categorised under:
- Configuration parsing
- Errors
- IO
- Data Pipeline
- Model Pipeline
- Evaluation Pipeline
- Experiment Pipeline
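
The dispatcher follows the familiar observer pattern; the sketch below illustrates the idea with generic names rather than the library's actual classes and event ids.

```python
# Illustrative observer-pattern sketch of a centralised event dispatcher;
# class and event names are generic, not fairreckitlib's actual API.
from collections import defaultdict
from typing import Any, Callable, Dict, List

class EventDispatcher:
    """Dispatches events by id to any number of registered listeners."""

    def __init__(self) -> None:
        self.listeners: Dict[str, List[Callable[..., None]]] = defaultdict(list)

    def add_listener(self, event_id: str, callback: Callable[..., None]) -> None:
        self.listeners[event_id].append(callback)

    def dispatch(self, event_id: str, **payload: Any) -> None:
        for callback in self.listeners[event_id]:
            callback(event_id, **payload)

dispatcher = EventDispatcher()
dispatcher.add_listener('on_begin_data_pipeline', lambda eid, **kw: print(eid, kw))
dispatcher.dispatch('on_begin_data_pipeline', dataset='ML-100K')
```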

Data
Abstract dataset and registry to define available sets.

Processors
Data (pre)processors that convert various movie and music datasets into a standard format:
- ML-100K (user-movie-rating)
- ML-25M (user-movie-rating)
- LFM-360K (user-artist-count)
- LFM-1B (user-artist-count)
- LFM-2B (user-artist-count, user-track-count)

All the datasets have at least one matrix available; the latter two can also generate additional matrices (e.g. user-album-count).
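
As a concrete example, the raw ML-100K ratings file (u.data) is a tab-separated user-item-rating-timestamp table, so its processor mainly has to rename and reorder columns into the common matrix layout. The sketch below assumes pandas and a TSV output; the library's exact output format may differ.

```python
# Sketch of preprocessing ML-100K into a standard user-movie-rating matrix.
# Assumes the raw ML-100K 'u.data' file; the library's exact output layout may differ.
import pandas as pd

raw = pd.read_csv(
    'ml-100k/u.data',
    sep='\t',
    names=['user_id', 'item_id', 'rating', 'timestamp'],
)

matrix = raw.rename(columns={'user_id': 'user', 'item_id': 'movie'})
matrix = matrix[['user', 'movie', 'rating', 'timestamp']]
matrix.to_csv('ML-100K/user_movie_rating.tsv', sep='\t', index=False)
```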

Filtering
Abstract data filters that are dynamically determined for each available dataset-matrix pair.
Applying filters occurs in multiple passes, where each pass is a list of sequentially applied filters. All passes are merged, so custom subgroups can be created without limit.
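
For example, one pass could select female users under 30 and another pass all users aged 18-25; the merged result is the union of both subgroups. A pandas sketch with hypothetical user columns:

```python
# Sketch of multi-pass filtering with hypothetical user columns.
# Filters within a pass are applied sequentially (intersection);
# the resulting passes are merged (union) into one subgroup.
import pandas as pd

ratings = pd.DataFrame({
    'user':   [1, 2, 3, 4],
    'movie':  [10, 10, 20, 30],
    'rating': [4.0, 3.0, 5.0, 2.0],
    'gender': ['f', 'm', 'f', 'm'],
    'age':    [23, 31, 40, 19],
})

pass_one = [lambda df: df[df['gender'] == 'f'], lambda df: df[df['age'] < 30]]
pass_two = [lambda df: df[df['age'].between(18, 25)]]

def apply_pass(df: pd.DataFrame, filters) -> pd.DataFrame:
    for fltr in filters:          # sequential filters within one pass
        df = fltr(df)
    return df

subgroup = pd.concat(
    [apply_pass(ratings, p) for p in (pass_one, pass_two)]
).drop_duplicates()               # merge the passes into one subgroup
print(subgroup)
```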

Rating conversion
Abstract matrix rating converter to modify the existing ratings of a matrix. These are also dynamically determined for each available dataset-matrix pair.
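
A typical conversion turns play counts into a bounded rating scale, e.g. by normalising a log transform; the sketch below shows one possible converter, not necessarily the one the library implements.

```python
# Sketch of one possible rating converter: log-scale play counts into [0, 1].
import numpy as np
import pandas as pd

matrix = pd.DataFrame({'user': [1, 1, 2], 'artist': [5, 7, 5], 'count': [3, 250, 40]})
log_counts = np.log1p(matrix['count'])
matrix['rating'] = log_counts / log_counts.max()
```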

Splitting
Abstract data splitter to divide a dataset into a train and test set.
Separate factory to define available splitting methods.
- Random splitting
- Temporal splitting
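
Both methods produce a per-user train/test division: random splitting samples the test interactions uniformly, while temporal splitting holds out each user's most recent interactions. A pandas sketch, assuming an 80/20 ratio as the default:

```python
# Sketch of random vs. temporal 80/20 train-test splitting per user.
import pandas as pd

def random_split(ratings: pd.DataFrame, test_ratio: float = 0.2):
    test = ratings.groupby('user', group_keys=False).sample(frac=test_ratio, random_state=0)
    train = ratings.drop(test.index)
    return train, test

def temporal_split(ratings: pd.DataFrame, test_ratio: float = 0.2):
    ratings = ratings.sort_values(['user', 'timestamp'])

    def last_fraction(group: pd.DataFrame) -> pd.DataFrame:
        n_test = max(1, int(len(group) * test_ratio))
        return group.tail(n_test)    # hold out each user's most recent interactions

    test = ratings.groupby('user', group_keys=False).apply(last_fraction)
    train = ratings.drop(test.index)
    return train, test

ratings = pd.DataFrame({
    'user':      [1, 1, 1, 2, 2],
    'item':      [10, 20, 30, 10, 40],
    'rating':    [4, 3, 5, 2, 4],
    'timestamp': [100, 200, 300, 150, 250],
})
train, test = temporal_split(ratings)
```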

Data Pipeline
- Dataset matrix loading
- Dataset matrix filtering
- Dataset matrix rating conversion
- Dataset matrix train-test splitting

Model algorithm
Base algorithm interface, divided into predictors and recommenders.
Separate factories to define available algorithms for various APIs.

Implicit recommenders
- AlternatingLeastSquares
- BayesianPersonalizedRankings
- LogisticMatrixFactorization

LensKit predictors/recommenders
- Biased Matrix Factorization
- Implicit Matrix Factorization
- ItemItem KNN
- PopScore
- Random (recommender only)
- UserUser KNN

Surprise predictors/recommenders
- BaselineOnly (ALS and SGD)
- CoClustering
- KNNBaseline (ALS and SGD)
- KNNBasic
- KNNWithMeans
- KNNWithZScore
- NMF (Non-negative Matrix Factorization)
- NormalPredictor
- SlopeOne
- SVD (Funk)
- SVDpp
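
To illustrate what the model factories wrap, the example below trains one of the listed algorithms (Surprise's SVD) directly on a small rating frame; the parameters exposed through the library's factories may differ.

```python
# Training Surprise's SVD predictor directly on a small rating frame.
import pandas as pd
from surprise import SVD, Dataset, Reader

ratings = pd.DataFrame({
    'user':   [1, 1, 2, 2, 3],
    'item':   [10, 20, 10, 30, 20],
    'rating': [4.0, 3.0, 5.0, 2.0, 4.5],
})

data = Dataset.load_from_df(ratings[['user', 'item', 'rating']], Reader(rating_scale=(1, 5)))
trainset = data.build_full_trainset()

algo = SVD(n_factors=50, random_state=0)
algo.fit(trainset)
print(algo.predict(uid=3, iid=10).est)   # predicted rating for a user-item pair
```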

Model Pipeline
- Algorithm loading with parameters
- Algorithm training with a train set
- Algorithm testing with a test set and saving the results to a tsv file
- User item predictions for known ratings
- User topK item recommendations
- Reconstructing the original ratings from the train and test sets that were used

Evaluation
Base metric interface divided into several metric categories: accuracy, rating and coverage.
Separate (category) factories to define available metrics.

LensKit accuracy metrics
- NDCGK
- PK
- RK
- MRR

LensKit rating metrics
- RMSE
- MAE

Rexmex coverage metrics
- Item coverage
- User coverage

Rexmex rating metrics
- MSE
- MAPE
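
These are the standard definitions; for illustration, a few of them are computed below directly on a prediction result frame (the library itself delegates the computation to LensKit and Rexmex):

```python
# Standard definitions of a few listed metrics, computed directly for illustration.
import numpy as np
import pandas as pd

results = pd.DataFrame({
    'user':       [1, 1, 2, 3],
    'item':       [10, 20, 10, 30],
    'rating':     [4.0, 3.0, 5.0, 2.0],   # ground truth from the test set
    'prediction': [3.5, 3.0, 4.0, 2.5],
})

errors = results['prediction'] - results['rating']
rmse = np.sqrt(np.mean(errors ** 2))
mae = np.mean(np.abs(errors))

# Coverage: fraction of the item catalogue that appears in the results.
catalogue = {10, 20, 30, 40, 50}
item_coverage = results['item'].nunique() / len(catalogue)
print(rmse, mae, item_coverage)
```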

Evaluation Pipeline
- Metric loading with parameters
- Evaluation sets loading
- Evaluation sets filtering
- Metric evaluation of the computed item results and saving them to a JSON file

Experiment
Both prediction and recommendation experiments support:
- One or more datasets
- One or more algorithms
- Zero or more metrics

Experiment configuration
Experiments can be specified using two types of configurations, which are interchangeable:
1. Python Dataclasses
2. YML file

Configurations are parsed to handle erroneous input and, where possible, to fall back on defaults.
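
As an impression of the dataclass form, below is a sketch of what a recommendation experiment configuration could look like; the field names are assumptions for illustration, the actual schema is documented in the wiki guides (the YML form mirrors the same structure).

```python
# Hypothetical dataclass configuration sketch; field names are assumptions,
# see the wiki guides for the actual schema (the YML form mirrors these fields).
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class AlgorithmConfig:
    name: str
    params: Dict[str, Any] = field(default_factory=dict)

@dataclass
class ExperimentConfig:
    name: str
    type: str                                  # 'prediction' or 'recommendation'
    datasets: List[str]
    models: List[AlgorithmConfig]
    metrics: List[str] = field(default_factory=list)
    top_k: int = 10

config = ExperimentConfig(
    name='lfm360k_als',
    type='recommendation',
    datasets=['LFM-360K'],
    models=[AlgorithmConfig('AlternatingLeastSquares', {'factors': 64})],
    metrics=['NDCGK', 'Item coverage'],
)
```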

Experiment pipeline
Connects the data, model and evaluation pipelines to conduct an experiment.

Experiment threading
The experiment pipeline(s) can be run on a separate thread, making it possible to abort the computation when requested by the user.
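
Conceptually this resembles a worker thread that checks a stop flag between units of work; a generic sketch, not the library's actual threading code:

```python
# Generic sketch of an abortable worker thread: the pipeline periodically
# checks a stop flag between units of work.
import threading
import time

stop_requested = threading.Event()

def run_experiment_pipeline() -> None:
    for step in range(100):                # e.g. one step per dataset/model pair
        if stop_requested.is_set():
            print('experiment aborted at step', step)
            return
        time.sleep(0.1)                    # stand-in for actual pipeline work
    print('experiment finished')

worker = threading.Thread(target=run_experiment_pipeline)
worker.start()
stop_requested.set()                       # abort when requested by the user
worker.join()
```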

0.5.1

Major changes
- Improved parsing code
- Improved memory usage when loading data matrices

Minor changes
- More documentation
- Updated filters

0.5.0

Major changes
- Rating converters are now defined per dataset-matrix pair (instead of generalised)
- Added filter factories per dataset-matrix pair
- Evaluation pipeline error handling

Minor changes
- Evaluation pipeline minor fixes
- Proper documentation of the evaluation pipeline
- Codebase now entirely conforms to the PEP 257 convention
- Minor bug fixes

0.4.2

Major changes
- Evaluation pipeline refactor
- Events refactor

Minor changes
- Bundled IO functionality
- More documentation
- More unit tests

0.4.1

- Removed accidental src imports

0.4.0

Major changes
- Dataset support for multiple user-item matrices
- Dataset support for multiple tables with the same primary key
- Dataset support for tables that are, or can be, stored compressed on disk
- Dataset support for event tables that can be used to generate additional user-item matrices
- Generic add-columns function for any table with a one-to-one relation to the primary key
- Evaluation pipeline produces metric computation results again

Minor changes
- Started refactoring evaluation pipeline
