Library
This is a high-level summary of this library’s implemented functionality.
For actual code documentation, check out our [API Pages](https://theminefreak23.github.io/fairreckitlib).
For details on how to extend this library, see our [wiki guides](https://github.com/TheMinefreak23/fairreckitlib/wiki).
Recommender System
Top-level API intended for use by applications. The system owns several grouped factories for the available datasets, data modifiers, algorithms, and metrics, all of which are modular and extendable. Computations run multithreaded, and the system supports:
- Aborting an active computation
- Running a new experiment, both from code and from a configuration file
- Validating an existing experiment for an additional number of runs
- JSON-compatible listing of the available factories
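Because the factories are JSON compatible, an application can query what is available and render or serialize it directly. A minimal sketch of such a listing, with illustrative group names and entries (this is not the library's actual factory output):

```python
import json

# Hypothetical sketch of the JSON-compatible availability listing; the
# group names and entries below are illustrative, not the library's
# actual factory output.
available = {
    "datasets": ["ML-100K", "ML-25M", "LFM-360K", "LFM-1B", "LFM-2B"],
    "algorithms": {
        "predictors": ["SVD"],
        "recommenders": ["AlternatingLeastSquares", "ItemItem KNN"],
    },
    "metrics": ["NDCG@K", "RMSE", "Item coverage"],
}

print(json.dumps(available, indent=4))
```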
Events
Centralised event dispatcher supporting multiple event listeners. Numerous events are available and categorised under:
- Configuration parsing
- Errors
- IO
- Data Pipeline
- Model Pipeline
- Evaluation Pipeline
- Experiment Pipeline
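The dispatcher pattern itself is straightforward; a minimal sketch of a centralised dispatcher with multiple listeners per event (the class and event names here are illustrative, not the library's own):

```python
from collections import defaultdict
from typing import Callable

class EventDispatcher:
    """Minimal dispatcher sketch: multiple listeners per event ID."""

    def __init__(self) -> None:
        self._listeners = defaultdict(list)

    def add_listener(self, event_id: str, callback: Callable) -> None:
        self._listeners[event_id].append(callback)

    def dispatch(self, event_id: str, **payload) -> None:
        # Notify every listener registered for this event.
        for callback in self._listeners[event_id]:
            callback(**payload)

dispatcher = EventDispatcher()
dispatcher.add_listener("on_begin_experiment", lambda **kw: print("begin:", kw))
dispatcher.dispatch("on_begin_experiment", experiment="my_experiment")
```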
Data
Abstract dataset definition and a registry of the available datasets.
Processors
Data (pre)processors that convert various movie and music datasets into a standard format (sketched after the list below):
- ML-100K (user-movie-rating)
- ML-25M (user-movie-rating)
- LFM-360K (user-artist-count)
- LFM-1B (user-artist-count)
- LFM-2B (user-artist-count, user-track-count)
All the datasets have at least one matrix available; the latter two datasets can also generate additional matrices (e.g. user-album-count).
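The standard format is essentially a user-item-value table. A minimal sketch of what a processed user-movie-rating matrix looks like (the exact column names used by the library may differ):

```python
import pandas as pd

# Illustrative user-movie-rating matrix in a standard long format;
# the exact column names used by the library may differ.
matrix = pd.DataFrame({
    "user": [1, 1, 2],
    "item": [10, 20, 10],
    "rating": [4.0, 3.5, 5.0],
})
print(matrix)
```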
Filtering
Abstract data filters that are dynamically determined for each available dataset-matrix pair.
Applying filters occurs in multiple passes, where each pass is a list of sequentially applied filters. All passes are merged, which makes it possible to create endless custom subgroups, as sketched below.
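A minimal sketch of this pass-and-merge behaviour, assuming plain pandas DataFrames and illustrative filter callables (not the library's filter classes):

```python
import pandas as pd

# Sketch of multi-pass filtering: each pass applies its filters in
# sequence, and the resulting subsets are merged into one subgroup.
# The filter callables here are illustrative, not the library's filters.
def apply_pass(matrix: pd.DataFrame, filters) -> pd.DataFrame:
    for data_filter in filters:
        matrix = data_filter(matrix)
    return matrix

def apply_passes(matrix: pd.DataFrame, passes) -> pd.DataFrame:
    subsets = [apply_pass(matrix, filters) for filters in passes]
    return pd.concat(subsets).drop_duplicates()

# e.g. merge the subgroup (age 18-25) with the subgroup (age 65+):
# apply_passes(matrix, [[lambda m: m[m["age"].between(18, 25)]],
#                       [lambda m: m[m["age"] >= 65]]])
```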
Rating conversion
Abstract matrix rating converter to modify the existing ratings of a matrix. Converters are likewise dynamically determined for each available dataset-matrix pair.
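As an illustration of what such a converter can do, here is a sketch that compresses implicit play counts into a bounded rating range (a common conversion; which converters actually exist for a given dataset-matrix pair is determined by the library):

```python
import numpy as np
import pandas as pd

# Sketch of one possible converter: compress implicit play counts into
# a bounded rating range. Which converters exist for a given
# dataset-matrix pair is determined by the library itself.
def counts_to_range(matrix: pd.DataFrame, upper: float = 5.0) -> pd.DataFrame:
    converted = matrix.copy()
    converted["rating"] = np.minimum(np.log1p(converted["rating"]), upper)
    return converted
```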
Splitting
Abstract data splitter to divide a dataset into a train and test set.
Separate factory to define available splitting methods.
- Random splitting
- Temporal splitting
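Minimal sketches of the two strategies on a long-format matrix; the temporal variant assumes a `timestamp` column, and the ratio and column names are illustrative:

```python
import pandas as pd

# Sketches of the two splitting strategies; ratios, random seed, and
# column names are illustrative.
def random_split(matrix: pd.DataFrame, test_ratio: float = 0.2):
    test = matrix.sample(frac=test_ratio, random_state=42)
    return matrix.drop(test.index), test

def temporal_split(matrix: pd.DataFrame, test_ratio: float = 0.2):
    ordered = matrix.sort_values("timestamp")
    cutoff = int(len(ordered) * (1.0 - test_ratio))
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]
```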
Data Pipeline
- Dataset matrix loading
- Dataset matrix filtering
- Dataset matrix rating conversion
- Dataset matrix train-test splitting
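Chained together, the four stages amount to a simple composition; a sketch where each stage is passed in as a callable (illustrative, not the library's pipeline class):

```python
# Sketch of the data pipeline as a composition of the four stages
# above; each argument is a callable for the corresponding stage.
def run_data_pipeline(load, apply_filters, convert, split):
    matrix = load()                 # dataset matrix loading
    matrix = apply_filters(matrix)  # dataset matrix filtering
    matrix = convert(matrix)        # dataset matrix rating conversion
    return split(matrix)            # dataset matrix train-test splitting
```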
Model algorithm
Base algorithm interface, divided into predictors and recommenders.
Separate factories to define available algorithms for various APIs.
Implicit recommenders
- AlternatingLeastSquares
- BayesianPersonalizedRanking
- LogisticMatrixFactorization
LensKit predictors/recommenders
- Biased Matrix Factorization
- Implicit Matrix Factorization
- ItemItem KNN
- PopScore
- Random (recommender only)
- UserUser KNN
Surprise predictors/recommenders
- BaselineOnly (ALS and SGD)
- CoClustering
- KNNBaseline (ALS and SGD)
- KNNBasic
- KNNWithMeans
- KNNWithZScore
- NMF (Non-negative Matrix Factorization)
- NormalPredictor
- SlopeOne
- SVD (Funk)
- SVDpp
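The library wraps these algorithms behind its own factories and parameter handling. As a point of reference, this is how the underlying Surprise SVD predictor is used directly with its standard API (this is Surprise's interface, not fairreckitlib's):

```python
from surprise import SVD, Dataset, accuracy
from surprise.model_selection import train_test_split

# Standard Surprise usage, shown for reference; fairreckitlib wraps
# this behind its own factories and parameter handling.
data = Dataset.load_builtin("ml-100k")  # downloads ML-100K on first use
trainset, testset = train_test_split(data, test_size=0.2)

algo = SVD(n_factors=100)  # Funk SVD
algo.fit(trainset)
predictions = algo.test(testset)
accuracy.rmse(predictions)
```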
Model Pipeline
- Algorithm loading with parameters
- Algorithm training with a train set
- Algorithm testing with a test set and saving the results to a TSV file
- User item predictions for known ratings
- User top-K item recommendations
- Reconstructing the original ratings from the train and test sets that were used
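A sketch of the result-persistence step: top-K recommendations written as a rank table in TSV form (the column layout and file name are assumed here for illustration):

```python
import pandas as pd

# Sketch of persisting top-K results as a TSV rank table; the column
# layout and file name are assumed for illustration.
recs = pd.DataFrame({
    "user": [1, 1, 1],
    "item": [42, 7, 13],
    "score": [0.91, 0.87, 0.80],
})
recs["rank"] = recs.groupby("user")["score"].rank(ascending=False).astype(int)
recs.to_csv("ratings.tsv", sep="\t", index=False)
```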
Evaluation
Base metric interface divided into several metric categories: accuracy, rating and coverage.
Separate (category) factories to define available metrics.
LensKit accuracy metrics
- NDCG@K
- P@K (precision)
- R@K (recall)
- MRR (mean reciprocal rank)
LensKit rating metrics
- RMSE
- MAE
Rexmex coverage metrics
- Item coverage
- User coverage
Rexmex rating metrics
- MSE
- MAPE
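For reference, the rating metrics reduce to short formulas; a sketch of two of them in plain NumPy (illustrative, since the library delegates the actual computation to LensKit and Rexmex):

```python
import numpy as np

# Reference implementations of two of the listed rating metrics;
# the library itself delegates to LensKit and Rexmex.
def rmse(truth: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.sqrt(np.mean((truth - predicted) ** 2)))

def mae(truth: np.ndarray, predicted: np.ndarray) -> float:
    return float(np.mean(np.abs(truth - predicted)))

print(rmse(np.array([4.0, 3.0]), np.array([3.5, 3.0])))  # ~0.3536
```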
Evaluation Pipeline
- Metric loading with parameters
- Evaluation sets loading
- Evaluation sets filtering
- Metric evaluation of the computed item results, saving the evaluation to a JSON file
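A sketch of that final step: load the computed results, apply a metric, and save the evaluation as JSON (the file and column names here are assumptions for illustration):

```python
import json

import numpy as np
import pandas as pd

# Sketch of the final step: load computed results, apply a metric, and
# save the evaluation as JSON. File and column names are assumed here.
results = pd.read_csv("results.tsv", sep="\t")
score = float(np.sqrt(np.mean((results["rating"] - results["prediction"]) ** 2)))
with open("evaluations.json", "w", encoding="utf-8") as file:
    json.dump({"metric": "RMSE", "evaluation": score}, file, indent=4)
```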
Experiment
For both prediction and recommendation experiments, the library supports:
- One or more datasets
- One or more algorithms
- Zero or more metrics
Experiment configuration
Experiments can be specified using two types of configurations, which are interchangeable:
1. Python Dataclasses
2. YML file
Configurations are parsed to handle erroneous input, falling back to defaults where possible.
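A minimal sketch of how the two forms pair up, using PyYAML (the dataclass fields and YML schema shown here are illustrative; the library's real configuration types are richer):

```python
from dataclasses import dataclass, field

import yaml  # PyYAML

# Illustrative pairing of the two configuration forms; the real
# dataclasses and YML schema are defined by the library itself.
@dataclass
class ExperimentConfig:
    datasets: list = field(default_factory=list)
    models: list = field(default_factory=list)
    evaluation: list = field(default_factory=list)  # metrics, may be empty

yml_text = """
datasets: [ML-100K]
models: [SVD]
evaluation: [RMSE]
"""
config = ExperimentConfig(**yaml.safe_load(yml_text))
print(config)
```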
Experiment pipeline
Connects the data, model and evaluation pipelines to conduct an experiment.
Experiment threading
The experiment pipeline(s) can be run on a separate thread, making it possible to abort the computation when requested by the user.
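A minimal sketch of this abort mechanism using a shared event that the worker checks between pipeline stages (the stage names and timing are stand-ins, not the library's implementation):

```python
import threading
import time

# Minimal sketch of an abortable worker: the thread checks a shared
# event between pipeline stages and stops early once it is set.
def run_pipelines(abort_event: threading.Event) -> None:
    for stage in ("data", "model", "evaluation"):
        if abort_event.is_set():
            print("aborted before the", stage, "pipeline")
            return
        time.sleep(0.1)  # stand-in for the real pipeline work

abort_event = threading.Event()
worker = threading.Thread(target=run_pipelines, args=(abort_event,))
worker.start()
abort_event.set()  # the user requests an abort
worker.join()
```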