Major refactor of OpenPredict
1. OpenPredict now uses [`dvc` (data version control)](https://dvc.org/) and [DagsHub](https://dagshub.com/) (a platform to publish the data) to handle all data files required to train the models and run the predictions. Instead of having half of them committed to git and the other half downloaded from various servers, every CSV input and model output is now stored in the `data/` folder at the root of the repository.
`dvc`/DagsHub works and looks a lot like `git`/GitHub, but is specialized for large data files (it can also be used to store metadata about your runs). The repository size limit for open source projects on DagsHub is 10 GB.
You can find the data used for OpenPredict (prediction + similarity + evidence path + DRKG model) at https://dagshub.com/vemonet/translator-openpredict
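As a minimal sketch, fetching the data after cloning should boil down to a `dvc pull` (this assumes the DagsHub remote is already configured in the repository's `.dvc/config`):

```bash
pip install dvc
git clone https://github.com/MaastrichtU-IDS/translator-openpredict.git
cd translator-openpredict
# Download all dvc-tracked files into the data/ folder from the configured remote
dvc pull
```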
2. There is now a **`trapi_predict` decorator** to mark functions that return predictions so they can be integrated automatically in a TRAPI query. It lets the developer specify which relations the prediction function can resolve in a TRAPI query. The developer just needs to ensure that the prediction function takes the expected input and returns the predictions in the expected format (cf. the code example below; the format is largely inspired by the ElasticSearch and BioThings return formats).
The predictions generated by this function can then be automatically integrated into our TRAPI API, and a simple GET endpoint to query the predictions individually is also automatically generated.
```python
from openpredict import trapi_predict, PredictOptions, PredictOutput

@trapi_predict(path='/predict',
    name="Get predicted targets for a given entity",
    description="Return the predicted targets for a given entity: drug (DrugBank ID) or disease (OMIM ID), with confidence scores.",
    relations=[
        {
            'subject': 'biolink:Drug',
            'predicate': 'biolink:treats',
            'object': 'biolink:Disease',
        },
        {
            'subject': 'biolink:Disease',
            'predicate': 'biolink:treated_by',
            'object': 'biolink:Drug',
        },
    ]
)
def get_predictions(
    input_id: str, options: PredictOptions
) -> PredictOutput:
    # Add the code to load the model and compute predictions here
    predictions = {
        "hits": [
            {
                "id": "DB00001",
                "type": "biolink:Drug",
                "score": 0.12345,
                "label": "Lepirudin",
            }
        ],
        "count": 1,
    }
    return predictions
```
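For illustration, querying the auto-generated GET endpoint for a single input could look like the sketch below. Only the `/predict` path comes from the decorator above; the host, port, and query parameter name are assumptions and may differ in the deployed API.

```bash
# Hypothetical call to the auto-generated GET endpoint
# (host, port, and parameter name are assumptions for this sketch)
curl "http://localhost:8000/predict?input_id=DRUGBANK:DB00001"
```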
3. To add a new prediction model to the Translator OpenPredict API, you can either create a new folder under `src/` in the existing translator-openpredict repo and add all the Python files needed to train and run the prediction (using the decorator to annotate the prediction function).
Or you can do it in a separate repository published to GitHub, with the data stored using `dvc`, so the code and data required to run the prediction can be easily imported from the OpenPredict API. There is a template repository to help people get started with the recommended architecture: https://github.com/MaastrichtU-IDS/cookiecutter-openpredict-api
```bash
pip install cookiecutter
cookiecutter https://github.com/MaastrichtU-IDS/cookiecutter-openpredict-api
```
4. The build process now uses `hatch` instead of `poetry`.
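As a minimal sketch of the new workflow (the exact environments and scripts defined in the project's `pyproject.toml` may differ):

```bash
pip install hatch
# Build the source distribution and wheel
hatch build
# Run a command (e.g. the test suite) inside the project's default environment
hatch run pytest
```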
**Full Changelog**: https://github.com/MaastrichtU-IDS/translator-openpredict/compare/v0.0.8...v0.1.0