Opendataval

Latest version: v1.3.0

2305.00054

Commits from source branch
- d83cd00: Added PyPi support and prerelease support
- 770f10d: Added two new torchvision datasets and modified
- f5c3384: Updated docs website to match backend
- 1870a5e: Updated Ruff version to version 0.0.275 and
- e54a461: edit language_version according to https://github.com/pre-commit/pre-commit/issues/1375
- 7277403: initiate examples branch Pull request [110]
- 244d93d: add examples: adult and diabetes
- 67eec7f: remove demo folder + demo-ipynb
- de27357: remove demo folder + demo-ipynb
- 5e3cda9: add openml regression datasets
- 8400c5a: add nlp (bbc-embeddings) examples
- a23a6d3: add cifar10-embeddings and regression examples
- 93ae0a5: edit examples/readme.md
- f4bee02: Update coverage badge on Readme
- 3817c01: Updated data endpoint after updates to the backend
- be0457a: Added export dataset method on fetcher
- 9348e7c: Added LAVA data evaluator (kevinfjiang)
- 389f985: Added Volume based data evaluator implementation
- cd54a95: Update coverage badge on Readme
- df0f3d5: Merge of two commits:
- cec13ac: Updated documentation and added new evaluators to test/
- f4278a9: Updated experiment functions (kevinfjiang)
- b449460: Update coverage badge
- c40a92a: update examples and fix errors
- e6f1043: Updated version number for new updates since rebuttal
- eaf8cc3: Update coverage badge

**Full Changelog**: https://github.com/opendataval/opendataval/compare/v1.0.0...v1.1.0

1703.04730

1. `InfluenceFunctionEval` (old influence function) is renamed to [`InfluenceSubsample`](opendataval/dataval/influence/infsub.py)

[`grad.py`](opendataval/model/grad.py)
`grad.py` offers a gradient mixin that provides a first-order gradient computation method. Gradients have been added for torch MLPs and torch logistic regression. The mixin is used by `InfluenceFunction`.

API changes
Gradient models now have a `.grad(x_data, y_data)` method, which returns the gradients.
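
As a rough illustration of the idea (not the library's actual code), a gradient mixin can be sketched in plain Python for a logistic regression. The class names `GradMixinSketch` and `TinyLogistic` are hypothetical, and the sketch assumes one gradient per data point; only the `.grad(x_data, y_data)` method name comes from the notes above.

```python
import math
from abc import ABC, abstractmethod


class GradMixinSketch(ABC):
    """Hypothetical stand-in for a gradient mixin: adds a per-sample .grad()."""

    @abstractmethod
    def grad(self, x_data, y_data):
        """Yield one first-order gradient (list of floats) per data point."""


class TinyLogistic(GradMixinSketch):
    """Hypothetical logistic regression with a weight vector and no bias."""

    def __init__(self, weights):
        self.weights = list(weights)

    def predict(self, x):
        z = sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def grad(self, x_data, y_data):
        # Gradient of the log loss with respect to the weights is (p - y) * x.
        for x, y in zip(x_data, y_data):
            p = self.predict(x)
            yield [(p - y) * xi for xi in x]


model = TinyLogistic([0.0, 0.0])
grads = list(model.grad([[1.0, 2.0], [3.0, -1.0]], [1, 0]))
```

With zero weights every prediction is 0.5, so each gradient is simply `(0.5 - y) * x`, which makes the result easy to check by hand.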

[`margcontrib/sampler.py`](opendataval/dataval/margcontrib/sampler.py)
`sampler.py` offers different methods of sampling and caching for marginal contribution samplers. Included are TMC Shapley, TMC Shapley with the GR statistic, and MC Shapley.
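
For readers unfamiliar with these samplers, the following is a minimal sketch of permutation-based Monte Carlo Shapley estimation with optional truncation, in plain Python. It is not the code in `sampler.py`: the function name `tmc_shapley_sketch`, the `utility` callback, and the `tolerance` parameter are assumptions made for illustration, and the GR statistic convergence check and caching are left out.

```python
import random


def tmc_shapley_sketch(n_points, utility, mc_epochs=100, tolerance=0.0, seed=0):
    """Average marginal contributions over random permutations.

    `utility` maps a frozenset of point indices to a float score. When
    `tolerance` is positive, a permutation is cut short once the running
    score is within `tolerance` of the full-dataset score (truncation).
    """
    rng = random.Random(seed)
    full_score = utility(frozenset(range(n_points)))
    totals = [0.0] * n_points

    for _ in range(mc_epochs):
        order = list(range(n_points))
        rng.shuffle(order)
        subset, prev_score = set(), utility(frozenset())
        for idx in order:
            if tolerance > 0 and abs(full_score - prev_score) <= tolerance:
                break  # truncated: remaining points add nothing this epoch
            subset.add(idx)
            score = utility(frozenset(subset))
            totals[idx] += score - prev_score
            prev_score = score

    return [total / mc_epochs for total in totals]


# Toy additive utility: each point contributes a fixed amount, so the
# Shapley value of a point equals its own contribution.
contributions = [1.0, 2.0, 3.0]
values = tmc_shapley_sketch(
    3, lambda s: sum(contributions[i] for i in s), mc_epochs=50
)
```

The additive toy utility is chosen because its Shapley values are known in advance, which makes the estimator easy to sanity-check.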

API changes
Semi-value evaluators can now take a sampler as the first argument, for example `DataShapley(GrTMCSampler(mc_epochs=1000))`. The previous API still works: `DataShapley(mc_epochs=1000)` is equivalent, because the default sampler is `GrTMCSampler`.
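
One way such a backward-compatible constructor could be arranged is sketched below. The classes `GrTMCSamplerSketch` and `DataShapleySketch` are hypothetical stand-ins, not the library's classes; the sketch only assumes that keyword arguments are forwarded to a default sampler when none is given.

```python
class GrTMCSamplerSketch:
    """Hypothetical sampler that only records its configuration."""

    def __init__(self, mc_epochs=100):
        self.mc_epochs = mc_epochs


class DataShapleySketch:
    """Hypothetical evaluator accepting a sampler or sampler keyword arguments."""

    def __init__(self, sampler=None, **sampler_kwargs):
        if sampler is None:
            # No sampler given: build the default one from the keyword arguments.
            sampler = GrTMCSamplerSketch(**sampler_kwargs)
        self.sampler = sampler


explicit = DataShapleySketch(GrTMCSamplerSketch(mc_epochs=1000))
implicit = DataShapleySketch(mc_epochs=1000)
```

Both objects end up holding a sampler configured with the same number of epochs, which is the equivalence described above.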

[DataEvaluator Mixin changes](opendataval/dataval/api.py#L201)
To avoid biasing the base class toward either model-based or model-free evaluators, a `ModelMixin` was added for `DataEvaluator`s that require a model. This moves functionality that was specific to evaluators with models out of `DataEvaluator` itself.
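
The split can be pictured with the following sketch. All class and method names here are hypothetical and simplified; the sketch shows only the general mixin pattern, in which model handling lives in a separate class and evaluators that need a model are easy to identify.

```python
class EvaluatorBaseSketch:
    """Hypothetical base class: holds data only, knows nothing about models."""

    def input_data(self, x_train, y_train):
        self.x_train, self.y_train = x_train, y_train
        return self


class ModelMixinSketch:
    """Hypothetical mixin: adds model handling to evaluators that need one."""

    def input_model(self, pred_model):
        self.pred_model = pred_model
        return self


class ModelFreeEvaluator(EvaluatorBaseSketch):
    """Values points from the data alone (here: closeness to the mean)."""

    def evaluate(self):
        mean = sum(self.x_train) / len(self.x_train)
        return [-abs(x - mean) for x in self.x_train]


class ModelBasedEvaluator(EvaluatorBaseSketch, ModelMixinSketch):
    """Values points using a model (here: negative absolute prediction error)."""

    def evaluate(self):
        return [
            -abs(self.pred_model(x) - y)
            for x, y in zip(self.x_train, self.y_train)
        ]


def needs_model(evaluator):
    # The mixin makes model-based evaluators easy to identify.
    return isinstance(evaluator, ModelMixinSketch)


scores = (
    ModelBasedEvaluator()
    .input_data([1, 2], [2, 5])
    .input_model(lambda x: 2 * x)
    .evaluate()
)
```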

ExperimentMediator
Minor API changes: `ExperimentMediator` can now raise exceptions, and the evaluation metric can be passed as a function as well as a string.

[`exper_methods.py`](opendataval/experiment/exper_methods.py#L72)
Experiment methods that previously relied on the data evaluator to evaluate model performance now take a function that evaluates performance. This is a non-breaking change if you use the `ExperimentMediator`; it is a breaking change if you call the experiment methods directly, as you now pass a function to `metric` instead of a string.
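
A sketch of how a `metric` argument might accept either form is shown below. The registry `NAMED_METRICS` and the helpers `resolve_metric` and `evaluate_sketch` are hypothetical and are not part of the package; they only illustrate normalizing a string or a callable into a callable.

```python
from typing import Callable, Sequence, Union

Metric = Callable[[Sequence, Sequence], float]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def neg_mse(y_true, y_pred):
    """Negative mean squared error, so that larger is better."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical registry of named metrics for the string form.
NAMED_METRICS = {"accuracy": accuracy, "neg_mse": neg_mse}


def resolve_metric(metric: Union[str, Metric]) -> Metric:
    """Return a callable metric from either a name or a function."""
    if callable(metric):
        return metric
    if metric not in NAMED_METRICS:
        raise ValueError(f"Unknown metric name: {metric!r}")
    return NAMED_METRICS[metric]


def evaluate_sketch(y_true, y_pred, metric: Union[str, Metric] = "accuracy"):
    """Score predictions with a metric given by name or as a function."""
    return resolve_metric(metric)(y_true, y_pred)


by_name = evaluate_sketch([1, 0, 1, 1], [1, 0, 0, 1], metric="accuracy")
by_func = evaluate_sketch([1, 0, 1, 1], [1, 0, 0, 1], metric=accuracy)
```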

Commits
- 2f477b9: fix errors in shap.py
- 53ae2fb: remove unnecessary file
- 21778cd: update examples
- d3fbbe9: update readme
- 0a80e7c: Updated formatting
- e515e1b: add metrics
- bf78c4d: fix nlp error
- 659c680: Added torcheval as a dependency for more comparison metrics
- f8b4eb5: Added filter kwargs argument for ExperMediator,
- d13ae75: Removed include_train argument from expermediator
- 87293aa: Added a ModelMixin, easily identifying DataEvaluators with a model
- c049aa5: Added samplers to opendataval!
- 86b7a9f: Added ReprMixin, to be able to print objects
- 577aab1: Refactored sampler API to be simpler
- 17f32a3: Added Class-Wise Shapley to opendataval
- db6d7b6: Updated docs
- 4e808da: Updated ExperimentMediator to allow raise Exception
- 06a4d00: Refactored experiment_methods to take a callable
- 929a987: Refactored Model API to have a Generic torch mixin
- 1b2adae: Renamed InfluenceFunctionEval ->InfluenceSubsample
- f431b5b: Added opendataval/model/grad.py for gradients
- 0ff21fe: Implemented Koh and Liang influence function evaluator
- df39585: Update coverage badge
- 80372d7: Updated version number

**Full Changelog**: https://github.com/opendataval/opendataval/compare/v1.1.0...v1.2.1

1.3.0

Changes

Dependency management
In #4 we realized there were issues in the dependency management, especially with respect to Google Colab. We've made the dependency system stricter and removed as many dependencies that conflict with Google Colab as possible.

[FolderDataset](https://github.com/opendataval/opendataval/blob/004f7399a67d086c900b8ae40e2df49ba38a7b12/opendataval/dataloader/util.py#L116)
In #3 we realized that we had previously used a stopgap solution to limit the number of vectors loaded into memory. We've now created a dataset that reads and writes batches of tensors to disk.
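
The general idea of a disk-backed dataset can be sketched with the standard library alone. The class `BatchFolderSketch` below is hypothetical and uses `pickle` on plain lists rather than tensors; it is not the `FolderDataset` implementation, only an illustration of writing batches to a folder and loading a single batch on demand.

```python
import pickle
import tempfile
from pathlib import Path


class BatchFolderSketch:
    """Hypothetical disk-backed store: one file per batch, loaded on demand."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self.folder.mkdir(parents=True, exist_ok=True)
        self.batch_sizes = []

    def write(self, batch):
        path = self.folder / f"batch_{len(self.batch_sizes)}.pkl"
        with path.open("wb") as f:
            pickle.dump(batch, f)
        self.batch_sizes.append(len(batch))

    def __len__(self):
        return sum(self.batch_sizes)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the batch holding `index`, then load only that file.
        for batch_idx, size in enumerate(self.batch_sizes):
            if index < size:
                path = self.folder / f"batch_{batch_idx}.pkl"
                with path.open("rb") as f:
                    return pickle.load(f)[index]
            index -= size


with tempfile.TemporaryDirectory() as tmp:
    store = BatchFolderSketch(tmp)
    store.write([[0.0, 1.0], [2.0, 3.0]])
    store.write([[4.0, 5.0]])
    third = store[2]
    total = len(store)
```

Only the batch sizes stay in memory; the vectors themselves are read back one batch file at a time.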

[NeurIPS](https://nips.cc/Conferences/2023)
We've added references to our paper in NeurIPS 2023. We hope to see you there!
[Poster](https://neurips.cc/virtual/2023/poster/73521) | [Paper](https://openreview.net/forum?id=eEK99egXeB)

What's Changed
* Added commits from private repo by kevinfjiang in https://github.com/opendataval/opendataval/pull/1
* Updated dependencies + Documentation by kevinfjiang in https://github.com/opendataval/opendataval/pull/2
* Fixes dependencies in OpenDataVal allowing installation with Google Colab by kevinfjiang in https://github.com/opendataval/opendataval/pull/5

**Full Changelog**: https://github.com/opendataval/opendataval/compare/v1.2.1...v1.3.0

1.2.1

Changes

Data evaluators:
1. Schoch's [Class-wise Shapley](https://arxiv.org/abs/2211.06800) - Class-wise Shapley has been implemented similarly to `valda`. However, this implementation does not use stratified sampling and instead trains on the entire out-class.

1.1.0

Changes

New DataEvaluators
1. [RVS](https://proceedings.neurips.cc/paper/2021/file/59a3adea76fadcb6dd9e54c96fc155d1-Paper.pdf)

1.0.0

Changes

`opendataval` is a Python package to build, compare, and evaluate your data evaluator. It includes a large number of datasets, state-of-the-art data evaluation algorithms, prediction models, and comparison metrics, along with tools to extend the framework.

Installation
sh
pip install opendataval

As a library
python
import opendataval

As a CLI
sh
opendataval --help

Documentation
Check out the documentation website [here](https://opendataval.github.io/).

How to help
Feel free to contribute your own data evaluators, models, datasets, etc. via a pull request. Additionally, feel free to complete one of the challenges on our [website](https://opendataval.github.io/leaderboards.html). If you notice any bugs or have suggestions, please open an issue on the [issues tab](https://github.com/opendataval/opendataval/issues). If you like what you see, please leave us a 🌟!

Current data evaluators
For a full list of our data evaluators please see the following [link](https://opendataval.github.io/opendataval.dataval.html#catalog). The catalog is as up-to-date as the documentation.

**Full Changelog**: https://github.com/opendataval/opendataval/commits/v1.0.0
