New features:
- Dynamic datasets: evaluate any correctly formatted dataset with the metrics of a task "family" (QA, VQA, Report Comparison, Image Classification, or NLI). This makes MultiMedEval more flexible.
- Added Diff-VQA [[Paper](https://dl.acm.org/doi/abs/10.1145/3580305.3599819)] to the list of supported tasks.
- Updated RadCliQ to match the results reported in the [[Paper](https://www.cell.com/patterns/pdf/S2666-3899(23)00157-5.pdf)] more closely.
In addition to the new features, we added a suite of unit tests and corrected some bugs.
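
To illustrate the idea behind dynamic datasets, here is a minimal, self-contained sketch of mapping a task family to its metrics and applying them to a user-supplied dataset. It does not use MultiMedEval's actual API: the sample format (`question` / `answer` keys), the `FAMILY_METRICS` registry, and the `evaluate_dynamic_dataset` function are all hypothetical names chosen for illustration, and exact-match accuracy stands in for whatever metrics the library really computes for each family.

```python
from typing import Callable, Dict, List

# Hypothetical sample format: a question and its reference answer.
Sample = Dict[str, str]
Metric = Callable[[List[str], List[str]], float]


def exact_match_accuracy(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to the reference, ignoring case and outer whitespace."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return hits / len(references)


# Hypothetical registry: each task family names the metrics applied to it.
# Only two families are filled in here, with a stand-in metric.
FAMILY_METRICS: Dict[str, Dict[str, Metric]] = {
    "QA": {"accuracy": exact_match_accuracy},
    "NLI": {"accuracy": exact_match_accuracy},
}


def evaluate_dynamic_dataset(
    samples: List[Sample],
    family: str,
    predict: Callable[[str], str],
) -> Dict[str, float]:
    """Run `predict` on every sample and score it with the family's metrics."""
    if family not in FAMILY_METRICS:
        raise KeyError(f"unknown task family: {family}")
    predictions = [predict(sample["question"]) for sample in samples]
    references = [sample["answer"] for sample in samples]
    return {
        name: metric(predictions, references)
        for name, metric in FAMILY_METRICS[family].items()
    }


if __name__ == "__main__":
    # Toy dataset and a trivial "model" that always answers "yes".
    toy_samples = [
        {"question": "Is a fracture visible?", "answer": "Yes"},
        {"question": "Is there a pleural effusion?", "answer": "no"},
    ]
    print(evaluate_dynamic_dataset(toy_samples, "QA", lambda q: "yes"))
```

The point of the design is that the dataset only has to follow the expected format and declare its family; the choice of metrics then follows from the family rather than from the dataset itself.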