Main Changes
* Continued major improvements to the documentation, including [a new code examples section](https://unitxt.readthedocs.io/en/latest/docs/examples.html) with standalone Python code that shows how to perform evaluation, add new datasets, compare formats, use LLMs as judges, and more. Cards for datasets from Hugging Face now have detailed [descriptions](https://unitxt.readthedocs.io/en/latest/catalog/catalog.cards.sst2.html). New documentation of [RAG tasks and metrics](https://unitxt.readthedocs.io/en/latest/docs/rag_support.html).
* `load_dataset` can now load cards defined in a Python file (and not only cards in the catalog). See [example](https://github.com/IBM/unitxt/blob/57957fc0e2303cb9a4389a15a8972dfd0ed8bbce/examples/standalone_qa_evaluation.py#L47).
* The evaluation results returned from `evaluate` now include two fields `predictions` and `processed_predictions`. See [example](https://github.com/IBM/unitxt/blob/57957fc0e2303cb9a4389a15a8972dfd0ed8bbce/examples/standalone_qa_evaluation.py#L75).
* Task fields can now have defaults, so a field that is not specified in the card gets its default value. For example, multi-class classification has `text` as the default `text_type`. See [example](https://unitxt.readthedocs.io/en/latest/catalog/catalog.tasks.classification.multi_class.html).
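The default-value behavior in the last item can be pictured with a small standalone sketch. It does not use the unitxt API: `TASK_DEFAULTS` and `fill_defaults` are hypothetical names chosen for illustration, and only the `text_type` default of `text` comes from the note above.

```python
# Hypothetical illustration, not unitxt code: shows how a field that a card
# leaves unspecified can fall back to a task-level default.

# Default taken from the release note: multi-class classification
# uses "text" as the default text_type.
TASK_DEFAULTS = {"text_type": "text"}


def fill_defaults(card_fields, defaults=TASK_DEFAULTS):
    """Return a copy of card_fields with missing keys filled from defaults."""
    merged = dict(defaults)
    merged.update(card_fields)  # values given by the card win over defaults
    return merged


# A card that does not mention text_type gets the default.
unspecified = fill_defaults({"classes": ["negative", "positive"]})
# A card that sets text_type explicitly keeps its own value.
overridden = fill_defaults({"classes": ["negative", "positive"], "text_type": "sentence"})

print(unspecified["text_type"])
print(overridden["text_type"])
```

The point of the sketch is only the precedence rule: an explicit value in the card overrides the default, and the default applies when the card is silent.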
Non backward compatible changes
**You need to recreate any cards/metrics you added by running the corresponding `prepare/*/*.py` files. You can recreate all cards simply by running `python utils/prepare_all_artifacts.py`. This will avoid the `__type__` error.**
**The `AddFields` operator was renamed `Set`, and the `CopyFields` operator was renamed `Copy`. Previous code should continue to work, but we renamed all existing usages in the unitxt and fm-eval repos.**
* Change `Artifact.type` to `Artifact.__type__` by elronbandel in https://github.com/IBM/unitxt/pull/933
* change CopyFields operators name to Copy by duckling69 in https://github.com/IBM/unitxt/pull/876
* Rename AddFields to Set, a name that represents its role better and more concisely by elronbandel in https://github.com/IBM/unitxt/pull/903
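Since old names should keep working, migrating is optional. For anyone who wants to update their own code to the new names, the rename could be scripted roughly as follows. This is a minimal sketch using only the standard library: `RENAMES` and `rename_operators` are hypothetical helpers, not part of unitxt, and the operator arguments in the sample string are illustrative only.

```python
import re

# Old operator names mapped to the new names given in the note above.
RENAMES = {"AddFields": "Set", "CopyFields": "Copy"}

# Word boundaries keep longer identifiers (e.g. "MyAddFieldsHelper") untouched.
_PATTERN = re.compile(r"\b(" + "|".join(RENAMES) + r")\b")


def rename_operators(source: str) -> str:
    """Replace old operator names with the new ones in a source string."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)


# Sample input: a line of user code written against the old names.
old_code = 'steps = [AddFields(fields={"lang": "en"}), CopyFields(field_to_field={"a": "b"})]'
print(rename_operators(old_code))
```

A plain text substitution like this also rewrites matches inside comments and strings, so review the resulting diff before committing it.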
New Features
* Allow eager execution by elronbandel in https://github.com/IBM/unitxt/pull/888
* Add view option for Task definitions in UI explorer. by yoavkatz in https://github.com/IBM/unitxt/pull/891
* Add input type checking in LoadFromDictionary by yoavkatz in https://github.com/IBM/unitxt/pull/900
* Add TokensSlice operator by elronbandel in https://github.com/IBM/unitxt/pull/902
* Make some logs critical by elronbandel in https://github.com/IBM/unitxt/pull/973
* Add LogProbInferenceEngines API and implement for OpenAI by lilacheden in https://github.com/IBM/unitxt/pull/909
* Added support for ibm-watsonx-ai inference by pawelknes in https://github.com/IBM/unitxt/pull/961
* load_dataset supports loading cards not present in local catalog by pawelknes in https://github.com/IBM/unitxt/pull/929
* Added defaults to tasks by pawelknes in https://github.com/IBM/unitxt/pull/921
* Add raw predictions and references to results by yoavkatz in https://github.com/IBM/unitxt/pull/934
* Allow ad hoc metrics and templates (and add first version of standalone example of dataset with LLM as a judge) by eladven in https://github.com/IBM/unitxt/pull/922
* Add infer() function for end to end inference pipeline by elronbandel in https://github.com/IBM/unitxt/pull/952
Bug Fixes
* LLMaaJ implementation of MLCommons' simple-safety-tests by bnayahu in https://github.com/IBM/unitxt/pull/873
* Update gradio version on website by elronbandel in https://github.com/IBM/unitxt/pull/896
* Improve demo by elronbandel in https://github.com/IBM/unitxt/pull/898
* Fix demo and organize files by elronbandel in https://github.com/IBM/unitxt/pull/897
* Make sacrebleu robust by yoavkatz in https://github.com/IBM/unitxt/pull/892
* Fix huggingface assets to have versions and up to date readme by elronbandel in https://github.com/IBM/unitxt/pull/895
* fix(cos loader): account for slashes in cos file name by jezekra1 in https://github.com/IBM/unitxt/pull/904
* llama3 instruct and chat system prompts by oktie in https://github.com/IBM/unitxt/pull/950
* Added trust_remote_code to HF dataset query operations by yoavkatz in https://github.com/IBM/unitxt/pull/911
Documentation
* Update llm_as_judge.rst by yoavkatz in https://github.com/IBM/unitxt/pull/970
* Michal Jacovi's completed manual review of the card descriptions by dafnapension in https://github.com/IBM/unitxt/pull/883
* In card preparers, generate the tags with "singletons" rather than values paired with True by dafnapension in https://github.com/IBM/unitxt/pull/874
* Improved documentation by yoavkatz in https://github.com/IBM/unitxt/pull/886
* Update glossary.rst by yoavkatz in https://github.com/IBM/unitxt/pull/899
* Add example section to documentation by yoavkatz in https://github.com/IBM/unitxt/pull/917
* Added example of open qa using catalog by yoavkatz in https://github.com/IBM/unitxt/pull/919
* Update example intro and simplified WNLI cards by yoavkatz in https://github.com/IBM/unitxt/pull/923
* Update adding_metric.rst by yoavkatz in https://github.com/IBM/unitxt/pull/955
* RAG documentation by yoavkatz in https://github.com/IBM/unitxt/pull/928
* docs: update adding_dataset.rst by eltociear in https://github.com/IBM/unitxt/pull/927
* prepare for `__description__=` that is different from those embedded automatically by dafnapension in https://github.com/IBM/unitxt/pull/937
* Add simple example of using LLM as a judge without installation by eladven in https://github.com/IBM/unitxt/pull/968
* Add example of using LLM as a judge for summarization dataset. by eladven in https://github.com/IBM/unitxt/pull/965
* Improve operators documentation by elronbandel in https://github.com/IBM/unitxt/pull/942
New Assets
* Add numeric nlg dataset by ShirApp in https://github.com/IBM/unitxt/pull/882
* Add to_list_by_hyphen_space processor by marukaz in https://github.com/IBM/unitxt/pull/872
* Added tags and descriptions to safety cards by bnayahu in https://github.com/IBM/unitxt/pull/887
* Add Mt-Bench datasets + add operators by OfirArviv in https://github.com/IBM/unitxt/pull/870
* Touch up numeric nlg by elronbandel in https://github.com/IBM/unitxt/pull/889
* split train to train and validation sets in billsum by alonh in https://github.com/IBM/unitxt/pull/901
* modified wikitq, tab_fact taskcards by ShirApp in https://github.com/IBM/unitxt/pull/963
* Implementation of TruthfulQA by bnayahu in https://github.com/IBM/unitxt/pull/931
* Add bluebench cards by perlitz in https://github.com/IBM/unitxt/pull/918
* Add LlamaIndex faithfulness metric by arielge in https://github.com/IBM/unitxt/pull/971
* Expanded template support for safety cards by bnayahu in https://github.com/IBM/unitxt/pull/943
Testing and CI/CD
* Add end to end realistic test to fusion by elronbandel in https://github.com/IBM/unitxt/pull/940
* Moved test_examples to run the actual examples by yoavkatz in https://github.com/IBM/unitxt/pull/913
* Use uv for installing requirements in actions by elronbandel in https://github.com/IBM/unitxt/pull/960
* Add ability to print_dict to print selected fields by yoavkatz in https://github.com/IBM/unitxt/pull/947
* Get rid of pkg_resources dependency by elronbandel in https://github.com/IBM/unitxt/pull/932
* adapt filtering lambda to datasets 2.20 by dafnapension in https://github.com/IBM/unitxt/pull/930
* Increase preparation log to error. by elronbandel in https://github.com/IBM/unitxt/pull/959
New Contributors
* ShirApp made their first contribution in https://github.com/IBM/unitxt/pull/882
* oktie made their first contribution in https://github.com/IBM/unitxt/pull/950
**Full Changelog**: https://github.com/IBM/unitxt/compare/1.10.0...1.10.1