Unitxt

Latest version: v1.15.6

Page 2 of 8

1.12.3

Main changes
* New option to use multiple templates and/or num_demos in single dataset recipe. Unitxt will randomly sample from the provided templates and possible number of demos for each instance.
See example: https://github.com/IBM/unitxt/blob/main/examples/evaluate_different_templates_num_demos.py

* A warning is now generated when a metric generates a score with the same name as that of another metric and overwrites it.
See more details on how to deal with conflicting metric names in https://www.unitxt.ai/en/latest/docs/adding_metric.html#metric-outputs-with-multiple-metrics
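
The per-instance sampling described in the first item can be pictured with a small, library-independent sketch. This is not Unitxt's implementation: the helper `assign_template_and_demos`, the template names, and the seed handling are hypothetical and only illustrate drawing one template and one demo count for each instance.

```python
import random


def assign_template_and_demos(instances, templates, num_demos_options, seed=0):
    """Attach a randomly chosen template and demo count to every instance.

    Hypothetical helper for illustration only; not part of the Unitxt API.
    """
    rng = random.Random(seed)  # a fixed seed keeps the assignment reproducible
    assigned = []
    for instance in instances:
        assigned.append(
            {
                **instance,
                "template": rng.choice(templates),
                "num_demos": rng.choice(num_demos_options),
            }
        )
    return assigned


# Made-up template names and demo counts.
templates = ["templates.example.a", "templates.example.b"]
num_demos_options = [0, 1, 3]
instances = [{"question": f"q{i}"} for i in range(5)]

result = assign_template_and_demos(instances, templates, num_demos_options)
for row in result:
    print(row)
```

Because the choice is made per instance, a single dataset ends up mixing templates and demo counts, which is what makes a single recipe usable for measuring sensitivity to prompt variations.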


Non backward compatible changes in catalog
* Change RAG metrics naming convention (e.g. "metrics.rag.mrr" -> "metrics.rag.context_correctness.mrr") - catalog non backward compatible change by assaftibm in https://github.com/IBM/unitxt/pull/1104
* Update summarization task and templates to support multiple reference summaries - by yoavkatz in https://github.com/IBM/unitxt/pull/1126
* Fix belebele due to new convention by elronbandel in https://github.com/IBM/unitxt/pull/1145


Additions to catalog
* Add DeepSeek-Coder format and system prompt by oktie in https://github.com/IBM/unitxt/pull/1105
* Add a metric to calculate the ratio of references included in the prediction by marukaz in https://github.com/IBM/unitxt/pull/1091
* adding RAG bge metrics by assaftibm

New Features
* Add option to run multiple templates and/or num_demos in a single dataset recipe. It is now possible to give a list of templates or num_demos, and Unitxt will randomly sample from the list for each instance. by elronbandel in https://github.com/IBM/unitxt/pull/1110
* A warning is now generated when a metric generates a score with the same name as that of another metric and overwrites it. by dafnapension in https://github.com/IBM/unitxt/pull/1124
* The MetricPipeline field postpreprocess_steps has been renamed to postprocess_steps. The old field (postpreprocess_steps) still exists for backward compatibility but is deprecated. by dafnapension in https://github.com/IBM/unitxt/pull/1117
* Decrease runtime of demo examples
* Add tests for RAG metrics by matanor
* Adding dedicated Unitxt warning and error classes to link online documentation by yoavkatz in
* The code now uses a central controllable deepcopy function by elronbandel in https://github.com/IBM/unitxt/pull/1120
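
The overwrite warning listed above addresses a situation that can be sketched in plain Python. The function `merge_scores` and the score names below are hypothetical and are not Unitxt's code; the sketch only shows why a silent overwrite is worth flagging when two metrics write into the same score dictionary.

```python
import warnings


def merge_scores(existing, new_scores, metric_name):
    """Merge a metric's scores into an existing dict, warning on name clashes.

    Hypothetical helper for illustration only; not part of the Unitxt API.
    """
    merged = dict(existing)
    for name, value in new_scores.items():
        if name in merged:
            warnings.warn(
                f"Metric '{metric_name}' overwrites existing score '{name}'",
                UserWarning,
            )
        merged[name] = value
    return merged


first = {"f1": 0.71, "precision": 0.80}
second = {"f1": 0.64, "recall": 0.59}  # "f1" clashes with the first metric

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    combined = merge_scores(first, second, metric_name="second_metric")

print(combined)
print([str(w.message) for w in caught])
```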



Bug Fixes
* Create a dedicated nltk mixin for downloading all versions of punkt, which are needed by the metrics code. by elronbandel in https://github.com/IBM/unitxt/pull/1151
* For bulk instance metrics, replace the mean function with nanmean to support aggregation when some scores are nan. by elronbandel in https://github.com/IBM/unitxt/pull/1150
* Fix helm test by elronbandel in https://github.com/IBM/unitxt/pull/1109
* Fix bug with RAG metrics: Fix use of minilm model by assaftibm in https://github.com/IBM/unitxt/pull/1115
* Fix data classification of WML model to include 'public' classification by yoavkatz in https://github.com/IBM/unitxt/pull/1118
* Fix WMLInferenceEngine by pawelknes in https://github.com/IBM/unitxt/pull/1122
* Fix belebele HF path due to new convention by elronbandel in https://github.com/IBM/unitxt/pull/1145
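
The motivation for the nanmean fix above can be shown with a minimal, dependency-free sketch. The `nanmean` below is an illustrative stand-in written for this note (the same idea as `numpy.nanmean`), not the function Unitxt calls.

```python
import math


def nanmean(values):
    """Average the values while ignoring NaN entries."""
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        return float("nan")  # nothing left to average
    return sum(kept) / len(kept)


scores = [1.0, float("nan"), 0.5]

plain_mean = sum(scores) / len(scores)  # a single NaN poisons the plain mean
robust_mean = nanmean(scores)           # averages only 1.0 and 0.5

print(plain_mean, robust_mean)
```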


Documentation changes
* Improve debugging.rst wording
* Improve examples.rst wording by welisheva22 in https://github.com/IBM/unitxt/pull/1138
* Improve data_classification_policy.rst wording by welisheva22 in https://github.com/IBM/unitxt/pull/1139
* Improve rag_support.rst wording by welisheva22 in https://github.com/IBM/unitxt/pull/1139
* Improve production.rst wording by welisheva22 in https://github.com/IBM/unitxt/pull/1148
* Improve the clarity of the code examples.
* Improve load_datasets.rst wording by welisheva22
* Improve introduction.rst wording by welisheva22
* Improve installation.rst wording by welisheva22
* Improve adding_format.rst wording by welisheva22
* Improve adding_task.rst wording by welisheva22
* Improve adding_template.rst wording by welisheva22
* Improve adding_dataset.rst wording by hanansinger
* Improve index.rst page by yoavkatz
* Fix link to llama blog in adding_format.rst by andersonm-ibm in https://github.com/IBM/unitxt/pull/1113
* Added example of RAG response by yoavkatz in https://github.com/IBM/unitxt/pull/1121

New Contributors
* andersonm-ibm made their first contribution in https://github.com/IBM/unitxt/pull/1113 by welisheva22 in https://github.com/IBM/unitxt/pull/1152

1.12.2

Main changes

* Task "input"/"output" fields renamed to "input_fields" and "reference_fields" to better reflect their meaning, and the type of each field is now defined by Python class names and not strings (str vs "str"). See example of new syntax here:
https://www.unitxt.ai/en/latest/docs/adding_task.html (old syntax still allowed)
* Ability to create an ensemble of judges. See example in https://www.unitxt.ai/en/latest/docs/examples.html#evaluate-using-ensemble-of-llm-as-a-judge-metrics
* Optimized Rouge and Meteor metrics to run faster and now report confidence intervals by default. This causes very small variances in scores (well within the confidence interval).
* Added ability to select demonstrations that depend on the specific instance (and not only random). See example in https://github.com/IBM/unitxt/blob/main/examples/evaluate_different_demo_selections.py . This change causes some changes in selection of random demos due to seed changes, but should not have any aggregated effect beyond random fluctuations.
* For LLM as Judges, the input sent to the judge is now displayed in the score field called 'judge_raw_input'
* Support for arena hard benchmark. See example: https://github.com/IBM/unitxt/blob/main/examples/evaluate_a_model_using_arena_hard.py
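
The move from type names written as strings to real Python types can be illustrated without Unitxt itself. In the sketch below, the `input_fields` and `reference_fields` dictionaries, the field names, and the `matches` and `validate` helpers are all hypothetical; the linked adding_task page remains the reference for the actual syntax.

```python
from typing import List, get_args, get_origin

# Field declarations using Python types rather than strings such as "str".
input_fields = {"question": str, "choices": List[str]}
reference_fields = {"answer": str}


def matches(value, expected):
    """Return True if value conforms to expected (plain classes and List[...] only)."""
    if get_origin(expected) is list:
        (item_type,) = get_args(expected)
        return isinstance(value, list) and all(matches(v, item_type) for v in value)
    return isinstance(value, expected)


def validate(instance, fields):
    """Return the names of fields that are missing or have the wrong type."""
    return [
        name
        for name, expected in fields.items()
        if name not in instance or not matches(instance[name], expected)
    ]


good = {"question": "2 + 2?", "choices": ["3", "4"], "answer": "4"}
bad = {"question": "2 + 2?", "choices": ["3", 4], "answer": "4"}

print(validate(good, input_fields), validate(bad, input_fields))
```

With real types, mistakes such as a misspelled type name surface when the declaration is written rather than when a string is parsed later.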

Non backward compatible changes
* changed method template names "input_fields" and "reference_fields" (affects only people who wrote custom templates code) by yoavkatz in https://github.com/IBM/unitxt/pull/1030
* Refactor Rouge and Meteor to InstanceMetric for faster score computation - this causes very small variances in scores (well within the confidence interval) by yoavkatz in https://github.com/IBM/unitxt/pull/1011
* Ability to create demo samplers based on instance (this causes changes in random selection of demos in normal mode) by yoavkatz in https://github.com/IBM/unitxt/pull/1034

Changes in Catalog
* Safety and regard metrics became instance metrics, now named SafetyMetric and RegardMetric by dafnapension in https://github.com/IBM/unitxt/pull/1004
* Remove financebench card since it was removed from HF by elronbandel in https://github.com/IBM/unitxt/pull/1016
* add validation to tldr, remove shuffle from billsum by alonh in https://github.com/IBM/unitxt/pull/1038
* Fix typo in japanese_llama system prompt (issue 964) by bnayahu in https://github.com/IBM/unitxt/pull/1056
* numeric nlg dataset template changes by ShirApp in https://github.com/IBM/unitxt/pull/1041

Additions to catalog

* Arena hard elad2 by eladven and OfirArviv in https://github.com/IBM/unitxt/pull/1026
* Add flores101 by perlitz in https://github.com/IBM/unitxt/pull/1053
* Add metric "metrics.rag.retrieval_at_k" to catalog by matanor in https://github.com/IBM/unitxt/pull/1074
* Add Finqa dataset by ShirApp in https://github.com/IBM/unitxt/pull/962
* Allow rag context_id fields to be List[str] and not only List[int] by perlitz in https://github.com/IBM/unitxt/pull/1036
* Rag end to end task support (in progress) - by benjaminsznajder in https://github.com/IBM/unitxt/pull/1044, https://github.com/IBM/unitxt/pull/1080

New Features
* Rename task fields "input"/"output" to "input_fields" and "reference_fields" by luisaadanttas in https://github.com/IBM/unitxt/pull/994
* Support for ensemble metrics by eladven in https://github.com/IBM/unitxt/pull/1047
* Additional inference parameters for openai and genai and simplified InferenceEngine API param passing by pawelknes in https://github.com/IBM/unitxt/pull/1019 and https://github.com/IBM/unitxt/pull/1024
* Real types in tasks and metrics by elronbandel in https://github.com/IBM/unitxt/pull/1045
* Ability to create demo samplers based on instance by yoavkatz in https://github.com/IBM/unitxt/pull/1034
* add judge input to the LLM as Judge metric scores by OfirArviv in https://github.com/IBM/unitxt/pull/1064
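
One simple way to picture an ensemble of judges is a weighted average of their scores. The release notes do not say how Unitxt combines judges, so the averaging rule, the `ensemble_score` helper, and the judge names below are assumptions made purely for illustration.

```python
def ensemble_score(judge_scores, weights=None):
    """Combine per-judge scores into one score by weighted averaging.

    Hypothetical helper; the combination rule is an assumption, not Unitxt's.
    """
    names = sorted(judge_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total_weight = sum(weights[name] for name in names)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(judge_scores[name] * weights[name] for name in names) / total_weight


scores = {"judge_a": 0.8, "judge_b": 0.6, "judge_c": 1.0}

equal = ensemble_score(scores)
weighted = ensemble_score(scores, {"judge_a": 1.0, "judge_b": 3.0, "judge_c": 0.0})

print(equal, weighted)
```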

Bug Fixes
* Solve problem with stripping the format in the LLM as a judge code. by eladven in https://github.com/IBM/unitxt/pull/1005
* Added seed to LLM as judges for consistent results by yoavkatz in https://github.com/IBM/unitxt/pull/1029
* Fixed issues with fresh install by yoavkatz in https://github.com/IBM/unitxt/pull/1037
* WML Inference Engine fix by pawelknes in https://github.com/IBM/unitxt/pull/1013
* replace type and __type__ in type error message by perlitz in https://github.com/IBM/unitxt/pull/1035
* FinQA - filter problematic examples by ShirApp in https://github.com/IBM/unitxt/pull/1039
* demo's target prefix is now taken from demo instance by dafnapension in https://github.com/IBM/unitxt/pull/1031
* Make sure preparation times printed fully and nicely by elronbandel in https://github.com/IBM/unitxt/pull/1046
* Added prediction type to llm as judge to avoid warning by yoavkatz in https://github.com/IBM/unitxt/pull/1072
* Fixed confidence interval inconsistency when some metrics compute ci and some do not by dafnapension in https://github.com/IBM/unitxt/pull/1065
* Fix bug in data classes and add support for field overriding in fields containing types or functions by elronbandel in https://github.com/IBM/unitxt/pull/1027
* Set LoadFromIBMCloud verify to be lazy, in order to allow preparing the cards without defining FMEVAL_COS_URL by eladven in https://github.com/IBM/unitxt/pull/1021
* Added check of type of format and system prompt to LLM as judge by yoavkatz in https://github.com/IBM/unitxt/pull/1068
* Allow assigning None in overwrites when fetching artifacts with modifications by dafnapension in https://github.com/IBM/unitxt/pull/1062
* fix - building test is not working. Updated Kaggle version. by benjaminsznajder in https://github.com/IBM/unitxt/pull/1055

Documentation changes
* Update error message and documentation on unitxt local and HF version conflict by yoavkatz in https://github.com/IBM/unitxt/pull/995
* Update llm_as_judge.rst by yoavkatz in https://github.com/IBM/unitxt/pull/1085
* Update introduction.rst add the word "a" before "variety" by welisheva22 in https://github.com/IBM/unitxt/pull/1015
* Example improvements by yoavkatz in https://github.com/IBM/unitxt/pull/1022
* Add a guide for using unitxt with lm-evaluation-harness by elronbandel in https://github.com/IBM/unitxt/pull/1020
* Fix some docs titles and links by elronbandel in https://github.com/IBM/unitxt/pull/1023
* Add example of meta evaluation of llm as judge by yoavkatz in https://github.com/IBM/unitxt/pull/1025
* Update introduction.rst - - copy edits (grammar, consistency, clarity) by welisheva22 in https://github.com/IBM/unitxt/pull/1063
* Added example for selection of demos by yoavkatz in https://github.com/IBM/unitxt/pull/1052

-----

New Contributors

We want to thank the new contributors for their first contributions!

* welisheva22 made their first contribution in https://github.com/IBM/unitxt/pull/1015
* luisaadanttas made their first contribution in https://github.com/IBM/unitxt/pull/994
* benjaminsznajder made their first contribution in https://github.com/IBM/unitxt/pull/1055
* hanansinger made their first contribution in https://github.com/IBM/unitxt/pull/1057

1.12.0

Main changes

* Task "input"/"output" fields renamed to "input_fields" and "reference_fields" to better reflect their meaning, and the type of each field is now defined by Python class names and not strings (str vs "str"). See example of new syntax here:
https://www.unitxt.ai/en/latest/docs/adding_task.html (old syntax still allowed)
* Ability to create an ensemble of judges. See example in https://www.unitxt.ai/en/latest/docs/examples.html#evaluate-using-ensemble-of-llm-as-a-judge-metrics
* Optimized Rouge and Meteor metrics to run faster and now report confidence intervals by default. This causes very small variances in scores (well within the confidence interval).
* Added ability to select demonstrations that depend on the specific instance (and not only random). See example in https://github.com/IBM/unitxt/blob/main/examples/evaluate_different_demo_selections.py . This change causes some changes in selection of random demos due to seed changes, but should not have any aggregated effect beyond random fluctuations.
* For LLM as Judges, the input sent to the judge is now displayed in the score field called 'judge_raw_input'
* Support for arena hard benchmark. See example: https://github.com/IBM/unitxt/blob/main/examples/evaluate_a_model_using_arena_hard.py

Non backward compatible changes
* changed method template names "input_fields" and "reference_fields" (affects only people who wrote custom templates code) by yoavkatz in https://github.com/IBM/unitxt/pull/1030
* Refactor Rouge and Meteor to InstanceMetric for faster score computation - this causes very small variances in scores (well within the confidence interval) by yoavkatz in https://github.com/IBM/unitxt/pull/1011
* Ability to create demo samplers based on instance (this causes changes in random selection of demos in normal mode) by yoavkatz in https://github.com/IBM/unitxt/pull/1034

Changes in Catalog
* Safety and regard metrics became instance metrics, now named SafetyMetric and RegardMetric by dafnapension in https://github.com/IBM/unitxt/pull/1004
* Remove financebench card since it was removed from HF by elronbandel in https://github.com/IBM/unitxt/pull/1016
* add validation to tldr, remove shuffle from billsum by alonh in https://github.com/IBM/unitxt/pull/1038
* Fix typo in japanese_llama system prompt (issue 964) by bnayahu in https://github.com/IBM/unitxt/pull/1056
* numeric nlg dataset template changes by ShirApp in https://github.com/IBM/unitxt/pull/1041

Additions to catalog

* Arena hard elad2 by eladven and OfirArviv in https://github.com/IBM/unitxt/pull/1026
* Add flores101 by perlitz in https://github.com/IBM/unitxt/pull/1053
* Add metric "metrics.rag.retrieval_at_k" to catalog by matanor in https://github.com/IBM/unitxt/pull/1074
* Add Finqa dataset by ShirApp in https://github.com/IBM/unitxt/pull/962
* Allow rag context_id fields to be List[str] and not only List[int] by perlitz in https://github.com/IBM/unitxt/pull/1036
* Rag end to end task support (in progress) - by benjaminsznajder in https://github.com/IBM/unitxt/pull/1044, https://github.com/IBM/unitxt/pull/1080

New Features
* Rename task fields "input"/"output" to "input_fields" and "reference_fields" by luisaadanttas in https://github.com/IBM/unitxt/pull/994
* Support for ensemble metrics by eladven in https://github.com/IBM/unitxt/pull/1047
* Additional inference parameters for openai and genai and simplified InferenceEngine API param passing by pawelknes in https://github.com/IBM/unitxt/pull/1019 and https://github.com/IBM/unitxt/pull/1024
* Real types in tasks and metrics by elronbandel in https://github.com/IBM/unitxt/pull/1045
* Ability to create demo samplers based on instance by yoavkatz in https://github.com/IBM/unitxt/pull/1034
* add judge input to the LLM as Judge metric scores by OfirArviv in https://github.com/IBM/unitxt/pull/1064

Bug Fixes
* Solve problem with stripping the format in the LLM as a judge code. by eladven in https://github.com/IBM/unitxt/pull/1005
* Added seed to LLM as judges for consistent results by yoavkatz in https://github.com/IBM/unitxt/pull/1029
* Fixed issues with fresh install by yoavkatz in https://github.com/IBM/unitxt/pull/1037
* WML Inference Engine fix by pawelknes in https://github.com/IBM/unitxt/pull/1013
* replace type and __type__ in type error message by perlitz in https://github.com/IBM/unitxt/pull/1035
* FinQA - filter problematic examples by ShirApp in https://github.com/IBM/unitxt/pull/1039
* demo's target prefix is now taken from demo instance by dafnapension in https://github.com/IBM/unitxt/pull/1031
* Make sure preparation times printed fully and nicely by elronbandel in https://github.com/IBM/unitxt/pull/1046
* Added prediction type to llm as judge to avoid warning by yoavkatz in https://github.com/IBM/unitxt/pull/1072
* Fixed confidence interval inconsistency when some metrics compute ci and some do not by dafnapension in https://github.com/IBM/unitxt/pull/1065
* Fix bug in data classes and add support for field overriding in fields containing types or functions by elronbandel in https://github.com/IBM/unitxt/pull/1027
* Set LoadFromIBMCloud verify to be lazy, in order to allow preparing the cards without defining FMEVAL_COS_URL by eladven in https://github.com/IBM/unitxt/pull/1021
* Added check of type of format and system prompt to LLM as judge by yoavkatz in https://github.com/IBM/unitxt/pull/1068
* Allow assigning None in overwrites when fetching artifacts with modifications by dafnapension in https://github.com/IBM/unitxt/pull/1062
* fix - building test is not working. Updated Kaggle version. by benjaminsznajder in https://github.com/IBM/unitxt/pull/1055

Documentation changes
* Update error message and documentation on unitxt local and HF version conflict by yoavkatz in https://github.com/IBM/unitxt/pull/995
* Update llm_as_judge.rst by yoavkatz in https://github.com/IBM/unitxt/pull/1085
* Update introduction.rst add the word "a" before "variety" by welisheva22 in https://github.com/IBM/unitxt/pull/1015
* Example improvements by yoavkatz in https://github.com/IBM/unitxt/pull/1022
* Add a guide for using unitxt with lm-evaluation-harness by elronbandel in https://github.com/IBM/unitxt/pull/1020
* Fix some docs titles and links by elronbandel in https://github.com/IBM/unitxt/pull/1023
* Add example of meta evaluation of llm as judge by yoavkatz in https://github.com/IBM/unitxt/pull/1025
* Update introduction.rst - - copy edits (grammar, consistency, clarity) by welisheva22 in https://github.com/IBM/unitxt/pull/1063
* Added example for selection of demos by yoavkatz in https://github.com/IBM/unitxt/pull/1052

-----

New Contributors

We want to thank the new contributors for their first contributions!

* welisheva22 made their first contribution in https://github.com/IBM/unitxt/pull/1015
* luisaadanttas made their first contribution in https://github.com/IBM/unitxt/pull/994
* benjaminsznajder made their first contribution in https://github.com/IBM/unitxt/pull/1055
* hanansinger made their first contribution in https://github.com/IBM/unitxt/pull/1057

1.11.1

Non backward compatible changes
* The class InputOutputTemplate has the field input_format. This field is now required, so templates that do not use it must explicitly set it to None. by elronbandel in https://github.com/IBM/unitxt/pull/982
* Fix MRR RAG metric - fix MRR wiring and allow context_ids to be a list of strings instead of a list[list[str]]. This allows directly passing the list of predicted context ids, as was done in unitxt version 1.7. Added corresponding tests. This change may change the scores of the MRR metric. by matanor in
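
For readers unfamiliar with MRR, the sketch below computes the reciprocal rank from a flat list of predicted context ids, the input shape the fix above allows. It is a textbook formulation with hypothetical function names and made-up ids, not the code of the Unitxt metric.

```python
def reciprocal_rank(predicted_ids, reference_ids):
    """Return 1/rank of the first predicted id that is a reference id, else 0.0."""
    relevant = set(reference_ids)
    for rank, context_id in enumerate(predicted_ids, start=1):
        if context_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(predictions, references):
    """Average the reciprocal rank over all instances."""
    ranks = [reciprocal_rank(p, r) for p, r in zip(predictions, references)]
    return sum(ranks) / len(ranks) if ranks else 0.0


# Each prediction is a flat list of context ids (strings), best guess first.
predictions = [["d3", "d1", "d7"], ["d2", "d9"]]
references = [["d1"], ["d2"]]

single = reciprocal_rank(predictions[0], references[0])  # first hit at rank 2
overall = mean_reciprocal_rank(predictions, references)

print(single, overall)
```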

New Features
* Add the option to specify the number of processes to use for parallel dataset loading by csrajmohan in https://github.com/IBM/unitxt/pull/974
* Add option for lazy load hf inference engine by elronbandel in https://github.com/IBM/unitxt/pull/980
* Added a format based on Huggingface format by yoavkatz in https://github.com/IBM/unitxt/pull/988

New Assets
* Add code mixing metric, add language identification task, add format for Starling model by arielge in https://github.com/IBM/unitxt/pull/956

Bug Fixes
* Fix llama_3_ibm_genai_generic_template by lga-zurich in https://github.com/IBM/unitxt/pull/978

Documentation
* Add an example that shows how to use LLM as a judge that takes the references into account… by eladven in https://github.com/IBM/unitxt/pull/981
* Improve the examples table documentation by eladven in https://github.com/IBM/unitxt/pull/976

Refactoring
* Delete empty metrics folder by elronbandel in https://github.com/IBM/unitxt/pull/984

Testing and CI/CD
* Add answer correctness tests by matanor in https://github.com/IBM/unitxt/pull/977

New Contributors
* lga-zurich made their first contribution in https://github.com/IBM/unitxt/pull/978

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.10.1...1.10.2

1.11.0

Non backward compatible changes
* The class InputOutputTemplate has the field input_format. This field is now required, so templates that do not use it must explicitly set it to None. by elronbandel in https://github.com/IBM/unitxt/pull/982
* Fix MRR RAG metric - fix MRR wiring and allow context_ids to be a list of strings instead of a list[list[str]]. This allows directly passing the list of predicted context ids, as was done in unitxt version 1.7. Added corresponding tests. This change may change the scores of the MRR metric. by matanor in

New Features
* Add the option to specify the number of processes to use for parallel dataset loading by csrajmohan in https://github.com/IBM/unitxt/pull/974
* Add option for lazy load hf inference engine by elronbandel in https://github.com/IBM/unitxt/pull/980
* Added a format based on Huggingface format by yoavkatz in https://github.com/IBM/unitxt/pull/988

New Assets
* Add code mixing metric, add language identification task, add format for Starling model by arielge in https://github.com/IBM/unitxt/pull/956

Bug Fixes
* Fix llama_3_ibm_genai_generic_template by lga-zurich in https://github.com/IBM/unitxt/pull/978

Documentation
* Add an example that shows how to use LLM as a judge that takes the references into account… by eladven in https://github.com/IBM/unitxt/pull/981
* Improve the examples table documentation by eladven in https://github.com/IBM/unitxt/pull/976

Refactoring
* Delete empty metrics folder by elronbandel in https://github.com/IBM/unitxt/pull/984

Testing and CI/CD
* Add answer correctness tests by matanor in https://github.com/IBM/unitxt/pull/977

New Contributors
* lga-zurich made their first contribution in https://github.com/IBM/unitxt/pull/978

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.10.1...1.10.2

1.10.3

Non backward compatible changes
* The class InputOutputTemplate has the field input_format. This field is now required, so templates that do not use it must explicitly set it to None. by elronbandel in https://github.com/IBM/unitxt/pull/982
* Fix MRR RAG metric - fix MRR wiring and allow context_ids to be a list of strings instead of a list[list[str]]. This allows directly passing the list of predicted context ids, as was done in unitxt version 1.7. Added corresponding tests. This change may change the scores of the MRR metric. by matanor in

New Features
* Add the option to specify the number of processes to use for parallel dataset loading by csrajmohan in https://github.com/IBM/unitxt/pull/974
* Add option for lazy load hf inference engine by elronbandel in https://github.com/IBM/unitxt/pull/980
* Added a format based on Huggingface format by yoavkatz in https://github.com/IBM/unitxt/pull/988

New Assets
* Add code mixing metric, add language identification task, add format for Starling model by arielge in https://github.com/IBM/unitxt/pull/956

Bug Fixes
* Fix llama_3_ibm_genai_generic_template by lga-zurich in https://github.com/IBM/unitxt/pull/978

Documentation
* Add an example that shows how to use LLM as a judge that takes the references into account… by eladven in https://github.com/IBM/unitxt/pull/981
* Improve the examples table documentation by eladven in https://github.com/IBM/unitxt/pull/976

Refactoring
* Delete empty metrics folder by elronbandel in https://github.com/IBM/unitxt/pull/984

Testing and CI/CD
* Add answer correctness tests by matanor in https://github.com/IBM/unitxt/pull/977

New Contributors
* lga-zurich made their first contribution in https://github.com/IBM/unitxt/pull/978

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.10.1...1.10.2

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.