Unitxt

Latest version: v1.20.0


1.9.0

What's Changed
The most important things are:

- Added LLM as a Judge metrics and tasks, both for evaluating LLMs as judges and for using them to evaluate other tasks. Read more in the [LLM as a Judge Tutorial](https://unitxt.readthedocs.io/en/latest/docs/llm_as_judge.html)
- Added RAG response generation tasks and datasets, as part of an effort to add comprehensive RAG evaluation to Unitxt.
- Renamed FormTask to Task for simplicity
- Major improvements to documentation and tutorials

Breaking Changes 🚨
* Ensure consistent evaluation of CI across implementations [Might change previous results] by dafnapension in https://github.com/IBM/unitxt/pull/844
* Fix default format so it will be the same as formats.empty in catalog. Impacts runs that did not specify a format by yoavkatz in https://github.com/IBM/unitxt/pull/848
* LoadJson operator moved from unitxt.processors to unitxt.struct_data_operators
* Fixed YesNoTemplate and Diverse LabelSampler, to support binary task typing. YesNoTemplate now expects the class field to contain a string and not a list of strings with one element by yoavkatz in https://github.com/IBM/unitxt/pull/836
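
For data prepared before this release, the YesNoTemplate change means that a one-element list in the class field has to become a plain string. The following is a minimal migration sketch under stated assumptions: the helper name `unwrap_class_field` and the field name `class` are hypothetical and are not part of the Unitxt API.

```python
from typing import Any, Dict


def unwrap_class_field(instance: Dict[str, Any], field: str = "class") -> Dict[str, Any]:
    """Return a copy of `instance` whose `field` is a string, not a one-element list.

    Hypothetical helper for illustration only; not part of Unitxt.
    """
    value = instance[field]
    if isinstance(value, str):
        return dict(instance)  # already in the new format
    if isinstance(value, list) and len(value) == 1 and isinstance(value[0], str):
        updated = dict(instance)
        updated[field] = value[0]  # unwrap the single string
        return updated
    raise ValueError(
        f"Expected '{field}' to be a string or a one-element list of strings, got {value!r}"
    )


old_style = {"text": "I loved it", "class": ["positive"]}
new_style = unwrap_class_field(old_style)
print(new_style)
```

The helper copies the instance instead of mutating it, so the original data stays available for comparison while migrating.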

Bug Fixes
* Change processor type for to_list_by_comma_from_references by antonpibm in https://github.com/IBM/unitxt/pull/815
* Handle empty text in Literal Eval by antonpibm in https://github.com/IBM/unitxt/pull/819
* Fix clash between dir names and artifact names in catalog website by elronbandel in https://github.com/IBM/unitxt/pull/825
* Ner typing had a mistake. by yoavkatz in https://github.com/IBM/unitxt/pull/832
* Fix catalog reference by elronbandel in https://github.com/IBM/unitxt/pull/838
* Fix default format by yoavkatz in https://github.com/IBM/unitxt/pull/848
* Fixed YesNoTemplate and Diverse LabelSampler, to support binary task typing. by yoavkatz in https://github.com/IBM/unitxt/pull/836

New Features
* Support prediction regex match by setting the operator as a postproce… by antonpibm in https://github.com/IBM/unitxt/pull/792
* Add sample score output in test card by yoavkatz in https://github.com/IBM/unitxt/pull/803
* Support for loading dictionaries by pawelknes in https://github.com/IBM/unitxt/pull/784
* Add ability to fuse, split, MultiStreamScoreMean, and merge all by dafnapension in https://github.com/IBM/unitxt/pull/767
* Changed default log verbosity to "info" instead of "debug" by yoavkatz in https://github.com/IBM/unitxt/pull/822
* Skip artifact prepare and verify in catalog consistency tests by elronbandel in https://github.com/IBM/unitxt/pull/839
* Add separation between eagered streams and regular streams by elronbandel in https://github.com/IBM/unitxt/pull/846
* Add precision and recall scores to f1_binary, max_f1_binary by lilacheden in https://github.com/IBM/unitxt/pull/824
* Rename task by elronbandel in https://github.com/IBM/unitxt/pull/850

New Assets
* Add basic format for llama3 models by arielge in https://github.com/IBM/unitxt/pull/812
* Adding literal eval processor by antonpibm in https://github.com/IBM/unitxt/pull/813
* Add RAG (response generation part) tasks and datasets by perlitz in https://github.com/IBM/unitxt/pull/811
* Add 5 legalbench tasks (the 5 existing in HELM) by perlitz in https://github.com/IBM/unitxt/pull/827
* Add financebench by perlitz in https://github.com/IBM/unitxt/pull/828
* Add billsum dataset by perlitz in https://github.com/IBM/unitxt/pull/830
* Add tldr dataset by perlitz in https://github.com/IBM/unitxt/pull/831
* Add Attaq500 by naamaz in https://github.com/IBM/unitxt/pull/835
* Add llm as judge mt-bench dataset and metrics by OfirArviv in https://github.com/IBM/unitxt/pull/791

Documentation
* Documentation review by yoavkatz in https://github.com/IBM/unitxt/pull/805
* Added documentation for global and huggingface metrics by yoavkatz in https://github.com/IBM/unitxt/pull/807
* Touch up docs by elronbandel in https://github.com/IBM/unitxt/pull/809
* Remove the contents from main menu by elronbandel in https://github.com/IBM/unitxt/pull/810
* Add tags docs by elronbandel in https://github.com/IBM/unitxt/pull/814
* Reviewing Unitxt tutorials by michal-jacovi in https://github.com/IBM/unitxt/pull/817
* Fix the link to the operators tutorial by elronbandel in https://github.com/IBM/unitxt/pull/821
* More documentation changes in metrics by yoavkatz in https://github.com/IBM/unitxt/pull/820
* Update adding_task.rst by michal-jacovi in https://github.com/IBM/unitxt/pull/823
* Fix missing mandatory new line in the beginning of code block in documentation by elronbandel in https://github.com/IBM/unitxt/pull/829
* Add description, homepage, and citation obtained from HF with datasets.load_dataset_builder by dafnapension in https://github.com/IBM/unitxt/pull/818
* Updated documentation by yoavkatz in https://github.com/IBM/unitxt/pull/849

New Contributors
* antonpibm made their first contribution in https://github.com/IBM/unitxt/pull/792
* michal-jacovi made their first contribution in https://github.com/IBM/unitxt/pull/817

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.8.1...1.9.0

1.8.1

What's Changed
* Fix missing experiment_id for multiprocessing evaluation by alonh in https://github.com/IBM/unitxt/pull/798
* Add cache to metric prediction_type to speedup by yoavkatz in https://github.com/IBM/unitxt/pull/801

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.8.0...1.8.1

1.8.0

What's Changed

In this release, the main improvement focuses on introducing type checking within Unitxt tasks. Tasks are fundamental to the Unitxt protocol, acting as standardized blueprints for those integrating new datasets into Unitxt. They facilitate the use of task-specific templates and metrics. To guarantee precise dataset processing in line with the task schema, we've introduced explicit types to the task fields.

For example, consider the NER task in Unitxt, previously defined as follows:
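```python
from unitxt.blocks import FormTask
from unitxt.catalog import add_to_catalog

# Legacy definition: field names only, no types.
add_to_catalog(
    FormTask(
        inputs=["text", "entity_types"],
        outputs=["spans_starts", "spans_ends", "text", "labels"],
        metrics=["metrics.ner"],
    ),
    "tasks.ner",
)
```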

Now, the NER task definition includes explicit types:
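```python
from unitxt.blocks import FormTask
from unitxt.catalog import add_to_catalog

# Typed definition: each field name maps to a type string,
# and the expected prediction type is declared explicitly.
add_to_catalog(
    FormTask(
        inputs={"text": "str", "entity_types": "List[str]"},
        outputs={
            "spans_starts": "List[int]",
            "spans_ends": "List[int]",
            "text": "List[str]",
            "labels": "List[str]",
        },
        prediction_type="List[Tuple[str,str]]",
        metrics=["metrics.ner"],
    ),
    "tasks.ner",
)
```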


This enhancement aligns with Unitxt's goal that definitions should be easily understandable and capable of facilitating validation processes with appropriate error messages to guide developers in identifying and solving issues.
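
To illustrate the kind of validation that typed task fields make possible, here is a small standalone sketch. It is not Unitxt's implementation: the helpers `parse_type`, `is_of_type`, and `validate` are hypothetical names, and only a small subset of type hints is supported.

```python
from typing import Any, Dict, List, Tuple, Union, get_args, get_origin

# Names allowed inside type strings in this sketch (an assumption, not Unitxt's list).
_TYPE_NAMES = {
    "Any": Any, "Dict": Dict, "List": List, "Tuple": Tuple, "Union": Union,
    "str": str, "int": int, "float": float, "bool": bool,
}


def parse_type(type_string: str):
    """Turn a string such as 'List[str]' into a typing object.

    Uses eval with a restricted namespace; do not use on untrusted input.
    """
    return eval(type_string, {"__builtins__": {}}, _TYPE_NAMES)


def is_of_type(value, expected) -> bool:
    """Recursively check `value` against a small subset of typing annotations."""
    if expected is Any:
        return True
    origin, args = get_origin(expected), get_args(expected)
    if origin is None:  # plain class such as str or int
        return isinstance(value, expected)
    if origin is Union:
        return any(is_of_type(value, arg) for arg in args)
    if not args:  # bare List, Dict, or Tuple without parameters
        return isinstance(value, origin)
    if origin is list:
        return isinstance(value, list) and all(is_of_type(v, args[0]) for v in value)
    if origin is tuple:  # fixed-length tuples only; Tuple[str, ...] is not handled
        return (
            isinstance(value, tuple)
            and len(value) == len(args)
            and all(is_of_type(v, a) for v, a in zip(value, args))
        )
    if origin is dict:
        return isinstance(value, dict) and all(
            is_of_type(k, args[0]) and is_of_type(v, args[1]) for k, v in value.items()
        )
    raise TypeError(f"Unsupported type: {expected}")


def validate(instance: Dict[str, Any], schema: Dict[str, str]) -> List[str]:
    """Return a list of readable error messages; an empty list means the instance is valid."""
    errors = []
    for field, type_string in schema.items():
        if field not in instance:
            errors.append(f"missing field '{field}'")
        elif not is_of_type(instance[field], parse_type(type_string)):
            errors.append(
                f"field '{field}' should be of type {type_string}, got {instance[field]!r}"
            )
    return errors


ner_inputs = {"text": "str", "entity_types": "List[str]"}
good = {"text": "John lives in Paris", "entity_types": ["Person", "Location"]}
bad = {"text": "John lives in Paris", "entity_types": "Person"}
print(validate(good, ner_inputs))
print(validate(bad, ner_inputs))
```

The first call prints an empty list. The second reports that `entity_types` is a plain string rather than a list of strings, which is the sort of early, specific error message that typed definitions aim for.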

For now, the original definition format without types will continue to work but will generate a warning message. You should begin adapting your task definitions by adding types.


```
'inputs' field of Task should be a dictionary of field names and their types. For example, {'text': 'str', 'classes': 'List[str]'}. Instead only '['question', 'question_id', 'topic']' was passed. All types will be assumed to be 'Any'. In future version of unitxt this will raise an exception.
'outputs' field of Task should be a dictionary of field names and their types. For example, {'text': 'str', 'classes': 'List[str]'}. Instead only '['reference_answers', 'reference_contexts', 'reference_context_ids', 'is_answerable_label']' was passed. All types will be assumed to be 'Any'. In future version of unitxt this will raise an exception.
```


Special thanks to pawelknes who implemented this important feature. It truly demonstrates the collective power of the Unitxt community and the invaluable contributions made by Unitxt users beyond the core development team. Such contributions are highly appreciated and encouraged.

* For more detailed information, please refer to https://github.com/IBM/unitxt/pull/710

Breaking Changes

"metrics.spearman", "metrics.kendalltau_b", "metrics.roc_auc": prediction type is float.
"metrics.f1_binary","metrics.accuracy_binary", "metrics.precision_binary", "metrics.recall_binary", "metrics.max_f1_binary", "metrics.max_accuracy_binary": prediction type is Union[float, int], references must be equal to 0 or 1

Bug Fixes
* Set empty list if preprocess_steps is None by marukaz in https://github.com/IBM/unitxt/pull/780
* Fix UI load failure due to typo by yoavkatz in https://github.com/IBM/unitxt/pull/785
* Fix huggingface uploads by elronbandel in https://github.com/IBM/unitxt/pull/793
* Fix typo in error message by marukaz in https://github.com/IBM/unitxt/pull/777

New Assets
* add perplexity with Mistral model by lilacheden in https://github.com/IBM/unitxt/pull/713

New Features
* Type checking for task definition by pawelknes in https://github.com/IBM/unitxt/pull/710
* Add open and ibm_genai to llm as judge inference engine by OfirArviv in https://github.com/IBM/unitxt/pull/782
* Add negative class score for binary precision, recall, f1 and max f1 by lilacheden in https://github.com/IBM/unitxt/pull/788
  1. Add negative class score for binary precision, recall, f1 and max f1, e.g. f1_binary now also returns "f1_binary_neg".
  2. Support Unions in metric prediction_type
  3. Add processor cast_to_float_return_nan_if_failed
  4. Breaking change: Make prediction_type of metrics numeric:
     A. "metrics.kendalltau_b", "metrics.roc_auc": prediction type is float.
     B. "metrics.f1_binary","metrics.accuracy_binary", "metrics.precision_binary", "metrics.recall_binary", "metrics.max_f1_binary", "metrics.max_accuracy_binary": prediction type is Union[float, int], references must be equal to 0 or 1
* Group shuffle by sam-data-guy-iam in https://github.com/IBM/unitxt/pull/639
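
As an illustration of what a negative class score means for a binary metric, the sketch below computes F1 for both the positive class (1) and the negative class (0) from numeric predictions. It is a standalone example, not Unitxt's code: the function names and the 0.5 decision threshold are assumptions, and only the result keys `f1_binary` and `f1_binary_neg` come from the notes above.

```python
from typing import Dict, List, Union


def f1_for_class(predictions: List[int], references: List[int], target: int) -> float:
    """F1 score treating `target` as the class of interest."""
    pairs = list(zip(predictions, references))
    tp = sum(p == target and r == target for p, r in pairs)
    fp = sum(p == target and r != target for p, r in pairs)
    fn = sum(p != target and r == target for p, r in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def binary_f1_scores(
    predictions: List[Union[float, int]],
    references: List[int],
    threshold: float = 0.5,  # assumed cut-off for this sketch
) -> Dict[str, float]:
    """Return F1 for the positive class and for the negative class."""
    if any(r not in (0, 1) for r in references):
        raise ValueError("references must be equal to 0 or 1")
    # Convert numeric predictions to hard 0/1 labels.
    hard = [int(p > threshold) for p in predictions]
    return {
        "f1_binary": f1_for_class(hard, references, 1),
        "f1_binary_neg": f1_for_class(hard, references, 0),
    }


scores = binary_f1_scores([0.9, 0.2, 0.7, 0.1], [1, 0, 0, 0])
print(scores)
```

With these inputs the positive class has one true positive and one false positive (F1 of 2/3), while the negative class has two true positives and one false negative (F1 of 0.8), showing why the two scores can differ on the same predictions.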

Documentation
* Fix a small typo by dafnapension in https://github.com/IBM/unitxt/pull/779
* Update instructions to install HELM from PyPI by yifanmai in https://github.com/IBM/unitxt/pull/783
* Update few-shot instructions in Unitxt with HELM by yifanmai in https://github.com/IBM/unitxt/pull/774


**Full Changelog**: https://github.com/IBM/unitxt/compare/1.7.7...1.8.0

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.8.1...1.8.0

1.7.9

What's Changed
* Set empty list if preprocess_steps is None by marukaz in https://github.com/IBM/unitxt/pull/780
* fix a small typo by dafnapension in https://github.com/IBM/unitxt/pull/779
* Fix typo by marukaz in https://github.com/IBM/unitxt/pull/777
* Group shuffle by sam-data-guy-iam in https://github.com/IBM/unitxt/pull/639
* add perplexity with Mistral model by lilacheden in https://github.com/IBM/unitxt/pull/713
* Fix UI load failure due to typo by yoavkatz in https://github.com/IBM/unitxt/pull/785
* Type checking for task definition by pawelknes in https://github.com/IBM/unitxt/pull/710
* Add open and ibm_genai to llm as judge inference engine by OfirArviv in https://github.com/IBM/unitxt/pull/782
* Avoid creating a demo pool if num_demos is 0. by yoavkatz in https://github.com/IBM/unitxt/pull/787
* Update test_helm.yml by elronbandel in https://github.com/IBM/unitxt/pull/789
* Update instructions to install HELM from PyPI by yifanmai in https://github.com/IBM/unitxt/pull/783
* Update few-shot instructions in Unitxt with HELM by yifanmai in https://github.com/IBM/unitxt/pull/774
* Update version to 1.7.8 by elronbandel in https://github.com/IBM/unitxt/pull/790
* Fix huggingface uploads by elronbandel in https://github.com/IBM/unitxt/pull/793
* Update version to 1.7.9 by elronbandel in https://github.com/IBM/unitxt/pull/794


**Full Changelog**: https://github.com/IBM/unitxt/compare/1.7.7...1.7.9

1.7.8

What's Changed
* Set empty list if preprocess_steps is None by marukaz in https://github.com/IBM/unitxt/pull/780
* fix a small typo by dafnapension in https://github.com/IBM/unitxt/pull/779
* Fix typo by marukaz in https://github.com/IBM/unitxt/pull/777
* Group shuffle by sam-data-guy-iam in https://github.com/IBM/unitxt/pull/639
* add perplexity with Mistral model by lilacheden in https://github.com/IBM/unitxt/pull/713
* Fix UI load failure due to typo by yoavkatz in https://github.com/IBM/unitxt/pull/785
* Type checking for task definition by pawelknes in https://github.com/IBM/unitxt/pull/710
* Add open and ibm_genai to llm as judge inference engine by OfirArviv in https://github.com/IBM/unitxt/pull/782
* Avoid creating a demo pool if num_demos is 0. by yoavkatz in https://github.com/IBM/unitxt/pull/787
* Update test_helm.yml by elronbandel in https://github.com/IBM/unitxt/pull/789
* Update instructions to install HELM from PyPI by yifanmai in https://github.com/IBM/unitxt/pull/783
* Update few-shot instructions in Unitxt with HELM by yifanmai in https://github.com/IBM/unitxt/pull/774
* Update version to 1.7.8 by elronbandel in https://github.com/IBM/unitxt/pull/790


**Full Changelog**: https://github.com/IBM/unitxt/compare/1.7.7...1.7.8

1.7.7

What's Changed
* adding multi-lingual bert score model by assaftibm in https://github.com/IBM/unitxt/pull/755
* Add HELM Integration: Guide, Examples and Tests by elronbandel in https://github.com/IBM/unitxt/pull/743
* Add production-time recipe processing capability to unitxt by elronbandel in https://github.com/IBM/unitxt/pull/739
* Add tags and descriptions for assets on the website by elronbandel in https://github.com/IBM/unitxt/pull/760
* Changed HELM integration docs to point to output result file by yoavkatz in https://github.com/IBM/unitxt/pull/761
* Allow FilterByCondition to condition also on subfields by dafnapension in https://github.com/IBM/unitxt/pull/762
* fix a small bug in BinaryMaxAccuracy by dafnapension in https://github.com/IBM/unitxt/pull/757
* Fix Reward metric warnings by assaftibm in https://github.com/IBM/unitxt/pull/765
* Added post processor to take first line in quantization templates by yoavkatz in https://github.com/IBM/unitxt/pull/770
* Support for parsing all strings representing valid Python type hints by pawelknes in https://github.com/IBM/unitxt/pull/754
* simplify bitwiseor-to-union and show a scheme for Literal by dafnapension in https://github.com/IBM/unitxt/pull/772
* Adding NLI model via perplexity by assaftibm in https://github.com/IBM/unitxt/pull/766
* Implement LLM as judge metrics by eladven in https://github.com/IBM/unitxt/pull/771
* Return loading step to enforce loader limit. by yoavkatz in https://github.com/IBM/unitxt/pull/775
* Update formats by elronbandel in https://github.com/IBM/unitxt/pull/769


**Full Changelog**: https://github.com/IBM/unitxt/compare/1.7.6...1.7.7
