Unitxt

Latest version: v1.15.6

Page 3 of 8

1.10.2

Non backward compatible changes
* None - this release is fully compatible with the previous release.

New Features
* added num_proc parameter - Optional integer to specify the number of processes to use for parallel dataset loading by csrajmohan in https://github.com/IBM/unitxt/pull/974
* Add option to lazy load hf inference engine and fix requirements mechanism by elronbandel in https://github.com/IBM/unitxt/pull/980
* Add code mixing metric, add language identification task, add format for Starling model by arielge in https://github.com/IBM/unitxt/pull/956
* Add metrics: domesticated safety and regard by dafnapension in https://github.com/IBM/unitxt/pull/983
* Make input_format required field in InputOutputTemplate by elronbandel in https://github.com/IBM/unitxt/pull/982
* Added a format based on Huggingface format by yoavkatz in https://github.com/IBM/unitxt/pull/988

Bug Fixes
* Fix the error at the examples table by eladven in https://github.com/IBM/unitxt/pull/976
* Fix MRR RAG metric: fix the MRR wiring and allow context_ids to be a list of strings instead of a list[list[str]]. This allows directly passing the list of predicted context ids, as was done in unitxt version 1.7. Added corresponding tests. by matanor in https://github.com/IBM/unitxt/pull/969
* Fix llama_3_ibm_genai_generic_template by lga-zurich in https://github.com/IBM/unitxt/pull/978
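
To illustrate the MRR fix above, here is a minimal sketch of a mean reciprocal rank computation in which each prediction is a flat list of context id strings. This is not the unitxt implementation; the function names `reciprocal_rank` and `mean_reciprocal_rank` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def reciprocal_rank(context_ids: Sequence[str], reference_ids: Sequence[str]) -> float:
    """Return 1/rank of the first predicted id found in the references, or 0.0."""
    relevant = set(reference_ids)
    for rank, context_id in enumerate(context_ids, start=1):
        if context_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    predictions: List[Sequence[str]], references: List[Sequence[str]]
) -> float:
    """Average the reciprocal rank over all (prediction, reference) pairs."""
    scores = [reciprocal_rank(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores) if scores else 0.0


# Each prediction is a plain list of strings (hypothetical ids).
predictions = [["d3", "d1", "d7"], ["d2"], ["d9", "d8"]]
references = [["d1"], ["d2"], ["d4"]]
mrr = mean_reciprocal_rank(predictions, references)
```

Here the first relevant id appears at rank 2, rank 1, and not at all, so the per-instance scores are 0.5, 1.0, and 0.0.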

Documentation
* Add an example that shows how to use LLM as a judge that takes the references into account… by eladven in https://github.com/IBM/unitxt/pull/981

Refactoring
* Delete empty metrics folder by elronbandel in https://github.com/IBM/unitxt/pull/984

Testing and CI/CD
* Add answer correctness tests by matanor in https://github.com/IBM/unitxt/pull/977

New Contributors
* lga-zurich made their first contribution in https://github.com/IBM/unitxt/pull/978

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.10.1...1.10.2

1.10.1

Main Changes

* Continued major improvements to the documentation, including [a new code examples section](https://unitxt.readthedocs.io/en/latest/docs/examples.html) with standalone Python code that shows how to perform evaluation, add new datasets, compare formats, use LLMs as judges, and more. Cards for datasets from Hugging Face now have detailed [descriptions](https://unitxt.readthedocs.io/en/latest/catalog/catalog.cards.sst2.html). New documentation of [RAG tasks and metrics](https://unitxt.readthedocs.io/en/latest/docs/rag_support.html).
* `load_dataset` can now load cards defined in a python file (and not only in the catalog). See [example](https://github.com/IBM/unitxt/blob/57957fc0e2303cb9a4389a15a8972dfd0ed8bbce/examples/standalone_qa_evaluation.py#L47).
* The evaluation results returned from `evaluate` now include two fields `predictions` and `processed_predictions`. See [example](https://github.com/IBM/unitxt/blob/57957fc0e2303cb9a4389a15a8972dfd0ed8bbce/examples/standalone_qa_evaluation.py#L75).
* Task fields can now have defaults, so if they are not specified in the card, they get a default value. For example, multi-class classification has `text` as the default `text_type`. See [example](https://unitxt.readthedocs.io/en/latest/catalog/catalog.tasks.classification.multi_class.html).
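
The default-value behavior in the last item can be pictured with a small standalone sketch. It does not use the unitxt API; `apply_task_defaults` is a hypothetical helper, and the only detail taken from the release notes is that `text_type` defaults to `text` for multi-class classification.

```python
from typing import Any, Dict


def apply_task_defaults(instance: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Fill fields missing from the instance with the task's default values.

    Values already present in the instance take precedence over defaults.
    """
    return {**defaults, **instance}


# Hypothetical task defaults and instances, for illustration only.
defaults = {"text_type": "text"}
instance = {"text": "great movie", "classes": ["positive", "negative"]}

filled = apply_task_defaults(instance, defaults)
overridden = apply_task_defaults({**instance, "text_type": "review"}, defaults)
```

In this sketch `filled` gains `text_type="text"`, while `overridden` keeps the explicitly supplied `"review"`.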


Non backward compatible changes

**You need to recreate any cards/metrics you added by running the corresponding `prepare/*/*.py` file. You can create all cards simply by running `python utils/prepare_all_artifacts.py`. This avoids the `__type__` error.**

**The AddFields operator was renamed to Set and the CopyFields operator was renamed to Copy. Previous code should continue to work, but all existing code in the unitxt and fm-eval repos has been renamed accordingly.**


* Change Artifact.type to Artifact.__type__ by elronbandel in https://github.com/IBM/unitxt/pull/933
* change CopyFields operators name to Copy by duckling69 in https://github.com/IBM/unitxt/pull/876
* Rename AddFields to Set, a name that represents its role better and more concisely by elronbandel in https://github.com/IBM/unitxt/pull/903

New Features
* Allow eager execution by elronbandel in https://github.com/IBM/unitxt/pull/888
* Add view option for Task definitions in UI explorer. by yoavkatz in https://github.com/IBM/unitxt/pull/891
* Add input type checking in LoadFromDictionary by yoavkatz in https://github.com/IBM/unitxt/pull/900
* Add TokensSlice operator by elronbandel in https://github.com/IBM/unitxt/pull/902
* Make some logs critical by elronbandel in https://github.com/IBM/unitxt/pull/973
* Add LogProbInferenceEngines API and implement for OpenAI by lilacheden in https://github.com/IBM/unitxt/pull/909
* Added support for ibm-watsonx-ai inference by pawelknes in https://github.com/IBM/unitxt/pull/961
* load_dataset supports loading cards not present in local catalog by pawelknes in https://github.com/IBM/unitxt/pull/929
* Added defaults to tasks by pawelknes in https://github.com/IBM/unitxt/pull/921
* Add raw predictions and references to results by yoavkatz in https://github.com/IBM/unitxt/pull/934
* Allow ad-hoc metrics and templates (and add first version of a standalone example of a dataset with LLM as a judge) by eladven in https://github.com/IBM/unitxt/pull/922
* Add infer() function for end to end inference pipeline by elronbandel in https://github.com/IBM/unitxt/pull/952



Bug Fixes
* LLMaaJ implementation of MLCommons' simple-safety-tests by bnayahu in https://github.com/IBM/unitxt/pull/873
* Update gradio version on website by elronbandel in https://github.com/IBM/unitxt/pull/896
* Improve demo by elronbandel in https://github.com/IBM/unitxt/pull/898
* Fix demo and organize files by elronbandel in https://github.com/IBM/unitxt/pull/897
* Make sacrebleu robust by yoavkatz in https://github.com/IBM/unitxt/pull/892
* Fix huggingface assets to have versions and up to date readme by elronbandel in https://github.com/IBM/unitxt/pull/895
* fix(cos loader): account for slashes in cos file name by jezekra1 in https://github.com/IBM/unitxt/pull/904
* llama3 instruct and chat system prompts by oktie in https://github.com/IBM/unitxt/pull/950
* Added trust_remote_code to HF dataset query operations by yoavkatz in https://github.com/IBM/unitxt/pull/911

Documentation
* Update llm_as_judge.rst by yoavkatz in https://github.com/IBM/unitxt/pull/970
* Michal Jacovi's completed manual review of the card descriptions by dafnapension in https://github.com/IBM/unitxt/pull/883
* In card preparers, generate the tags with "singletons" rather than values paired with True by dafnapension in https://github.com/IBM/unitxt/pull/874
* Improved documentation by yoavkatz in https://github.com/IBM/unitxt/pull/886
* Update glossary.rst by yoavkatz in https://github.com/IBM/unitxt/pull/899
* Add example section to documentation by yoavkatz in https://github.com/IBM/unitxt/pull/917
* Added example of open qa using catalog by yoavkatz in https://github.com/IBM/unitxt/pull/919
* Update example intro and simplified WNLI cards by yoavkatz in https://github.com/IBM/unitxt/pull/923
* Update adding_metric.rst by yoavkatz in https://github.com/IBM/unitxt/pull/955
* RAG documentation by yoavkatz in https://github.com/IBM/unitxt/pull/928
* docs: update adding_dataset.rst by eltociear in https://github.com/IBM/unitxt/pull/927
* Prepare for `__description__` values that differ from those embedded automatically by dafnapension in https://github.com/IBM/unitxt/pull/937
* Add simple LLM as a judge example that works without installation by eladven in https://github.com/IBM/unitxt/pull/968
* Add example of using LLM as a judge for summarization dataset. by eladven in https://github.com/IBM/unitxt/pull/965
* Improve operators documentation by elronbandel in https://github.com/IBM/unitxt/pull/942

New Assets
* Add numeric nlg dataset by ShirApp in https://github.com/IBM/unitxt/pull/882
* Add to_list_by_hyphen_space processor by marukaz in https://github.com/IBM/unitxt/pull/872
* Added tags and descriptions to safety cards by bnayahu in https://github.com/IBM/unitxt/pull/887
* Add Mt-Bench datasets + add operators by OfirArviv in https://github.com/IBM/unitxt/pull/870
* Touch up numeric nlg by elronbandel in https://github.com/IBM/unitxt/pull/889
* split train to train and validation sets in billsum by alonh in https://github.com/IBM/unitxt/pull/901
* modified wikitq, tab_fact taskcards by ShirApp in https://github.com/IBM/unitxt/pull/963
* Implementation of TruthfulQA by bnayahu in https://github.com/IBM/unitxt/pull/931
* Add bluebench cards by perlitz in https://github.com/IBM/unitxt/pull/918
* Add LlamaIndex faithfulness metric by arielge in https://github.com/IBM/unitxt/pull/971
* Expanded template support for safety cards by bnayahu in https://github.com/IBM/unitxt/pull/943

Testing and CI/CD
* Add end to end realistic test to fusion by elronbandel in https://github.com/IBM/unitxt/pull/940
* Moved test_examples to run the actual examples by yoavkatz in https://github.com/IBM/unitxt/pull/913
* Use uv for installing requirements in actions by elronbandel in https://github.com/IBM/unitxt/pull/960
* Add ability to print_dict to print selected fields by yoavkatz in https://github.com/IBM/unitxt/pull/947
* Get rid of pkg_resources dependency by elronbandel in https://github.com/IBM/unitxt/pull/932
* adapt filtering lambda to datasets 2.20 by dafnapension in https://github.com/IBM/unitxt/pull/930
* Increase preparation log to error. by elronbandel in https://github.com/IBM/unitxt/pull/959


New Contributors
* ShirApp made their first contribution in https://github.com/IBM/unitxt/pull/882
* oktie made their first contribution in https://github.com/IBM/unitxt/pull/950

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.10.0...1.10.1

1.10.0

Main changes

- Added support for handling sensitive data. When data is loaded from a data source using a Loader, the user can specify the classification of the data (e.g. "public" or "proprietary"). Unitxt components such as metrics and inference engines then check whether they are allowed to process the data based on their configuration. For example, an LLM as judge that sends data to remote services can be configured to send only "public" data to those services. This replaces the UNITXT_ALLOW_PASSING_DATA_TO_REMOTE_API option, which was a general flag that was not data dependent and hence error prone.
See more details in https://unitxt.readthedocs.io/en/latest/docs/data_classification_policy.html
- Added support for metric prefixes. Each metric has a new optional string attribute "score_prefix" that is prepended to the names of all scores it generates. This allows the same metric to be used on different fields of a task while keeping the output scores distinguishable.
- New [Operators](https://unitxt.readthedocs.io/en/latest/docs/adding_operator.html) tutorial and [Loaders](https://unitxt.readthedocs.io/en/latest/unitxt.loaders.html#module-unitxt.loaders) documentation
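
The data classification check described in the first item can be sketched as follows. This is a simplified illustration and not the unitxt implementation: the names `DataClassificationError` and `verify_data_allowed` are hypothetical, and the rule used here (a component accepts data if at least one of the data's classifications is in its allowed list) is an assumption made for the example.

```python
from typing import Iterable, Sequence


class DataClassificationError(ValueError):
    """Raised when a component is not allowed to process the given data."""


def verify_data_allowed(
    component_name: str,
    allowed: Iterable[str],
    data_classification: Sequence[str],
) -> None:
    """Raise unless at least one classification of the data is allowed.

    `allowed` is the component's configuration (e.g. ["public"]);
    `data_classification` is what the user declared when loading the data.
    """
    allowed_set = set(allowed)
    if not any(label in allowed_set for label in data_classification):
        raise DataClassificationError(
            f"{component_name} may only process {sorted(allowed_set)} data, "
            f"but the data is classified as {list(data_classification)}."
        )


# A judge that sends data to a remote service accepts only public data.
verify_data_allowed("remote_llm_judge", ["public"], ["public"])  # passes silently
```

Unlike a single global flag, the decision here depends on the classification attached to the data itself, which is the motivation given above for replacing UNITXT_ALLOW_PASSING_DATA_TO_REMOTE_API.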

Non backward compatible changes
- StreamInstanceOperator was renamed to InstanceOperator


New Features
* Support for handling sensitive data sent to remote services by pawelknes in https://github.com/IBM/unitxt/pull/806 , yoavkatz in https://github.com/IBM/unitxt/pull/868
* Added new NER metric using fuzzywuzzy logic by sarathsgvr in https://github.com/IBM/unitxt/pull/808
* Added loader from HF spaces by pawelknes in https://github.com/IBM/unitxt/pull/860
* Add metric prefix in main by yoavkatz in https://github.com/IBM/unitxt/pull/878
* Add MinimumOneExamplePerLabelRefiner to ensure that at least one example of each label appears in the training data. by alonh in https://github.com/IBM/unitxt/pull/867

Bug Fix
* Explorer UI crashed when no templates were defined in card by yoavkatz in https://github.com/IBM/unitxt/pull/855
* Fix operator and metrics data by yoavkatz in https://github.com/IBM/unitxt/pull/878
* Improved testing of cards by yoavkatz in https://github.com/IBM/unitxt/pull/861
* FormTask deprecation by yoavkatz in https://github.com/IBM/unitxt/pull/856

New Assets
* Adding go emotions dataset by shaigrt in https://github.com/IBM/unitxt/pull/865
* Implementation of select safety benchmarks by bnayahu in https://github.com/IBM/unitxt/pull/854

Documentation
* Update CONTRIBUTING.md by elronbandel in https://github.com/IBM/unitxt/pull/859
* Adding operator tutorial and standardizing operators names by elronbandel in https://github.com/IBM/unitxt/pull/863
* Fix code blocks in loaders docs by elronbandel in https://github.com/IBM/unitxt/pull/866
* Typo fix in unitext operators docs by duckling69 in https://github.com/IBM/unitxt/pull/877
* Add documentation to loaders by elronbandel in https://github.com/IBM/unitxt/pull/864
* Changes to introduction page by yoavkatz in https://github.com/IBM/unitxt/pull/852

New Contributors
* sarathsgvr made their first contribution in https://github.com/IBM/unitxt/pull/808
* bnayahu made their first contribution in https://github.com/IBM/unitxt/pull/854
* shaigrt made their first contribution in https://github.com/IBM/unitxt/pull/865
* duckling69 made their first contribution in https://github.com/IBM/unitxt/pull/877

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.9.0...1.10.0

1.9.0

What's Changed
The most important things are:

- Addition of LLM as a Judge Metrics and Tasks for both evaluating LLMs as judge and using them for evaluation of other tasks. Read more in the [LLM as a Judge Tutorial](https://unitxt.readthedocs.io/en/latest/docs/llm_as_judge.html)
- Addition of RAG response generation tasks and datasets as part of an effort to add comprehensive RAG evaluation to unitxt.
- Renaming FormTask to Task for simplicity
- Major improvements to documentation and tutorials

Breaking Changes 🚨
* Ensure consistent evaluation of CI across implementations [Might change previous results] by dafnapension in https://github.com/IBM/unitxt/pull/844
* Fix default format so it will be the same as formats.empty in catalog. Impacts runs that did not specify a format by yoavkatz in https://github.com/IBM/unitxt/pull/848
* LoadJson operator moved from unitxt.processors to unitxt.struct_data_operators
* Fixed YesNoTemplate and Diverse LabelSampler to support binary task typing. YesNoTemplate now expects the class field to contain a string and not a list of strings with one element by yoavkatz in https://github.com/IBM/unitxt/pull/836

Bug Fixes
* Change processor type for to_list_by_comma_from_references by antonpibm in https://github.com/IBM/unitxt/pull/815
* Handle empty text in Literal Eval by antonpibm in https://github.com/IBM/unitxt/pull/819
* Fix clash between dir names and artifact names in catalog website by elronbandel in https://github.com/IBM/unitxt/pull/825
* Ner typing had a mistake. by yoavkatz in https://github.com/IBM/unitxt/pull/832
* Fix catalog reference by elronbandel in https://github.com/IBM/unitxt/pull/838
* Fix default format by yoavkatz in https://github.com/IBM/unitxt/pull/848
* Fixed YesNoTemplate and Diverse LabelSampler, to support binary task typing. by yoavkatz in https://github.com/IBM/unitxt/pull/836

New Features
* Support prediction regex match by setting the operator as a postproce… by antonpibm in https://github.com/IBM/unitxt/pull/792
* Add sample score output in test card by yoavkatz in https://github.com/IBM/unitxt/pull/803
* Support for loading dictionaries by pawelknes in https://github.com/IBM/unitxt/pull/784
* Add ability to fuse, split, MultiStreamScoreMean, and merge all by dafnapension in https://github.com/IBM/unitxt/pull/767
* Changed default log verbosity to "info" instead of "debug" by yoavkatz in https://github.com/IBM/unitxt/pull/822
* Skip artifact prepare and verify in catalog consistency tests by elronbandel in https://github.com/IBM/unitxt/pull/839
* Add separation between eagered streams and regular streams by elronbandel in https://github.com/IBM/unitxt/pull/846
* Add precision and recall scores to f1_binary, max_f1_binary by lilacheden in https://github.com/IBM/unitxt/pull/824
* Rename task by elronbandel in https://github.com/IBM/unitxt/pull/850

New Assets
* Add basic format for llama3 models by arielge in https://github.com/IBM/unitxt/pull/812
* Adding literal eval processor by antonpibm in https://github.com/IBM/unitxt/pull/813
* Add RAG (response generation part) tasks and datasets by perlitz in https://github.com/IBM/unitxt/pull/811
* Add 5 legalbench tasks (the 5 existing in HELM) by perlitz in https://github.com/IBM/unitxt/pull/827
* Add financebench by perlitz in https://github.com/IBM/unitxt/pull/828
* Add billsum dataset by perlitz in https://github.com/IBM/unitxt/pull/830
* Add tldr dataset by perlitz in https://github.com/IBM/unitxt/pull/831
* Add Attaq500 by naamaz in https://github.com/IBM/unitxt/pull/835
* Add llm as judge mt-bench dataset and metrics by OfirArviv in https://github.com/IBM/unitxt/pull/791

Documentation
* Documentation review by yoavkatz in https://github.com/IBM/unitxt/pull/805
* Added documentation for global and huggingface metrics by yoavkatz in https://github.com/IBM/unitxt/pull/807
* Touch up docs by elronbandel in https://github.com/IBM/unitxt/pull/809
* Remove the contents from main menu by elronbandel in https://github.com/IBM/unitxt/pull/810
* Add tags docs by elronbandel in https://github.com/IBM/unitxt/pull/814
* Reviewing Unitxt tutorials by michal-jacovi in https://github.com/IBM/unitxt/pull/817
* Fix the link to the operators tutorial by elronbandel in https://github.com/IBM/unitxt/pull/821
* More documentation changes in metrics by yoavkatz in https://github.com/IBM/unitxt/pull/820
* Update adding_task.rst by michal-jacovi in https://github.com/IBM/unitxt/pull/823
* Fix missing mandatory new line at the beginning of code block in documentation by elronbandel in https://github.com/IBM/unitxt/pull/829
* Add description, homepage, and citation obtained from HF with datasets.load_dataset_builder by dafnapension in https://github.com/IBM/unitxt/pull/818
* Updated documentation by yoavkatz in https://github.com/IBM/unitxt/pull/849

New Contributors
* antonpibm made their first contribution in https://github.com/IBM/unitxt/pull/792
* michal-jacovi made their first contribution in https://github.com/IBM/unitxt/pull/817

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.8.1...1.9.0

1.8.1

What's Changed
* Fix missing experiment_id for multiprocessing evaluation by alonh in https://github.com/IBM/unitxt/pull/798
* Add cache to metric prediction_type to speedup by yoavkatz in https://github.com/IBM/unitxt/pull/801

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.8.0...1.8.1

1.8.0

What's Changed

In this release, the main improvement focuses on introducing type checking within Unitxt tasks. Tasks are fundamental to the Unitxt protocol, acting as standardized blueprints for those integrating new datasets into Unitxt. They facilitate the use of task-specific templates and metrics. To guarantee precise dataset processing in line with the task schema, we've introduced explicit types to the task fields.

For example, consider the NER task in Unitxt, previously defined as follows:
```python
add_to_catalog(
    FormTask(
        inputs=["text", "entity_types"],
        outputs=["spans_starts", "spans_ends", "text", "labels"],
        metrics=["metrics.ner"],
    ),
    "tasks.ner",
)
```

Now, the NER task definition includes explicit types:
```python
add_to_catalog(
    FormTask(
        inputs={"text": "str", "entity_types": "List[str]"},
        outputs={
            "spans_starts": "List[int]",
            "spans_ends": "List[int]",
            "text": "List[str]",
            "labels": "List[str]",
        },
        prediction_type="List[Tuple[str,str]]",
        metrics=["metrics.ner"],
    ),
    "tasks.ner",
)
```


This enhancement aligns with Unitxt's goal that definitions should be easily understandable and capable of facilitating validation processes with appropriate error messages to guide developers in identifying and solving issues.

For now, the original definition format without types will continue to work but will generate a warning message. You should begin adapting your task definitions by adding types.


'inputs' field of Task should be a dictionary of field names and their types. For example, {'text': 'str', 'classes': 'List[str]'}. Instead only '['question', 'question_id', 'topic']' was passed. All types will be assumed to be 'Any'. In future version of unitxt this will raise an exception.
'outputs' field of Task should be a dictionary of field names and their types. For example, {'text': 'str', 'classes': 'List[str]'}. Instead only '['reference_answers', 'reference_contexts', 'reference_context_ids', 'is_answerable_label']' was passed. All types will be assumed to be 'Any'. In future version of unitxt this will raise an exception.
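
The fallback described in these warnings (untyped field lists are treated as fields of type 'Any') and the idea of checking values against type strings can be sketched in plain Python. This is an illustration only and not how unitxt implements type checking: `normalize_fields` and `matches_type` are hypothetical helpers, and the sketch handles only the type strings 'Any', 'str', 'int', and 'List[...]'.

```python
from typing import Any, Dict, List, Union


def normalize_fields(fields: Union[List[str], Dict[str, str]]) -> Dict[str, str]:
    """Turn a legacy list of field names into a name -> type mapping.

    Untyped fields are assumed to be 'Any', mirroring the warning text.
    """
    if isinstance(fields, dict):
        return dict(fields)
    return {name: "Any" for name in fields}


def matches_type(value: Any, type_string: str) -> bool:
    """Check a value against a small subset of type strings."""
    if type_string == "Any":
        return True
    if type_string == "str":
        return isinstance(value, str)
    if type_string == "int":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_string.startswith("List[") and type_string.endswith("]"):
        inner = type_string[len("List["):-1]
        return isinstance(value, list) and all(matches_type(v, inner) for v in value)
    raise ValueError(f"Unsupported type string in this sketch: {type_string}")


legacy_inputs = normalize_fields(["text", "entity_types"])
typed_inputs = normalize_fields({"text": "str", "entity_types": "List[str]"})
instance = {"text": "IBM is in Armonk", "entity_types": ["Organization", "Location"]}
is_valid = all(matches_type(instance[name], t) for name, t in typed_inputs.items())
```

With the typed definition, a malformed instance (for example `entity_types` given as a single string) can be rejected early with a clear message, whereas the legacy list form accepts any value.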


Special thanks to pawelknes who implemented this important feature. It truly demonstrates the collective power of the Unitxt community and the invaluable contributions made by Unitxt users beyond the core development team. Such contributions are highly appreciated and encouraged.

* For more detailed information, please refer to https://github.com/IBM/unitxt/pull/710

Breaking Changes

"metrics.spearman", "metrics.kendalltau_b", "metrics.roc_auc": prediction type is float.
"metrics.f1_binary","metrics.accuracy_binary", "metrics.precision_binary", "metrics.recall_binary", "metrics.max_f1_binary", "metrics.max_accuracy_binary": prediction type is Union[float, int], references must be equal to 0 or 1

Bug Fixes
* Set empty list if preprocess_steps is None by marukaz in https://github.com/IBM/unitxt/pull/780
* Fix UI load failure due to typo by yoavkatz in https://github.com/IBM/unitxt/pull/785
* Fix huggingface uploads by elronbandel in https://github.com/IBM/unitxt/pull/793
* Fix typo in error message by marukaz in https://github.com/IBM/unitxt/pull/777

New Assets
* add perplexity with Mistral model by lilacheden in https://github.com/IBM/unitxt/pull/713

New Features
* Type checking for task definition by pawelknes in https://github.com/IBM/unitxt/pull/710
* Add open and ibm_genai to llm as judge inference engine by OfirArviv in https://github.com/IBM/unitxt/pull/782
* Add negative class score for binary precision, recall, f1 and max f1 by lilacheden in https://github.com/IBM/unitxt/pull/788
    1. Add negative class score for binary precision, recall, f1 and max f1, e.g. f1_binary now returns also "f1_binary_neg".
    2. Support Unions in metric prediction_type
    3. Add processor cast_to_float_return_nan_if_failed
    4. Breaking change: Make prediction_type of metrics numeric:
        A. "metrics.kendalltau_b", "metrics.roc_auc": prediction type is float.
        B. "metrics.f1_binary","metrics.accuracy_binary", "metrics.precision_binary", "metrics.recall_binary", "metrics.max_f1_binary", "metrics.max_accuracy_binary": prediction type is Union[float, int], references must be equal to 0 or 1
* Group shuffle by sam-data-guy-iam in https://github.com/IBM/unitxt/pull/639
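
As a rough illustration of the negative class score mentioned above, the sketch below computes an F1 score for the positive class and for the negative class from numeric predictions and 0/1 references. It is not the unitxt metric: the helper `binary_f1` is hypothetical, and the 0.5 decision threshold is an assumption made for this example.

```python
from typing import Sequence, Union

Number = Union[float, int]


def binary_f1(
    predictions: Sequence[Number],
    references: Sequence[int],
    target_class: int = 1,
    threshold: float = 0.5,  # assumed threshold, not taken from unitxt
) -> float:
    """F1 of `target_class` after thresholding numeric predictions to 0/1."""
    labels = [1 if p > threshold else 0 for p in predictions]
    tp = sum(1 for l, r in zip(labels, references) if l == target_class and r == target_class)
    fp = sum(1 for l, r in zip(labels, references) if l == target_class and r != target_class)
    fn = sum(1 for l, r in zip(labels, references) if l != target_class and r == target_class)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


predictions = [0.9, 0.8, 0.2, 0.6]
references = [1, 1, 0, 0]
scores = {
    "f1_binary": binary_f1(predictions, references, target_class=1),
    "f1_binary_neg": binary_f1(predictions, references, target_class=0),
}
```

Reporting both scores is useful on imbalanced data, where a classifier can score well on the positive class while performing poorly on the negative one.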

Documentation
* Fix a small typo by dafnapension in https://github.com/IBM/unitxt/pull/779
* Update instructions to install HELM from PyPI by yifanmai in https://github.com/IBM/unitxt/pull/783
* Update few-shot instructions in Unitxt with HELM by yifanmai in https://github.com/IBM/unitxt/pull/774


**Full Changelog**: https://github.com/IBM/unitxt/compare/1.7.7...1.8.0

**Full Changelog**: https://github.com/IBM/unitxt/compare/1.8.1...1.8.0
