onnxruntime-genai

Latest version: v0.7.0


0.6.0

Release Notes

We are excited to announce the release of `onnxruntime-genai` version 0.6.0. Below are the key updates included in this release:

1. Support for contextual, or continuous, decoding, which allows users to carry out multi-turn, conversation-style generation (a minimal sketch follows these notes).
2. Support for new models such as DeepSeek R1, AMD OLMo, IBM Granite, and others.
3. Python 3.13 wheels have been introduced.
4. Support for generation with models sourced from [Qualcomm's AI Hub](https://aihub.qualcomm.com/mobile/models). This work also includes publishing a NuGet package, `Microsoft.ML.OnnxRuntimeGenAI.QNN`, for the QNN EP.
5. Support for the WebGPU EP.

This release also includes performance improvements to optimize memory usage and speed. In addition, there are several bug fixes that resolve issues reported by users.
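
Continuous decoding keeps a single generator (and its KV cache) alive across turns, so only the new user turn is tokenized and appended rather than re-processing the whole conversation. The snippet below is a minimal sketch of that pattern with the Python bindings; the model path, chat template, and search options are placeholders to adjust for your own model.

```python
# Minimal multi-turn (continuous decoding) sketch for the 0.6.0 Python API.
# The model folder and the prompt template below are placeholders.
import onnxruntime_genai as og

model = og.Model("path/to/your/onnx/model")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

params = og.GeneratorParams(model)
params.set_search_options(max_length=2048)
generator = og.Generator(model, params)  # one generator reused across turns

for user_text in ["Hello!", "And a follow-up question."]:
    # Only the new turn is appended; the KV cache keeps the earlier history.
    prompt = f"<|user|>\n{user_text}<|end|>\n<|assistant|>\n"  # Phi-style template
    generator.append_tokens(tokenizer.encode(prompt))
    while not generator.is_done():
        generator.generate_next_token()
        print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
    print()
```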

0.5.2

Release Notes

Patch release 0.5.2 adds:

* Fixes for bugs 1074 and 1092 via PRs 1065 and 1070
* A fix for the NuGet sample in the package README to show correct disposal of objects
* Extra validation added via PRs 1050 and 1066

0.5.1

Release Notes

In addition to the features in the 0.5.0 release, this release adds:

* The ability to choose the execution provider and modify its options at runtime (see the sketch below)
* A fix for a data-leakage bug with KV caches
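
With this change the execution provider can be chosen when the model is loaded instead of being fixed in the model's `genai_config.json`. Below is a minimal sketch of that flow from Python, assuming the `og.Config` interface; the provider name, the commented-out option key, and the model path are illustrative assumptions rather than guaranteed spellings.

```python
# Sketch: choose the execution provider (and optionally its options) at runtime.
# og.Config usage is assumed here; provider and option names are illustrative.
import onnxruntime_genai as og

config = og.Config("path/to/your/onnx/model")  # placeholder model folder
config.clear_providers()                       # drop the providers listed in the config file
config.append_provider("cuda")                 # e.g. run on the CUDA EP instead
# config.set_provider_option("cuda", "device_id", "0")  # assumed option-setting call

model = og.Model(config)                       # build the model from the modified config
tokenizer = og.Tokenizer(model)
```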

0.5.0

Release Notes

* Support for MultiLoRA (a minimal adapter-switching sketch follows this section)
* Support for multi-frame for Phi-3 vision and Phi-3.5 vision models
* Support for the Phi-3 MoE model
* Support for NVIDIA Nemotron model
* Support for the Qwen model
* Addition of the Set Terminate feature, which allows users to cancel mid-generation
* Soft capping support for Group Query Attention
* Quantization support extended to the embedding and LM head layers
* Mac support in published packages

Known issues
* Models running with DirectML do not support batching
* Python 3.13 is not supported in this release
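
With MultiLoRA, one base model can load several LoRA adapters and switch between them per request. The sketch below assumes the `og.Adapters` / `set_active_adapter` interface with placeholder paths and adapter names, and it uses the newer `append_tokens`-style generation loop rather than the exact 0.5.0 sample code.

```python
# Sketch: switching between LoRA adapters on one base model (MultiLoRA).
# Paths, adapter names, and the prompt are placeholders.
import onnxruntime_genai as og

model = og.Model("path/to/base/onnx/model")
adapters = og.Adapters(model)
adapters.load("adapters/math.onnx_adapter", "math")  # placeholder adapter files
adapters.load("adapters/chat.onnx_adapter", "chat")

tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.set_active_adapter(adapters, "math")        # pick an adapter per request
generator.append_tokens(tokenizer.encode("What is 2 + 2?"))
while not generator.is_done():
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))
```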

0.4.0

Release Notes
* Support for new models such as Qwen 2, Llama 3.1, Gemma 2, and Phi-3 Small on CPU
* Support for building models that have already been quantized with AWQ or GPTQ
* Performance improvements for Intel and Arm CPUs
* Packaging and language bindings
  * Added Java bindings (build from source)
  * Separated OnnxRuntime.dll and directml.dll out of the GenAI package to improve usability
  * Published packages for Windows Arm
  * Support for Android (build from source)

0.3.0

Release Notes
* Phi-3 Vision model support for DML EP.
* Addressed DML memory leak issue and crashes on long prompts.
* Addressed crashes and slowness on CPU EP GQA on long prompts due to integer overflow issues.
* Added the import library for the Windows C API package.
* Addressed a bug with `get_output('logits')` so that it returns the logits for the entire prompt and not for the last generated token.
* Addressed a bug with querying the device type of the model so that it no longer crashes.
* Added NetStandard 2.0 compatibility.
