ai-edge-torch

Latest version: v0.2.1

0.2.0

* Python versions: 3.9, 3.10, 3.11
* Operating system: Linux
* PyTorch: 2.4.0
* TensorFlow: tf-nightly>=2.18.0.dev20240722

See [this section](https://github.com/google-ai-edge/ai-edge-torch/tree/v0.2.0?tab=readme-ov-file#installation) of the README for installation instructions.

PyTorch Converter

Compatible with the torch 2.4.0 stable release. `pip install ai-edge-torch` (or `pip install ai-edge-torch-nightly`) is now the only command needed to install ai-edge-torch and all of its dependencies.

Features
* Added `ai_edge_torch.to_channel_last_io` API ([doc](https://github.com/google-ai-edge/ai-edge-torch/blob/v0.2.0/docs/pytorch_converter/README.md#convert-model-with-nhwc-channel-last-inputsoutputs)); see the sketch after this list
* Added `ai_edge_torch.debug._search_model` API
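A minimal usage sketch of the new API, following the linked doc. The torchvision model, input shape, and file name are illustrative; `args=[0]` marks positional input 0 as channel-last, per the documented usage:

```python
import torch
import torchvision
import ai_edge_torch

# A torchvision classifier natively expects NCHW (channel-first) input.
model = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1
).eval()

# Wrap the model so positional input 0 is accepted as NHWC (channel-last);
# the wrapper permutes it back to NCHW before the original forward() runs.
nhwc_model = ai_edge_torch.to_channel_last_io(model, args=[0])

# Convert with an NHWC sample input and export a TFLite flatbuffer.
sample_input = (torch.randn(1, 224, 224, 3),)
edge_model = ai_edge_torch.convert(nhwc_model, sample_input)
edge_model.export("resnet18_nhwc.tflite")
```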

Performance Improvements
* Improved layout optimization algorithm and general model performance
* Improved performance for `torch.nn.functional.interpolate` with nearest mode
* Improved performance for `aten.gelu`
* Improved performance for `aten.avg_pool2d` with `ceil_mode=True`
* Reduced conversion memory usage in torch_xla and MLIR converter

Bug Fix
* Fixed numerical/precision issue with `aten.native_group_norm` (`nn.GroupNorm`)

Generative API

Authoring API
* Implemented [new layer components](https://github.com/google-ai-edge/ai-edge-torch/tree/v0.2.0/ai_edge_torch/generative/layers/unet) for diffusion-based models

Support for new models
* [Stable Diffusion 1.5](https://github.com/google-ai-edge/ai-edge-torch/tree/v0.2.0/ai_edge_torch/generative/examples/stable_diffusion) is now supported on CPU

Quantization
* Enabled [selective quantization](https://github.com/google-ai-edge/ai-edge-torch/blob/v0.2.0/ai_edge_torch/generative/quantize/README.md#advanced-usage) of different Generative layers for LLMs (a recipe sketch follows this list)
* Enabled [weight-only quantization](https://github.com/google-ai-edge/ai-edge-torch/blob/v0.2.0/ai_edge_torch/generative/quantize/quant_recipes.py#L43) with computation in floating point for increased accuracy
* Added quantization support for [embedding tables](https://github.com/google-ai-edge/ai-edge-torch/blob/v0.2.0/ai_edge_torch/generative/quantize/quant_recipe.py#L103)
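A sketch of applying a quantization recipe at conversion time, based on the linked quantization README (which shows `full_int8_dynamic_recipe`). The tiny module below is a stand-in for illustration only; in practice you would pass one of the re-authored models from `ai_edge_torch/generative/examples`:

```python
import torch
import ai_edge_torch
from ai_edge_torch.generative.quantize import quant_recipes


class TinyModel(torch.nn.Module):
    """Stand-in for a re-authored generative model (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.linear(x)


model = TinyModel().eval()
sample_inputs = (torch.randn(1, 16),)

# Pick a pre-built recipe; the linked README also shows composing custom
# recipes that quantize individual Generative layers selectively.
quant_config = quant_recipes.full_int8_dynamic_recipe()

edge_model = ai_edge_torch.convert(model, sample_inputs, quant_config=quant_config)
edge_model.export("tiny_int8.tflite")
```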

Documentation
* Added [system architecture overview](https://github.com/google-ai-edge/ai-edge-torch/blob/v0.2.0/ai_edge_torch/generative/doc/system_overview.md) for Torch generative API

0.1.1

* Python versions: 3.9, 3.10, 3.11
* Operating system: Linux
* PyTorch: 2.4.0.dev20240429
* TensorFlow: 2.17.0.dev20240509

See [this section](https://github.com/google-ai-edge/ai-edge-torch/tree/v0.1.1?tab=readme-ov-file#installation) of the README for installation instructions.


PyTorch Converter (Beta)

Functionality
First release of a direct path from PyTorch to the TFLite runtime ([blog post](https://developers.googleblog.com/en/ai-edge-torch-high-performance-inference-of-pytorch-models-on-mobile-devices/)).
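The basic flow, adapted from the README (the model choice and file name here are illustrative): `ai_edge_torch.convert` takes a `torch.nn.Module` in eval mode plus sample inputs, returns a model backed by the TFLite runtime, and can serialize it to a `.tflite` flatbuffer:

```python
import torch
import torchvision
import ai_edge_torch

# Any torch.nn.Module in eval mode will do; resnet18 is just an example.
resnet18 = torchvision.models.resnet18(
    weights=torchvision.models.ResNet18_Weights.IMAGENET1K_V1
).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Convert to a model that executes on the TFLite runtime.
edge_model = ai_edge_torch.convert(resnet18, sample_inputs)

# The converted model is callable in Python for quick validation...
output = edge_model(*sample_inputs)

# ...and can be exported as a .tflite flatbuffer for on-device deployment.
edge_model.export("resnet18.tflite")
```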

Coverage
* Verified successful conversion of PyTorch to TFLite on a Beta test set of 72 PyTorch models readily available from [torchvision](https://pytorch.org/vision/0.9/models.html), [torchaudio](https://pytorch.org/audio/stable/models.html), [timm](https://github.com/huggingface/pytorch-image-models?tab=readme-ov-file#models), [HuggingFace transformers](https://github.com/huggingface/transformers/), and open-source GitHub repositories (such as [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX/tree/main), [U2Net](https://github.com/xuebinqin/U-2-Net/tree/master), and [IS-Net](https://github.com/xuebinqin/DIS)), spanning computer vision, text, audio, and speech applications.

Performance
* Excellent CPU performance for the converted models, leveraging the TFLite XNNPACK delegate.
* A subset of the Beta test set can be fully delegated to GPU; others are partially delegated or unsupported.
* QNN delegate ([available here](https://softwarecenter.qualcomm.com/api/download/software/qualcomm_neural_processing_sdk/v2.22.0.240425.zip)) supports most models in the Beta test set with significant average acceleration relative to CPU (20X) and GPU (5X) using Qualcomm’s DSP and neural processing units.

Quantization
* Support for dynamic quantization with [PT2E](https://pytorch.org/tutorials/prototype/quantization_in_pytorch_2_0_export_tutorial.html).
* Support for [post-training quantization](https://www.tensorflow.org/lite/performance/post_training_quantization) via the TFLite converter.
* AI Edge Torch Converter APIs for both quantization frameworks are documented [here](https://github.com/google-ai-edge/ai-edge-torch/blob/v0.1.1/docs/pytorch_converter/README.md#quantization); a PT2E sketch follows this list.
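For the PT2E path, the linked README follows the standard torch.ao PT2E flow: capture the graph, prepare it with a quantizer, calibrate, convert, and then hand the result to `ai_edge_torch.convert` together with a `QuantConfig`. A sketch adapted from that doc, with an illustrative stand-in model:

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

import ai_edge_torch
from ai_edge_torch.quantize.pt2e_quantizer import (
    PT2EQuantizer,
    get_symmetric_quantization_config,
)
from ai_edge_torch.quantize.quant_config import QuantConfig

# Illustrative stand-in model and sample inputs.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
sample_args = (torch.randn(1, 8),)

# Configure a dynamic, per-channel, symmetric int8 quantizer.
quantizer = PT2EQuantizer().set_global(
    get_symmetric_quantization_config(is_per_channel=True, is_dynamic=True)
)

# Standard PT2E flow: capture, prepare, calibrate, convert.
captured = capture_pre_autograd_graph(model, sample_args)
prepared = prepare_pt2e(captured, quantizer)
prepared(*sample_args)  # calibration pass with representative inputs
quantized = convert_pt2e(prepared, fold_quantize=False)

# Convert to TFLite, passing the quantizer via QuantConfig.
edge_model = ai_edge_torch.convert(
    quantized, sample_args, quant_config=QuantConfig(pt2e_quantizer=quantizer)
)
edge_model.export("linear_drq.tflite")
```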

Known Issues
* Inference latency with quantized models is higher than with unquantized models in some cases.

Generative API (Alpha)

Functionality
* Provides PyTorch-native [building blocks](https://github.com/google-ai-edge/ai-edge-torch/tree/v0.1.1/ai_edge_torch/generative/layers) for composing LLMs with mobile-friendly abstractions, for performant execution on the TFLite runtime.
* [Examples](https://github.com/google-ai-edge/ai-edge-torch/tree/v0.1.1/ai_edge_torch/generative/examples) of authoring Gemma, TinyLlama, and Phi-2 via the Edge Generative API for conversion to TFLite.
* Supports 8-bit dynamic-range quantization ([details](https://github.com/google-ai-edge/ai-edge-torch/tree/v0.1.1/ai_edge_torch/generative/quantize)).
* Integrates with the [MediaPipe LLM Inference API](https://github.com/google-ai-edge/ai-edge-torch/tree/v0.1.1/ai_edge_torch/generative#use-mediapipe-llm-inference-api) for easy use in mobile apps, including a prompt interface.

Known Issues
* The conversion and serialization process is unoptimized for LLMs: it requires keeping multiple copies of the weights in memory for transformations and for serialization/deserialization.
* Runtime execution of LLMs in TFLite is missing some memory optimizations and is inefficient during memory unpacking on XNNPACK.
