We're excited to announce the Beta release of ExecuTorch! This release includes many new features, improvements, and bug fixes.
## API Stability and Runtime Compatibility Guarantees
Starting with this release, ExecuTorch's Python and C++ APIs will follow the [API Lifecycle and Deprecation Policy](https://pytorch.org/executorch/0.4/api-life-cycle.html), and the `.pte` file format will comply with the [Runtime Compatibility Policy](https://github.com/pytorch/executorch/blob/release/0.4/runtime/COMPATIBILITY.md).
## New Features
- Introduced the `exir.to_edge_transform_and_lower` API, which combines the functionality of `to_edge`, `transform`, and `to_backend` in a single call (see the sketch after this list)
  - Allows users to prevent specific op decompositions when lowering to backends that implement those ops
- Increased operator coverage for ExecuTorch’s portable library
- Added new experimental APIs:
  - LLM runner C++ APIs such as `prefill_image()`, `prefill_prompt()`, and `generate_from_pos()`, with multimodal support
  - `executorch.runtime` Python module for loading `.pte` files and running them with the underlying C++ runtime (see the runtime sketch after this list)
- Added a new [Tensor API](https://pytorch.org/executorch/0.4/extension-tensor.html) to bundle a tensor's dynamic data and metadata within a single Tensor object
- Improved the [Module API](https://pytorch.org/executorch/0.4/extension-module.html) to allow sharing an ExecuTorch Program across several Modules, and added APIs to set inputs/outputs before execution
- Added `find_package(executorch)` for projects to easily link to ExecuTorch’s prebuilt library in CMake
- Introduced reproducible [benchmarking infrastructure](https://github.com/pytorch/executorch/blob/release/0.4/extension/benchmark/README.md?plain=1) to measure, debug, and track performance, enabling on-demand and automated nightly benchmarking of models and backend delegates on modern smartphones
- Added new benchmarking apps to measure model performance on [iOS/macOS](https://github.com/pytorch/executorch/blob/release/0.4/extension/apple/Benchmark/README.md) and [Android](https://github.com/pytorch/executorch/blob/release/0.4/extension/android/benchmark/README.md)
- Added support for TikToken v5 vision tokenizer
- Improved parallelization for LLM prefill
- Added experimental capabilities for on-device training, along with an [example prototype](https://github.com/pytorch/executorch/pull/5233/) for LLM finetuning
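As a quick illustration of the new export path, here is a minimal sketch of `exir.to_edge_transform_and_lower`. The toy model, example inputs, and the choice of the XNNPACK partitioner are illustrative assumptions, not prescribed by this release:

```python
import torch

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower


# A toy model standing in for a real network (illustrative assumption).
class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# Export, then transform and lower to a backend in one call. Passing a
# partitioner lets the API avoid decomposing ops that the backend
# implements natively.
exported = torch.export.export(model, example_inputs)
et_program = to_edge_transform_and_lower(
    exported,
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```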
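And a sketch of the experimental `executorch.runtime` module running that `.pte` from Python; the file name and input shape carry over from the sketch above, and the API may change since it is experimental:

```python
import torch

from executorch.runtime import Runtime

# Load the .pte produced above and execute it with the underlying C++
# runtime through the experimental Python bindings.
runtime = Runtime.get()
program = runtime.load_program("tiny_model.pte")
method = program.load_method("forward")
outputs = method.execute([torch.randn(1, 16)])
print(outputs)
```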
## Supported Models
- Added support for the following models:
  - Llama 3 models, including Llama 3 8B, 3.1 8B, and 3.2 1B/3B
  - [Multimodal] LLaVA (Large Language and Vision Assistant)
  - Phi-3-mini
  - Gemma 2B
- Added Llama 3, 3.1, and 3.2 to the [Android Llama Demo app](https://github.com/pytorch/executorch/blob/release/0.4/examples/demo-apps/android/LlamaDemo/README.md?plain=1)
- Added LLaVA multimodal support to the iOS iLLaMA and Android Llama Demo apps
## Hardware Acceleration
- Delegate framework
  - Allowed delegates to [consume buffer mutations](https://github.com/pytorch/executorch/pull/4830)
- **[New]** MediaTek
  - Added support for a new MediaTek backend
  - Enabled Llama 3 acceleration on MediaTek's NPU
  - Added export scripts and runners for OSS models
- Core ML
  - Added Llama support with in-place KV cache, a fused SDPA kernel, and 4-bit per-block quantization
  - Added primitive support for dynamic shapes that works without `torch._check`
  - Expanded operator coverage to over 100 ops
  - Enabled stateful runtime execution
- MPS
  - Added support for 4-bit linear kernels (iOS 18 only)
  - Enabled Llama 2 7B and Llama 3 8B
- Qualcomm (Qualcomm Neural Network, QNN)
  - Enabled Llama 3 8B with a 4-bit linear kernel, SpinQuant, fused RMSNorm from QNN 2.25, and model sharding
  - Added support for the AI Hub model format
- ARM
  - Added new operators:
    - `addmm`, `avg_pool2d`, `batch_norm`, `bmm`, `clone`/`cat`, `div`, `exp`, `full`, `hardtanh`, `log`, `mean_dim`, `mul`, `permute`, `relu`, `sigmoid`, `slice`, `softmax`, `sub`, `unsqueeze`, `view`, plus `conv2d` improvements
  - Added/enabled lowering passes to improve network compatibility
  - Improved quantization support
    - Made quantization accuracy improvements for all models
    - Added quantization coverage for all available ops
  - Improved channels-last support by reducing overhead and the number of conversions
  - Added performance measurements on the Corstone-300 FVP for Ethos-U55
  - Moved to a new compilation flow in Vela for better performance and compatibility
  - Improved code documentation for third-party contributors
- XNNPACK
  - Enhanced XNNPACK backend performance
  - Added support for new Llama models and other quantized LLMs on Android/iOS devices, including Llama 3 8B, 3.1 8B, and 3.2 1B/3B
  - Introduced a major partitioner refactor to improve UX and stability
  - Improved model [coverage](https://github.com/pytorch/executorch/blob/release/0.4/examples/xnnpack/__init__.py#L16) to ensure better stability
- Vulkan
  - Optimized latency of the Vulkan convolution and matrix multiplication compute shaders through various algorithmic improvements
  - Added a quantizer for 8-bit weight-only quantization
  - Expanded operator coverage to 63 ops
  - Added 4-bit and 8-bit weight-quantized linear kernels
  - Added support for view tensors in the Vulkan graph runtime, allowing no-copy permutes, squeezes/unsqueezes, etc.
  - Added support for symbolic integers in the Vulkan graph runtime
  - Integrated with the ExecuTorch SDK to track compute shader latencies
- Cadence
  - Added an x86 executor to sanity-check and numerically verify models locally
  - Added multiple supported end-to-end models, such as wav2vec2
  - Integrated low-level optimizations, resulting in 10x+ performance improvements
  - Migrated more graph-level optimizations to the open source repository
  - Enabled more types in the CadenceQuantizer and moved to an int8 default for better performance
## Developer Experience
- Introduced an API to enable intermediate output logging in delegates (see the sketch after this list)
- Improved CMake build system and reduced reliance on Buck2
- Added override options for fallback PAL implementations via the CMake flag `-DEXECUTORCH_PAL_DEFAULT`
- Changes to DimOrder (please see [this issue](https://github.com/pytorch/executorch/issues/6330) for current progress and next steps)
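As a rough illustration of consuming the intermediate outputs logged by a delegate, here is a sketch using the devtools `Inspector`. The ETDump/ETRecord file names are assumptions, and producing them (a profiled run plus an export-time ETRecord) is not shown here:

```python
from executorch.devtools import Inspector

# Assumes an ETDump collected from a run with intermediate output
# logging enabled in the delegate, plus an ETRecord generated at
# export time (both paths are illustrative).
inspector = Inspector(
    etdump_path="etdump.etdp",
    etrecord="etrecord.bin",
)

# Print per-event data in tabular form, including logged
# intermediate outputs from delegates.
inspector.print_data_tabular()
```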
## Bug Fixes
- Fixed various issues related to quantization, tensor operations, and backend integrations
- Resolved memory allocation and management issues
- Fixed compatibility issues with different Python and dependency versions
- Fixed [bundled program and plan_execute in pybindings](https://github.com/pytorch/executorch/pull/4595)
## Breaking Changes
- Updated the minimum C++ version to C++17 for the core runtime
- Removed all C++ headers under `//executorch/util` (see `extension/runner_util/inputs.h` for a `PrepareInputTensors` replacement)
  - Users are now expected to provide their own replacement for the `read_file.h` functionality
- Renamed instances of `sdk` to `devtools` for file names, function names, and CMake options
## Deprecation
- Added new annotations and decorators for API lifecycle and deprecation management
  - The new `ET_EXPERIMENTAL` annotation marks C++ APIs that may change without notice
  - The new `deprecated` and `experimental` Python decorators mark non-stable APIs (see the sketch after this list)
- Names under the `torch::` namespace are deprecated in favor of names under the `executorch::` namespace. Please migrate code to the new namespace and avoid adding new references to the `torch::` namespace
- Constant buffers are no longer stored inside the `.pte` flatbuffer; going forward, they are stored in a separate segment attached to the `.pte` file
- All C++ macros beginning with underscores, such as `__ET_UNUSED`, are deprecated in favor of unprefixed names such as `ET_UNUSED`
- `capture_pre_autograd_graph()` is deprecated in favor of the new `torch.export.export_for_training()` API (see the migration sketch after this list)
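A minimal sketch of the new Python decorators. The import path points at an internal module and should be treated as an assumption; the warning messages are illustrative:

```python
# The module path below is an assumption based on the source tree;
# these decorators are internal-facing and may move.
from executorch.exir._warnings import deprecated, experimental


@deprecated("Use new_api() instead.")
def old_api() -> None:
    """Calling this emits a deprecation warning."""


@experimental("This API may change without notice.")
def unstable_api() -> None:
    """Calling this emits an experimental-API warning."""
```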
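And a before/after sketch of the export migration; the toy model and example inputs are assumptions:

```python
import torch

model = torch.nn.Linear(8, 2).eval()
example_inputs = (torch.randn(1, 8),)

# Before (deprecated):
# gm = torch._export.capture_pre_autograd_graph(model, example_inputs)

# After: export_for_training returns an ExportedProgram whose
# .module() fills the same role, e.g. in the PT2E quantization flow.
ep = torch.export.export_for_training(model, example_inputs)
gm = ep.module()
```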
Thanks to the following open source contributors for their work on this release!
[denisVieriu97](https://github.com/denisVieriu97), [Erik-Lundell](https://github.com/Erik-Lundell), [Esteb37](https://github.com/Esteb37), [SaoirseARM](https://github.com/SaoirseARM), [benkli01](https://github.com/benkli01), [bigfootjon](https://github.com/bigfootjon), [chuntl](https://github.com/chuntl), [cymbalrush](https://github.com/cymbalrush), [derekxu](https://github.com/derekxu), [dulinriley](https://github.com/dulinriley), [freddan80](https://github.com/freddan80), [haowhsu-quic](https://github.com/haowhsu-quic), [namanahuja](https://github.com/namanahuja), [neuropilot-captain](https://github.com/neuropilot-captain), [oscarandersson8218](https://github.com/oscarandersson8218), [per](https://github.com/per), [python3kgae](https://github.com/python3kgae), [r-barnes](https://github.com/r-barnes), [robell](https://github.com/robell), [salykova](https://github.com/salykova), [shewu-quic](https://github.com/shewu-quic), [tom-arm](https://github.com/tom-arm), [winskuo-quic](https://github.com/winskuo-quic), [zingo](https://github.com/zingo)
**Full Changelog**: https://github.com/pytorch/executorch/compare/v0.3.0...v0.4.0