For all the new features, see the updated documentation in the [docs-guides](https://apple.github.io/coremltools/docs-guides/source/new-features.html#new-in-core-ml-tools-8).
* New utilities `coremltools.utils.MultiFunctionDescriptor()` and `coremltools.utils.save_multifunction` for creating an `mlprogram` with multiple functions that can share weights. The model loading API has also been updated so that a specific function can be loaded for prediction, as sketched below.
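A minimal sketch of the multifunction workflow (the `.mlpackage` paths and function names below are placeholders):

```python
import coremltools as ct

# Describe which functions to pull from which saved models (paths are placeholders).
desc = ct.utils.MultiFunctionDescriptor()
desc.add_function(
    "model_a.mlpackage",
    src_function_name="main",
    target_function_name="func_a",
)
desc.add_function(
    "model_b.mlpackage",
    src_function_name="main",
    target_function_name="func_b",
)
desc.default_function_name = "func_a"

# Save a single mlprogram containing both functions; identical weights are shared.
ct.utils.save_multifunction(desc, "combined.mlpackage")

# Load a specific function for prediction.
model = ct.models.MLModel("combined.mlpackage", function_name="func_b")
```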
* Stateful Core ML models: the converter can now produce Core ML models with the new State type introduced in iOS18/macOS15 (see the sketch below).
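For example, a torch model that mutates a registered buffer can be converted into a stateful Core ML model roughly like this (a sketch; the toy model is illustrative, and the buffer name must match the `StateType` name):

```python
import torch
import coremltools as ct

class Accumulator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A registered buffer becomes the model's persistent state.
        self.register_buffer("accumulator", torch.zeros(1))

    def forward(self, x):
        # The in-place buffer update is captured as a state write.
        self.accumulator.add_(x)
        return self.accumulator * x

traced = torch.jit.trace(Accumulator().eval(), torch.zeros(1))
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1,), name="x")],
    states=[ct.StateType(wrapped_type=ct.TensorType(shape=(1,)), name="accumulator")],
    minimum_deployment_target=ct.target.iOS18,
)
```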
* `coremltools.optimize`
* Updates to the model representation (`mlprogram`) pertaining to compression:
* Support for more compression granularities: blockwise quantization and grouped channel-wise palettization
* 4-bit weight quantization (in addition to the previously supported 8-bit quantization)
* 3-bit palettization (in addition to the previously supported 1-, 2-, 4-, 6-, and 8-bit palettization)
* Support for joint compression modes:
* 8-bit look-up tables (LUTs) for palettization
* the ability to combine weight pruning and palettization
* the ability to combine weight pruning and quantization
* API updates:
* `coremltools.optimize.coreml`
* Updated the existing APIs to cover the features mentioned above
* Support for joint compression by applying compression techniques to an already compressed model
* A new API for activation quantization using calibration data, which can take a W16A16 Core ML model and produce a W8A8 model: `ct.optimize.coreml.experimental.linear_quantize_activations` (to be promoted from the experimental to the official namespace in a future release). A sketch of these APIs follows below.
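A sketch of the updated `cto.coreml` APIs, assuming an existing mlprogram model (the model path, input name `"x"`, and calibration data below are placeholders):

```python
import numpy as np
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel("model.mlpackage")  # placeholder path

# 4-bit blockwise weight quantization (new granularity in this release).
weight_config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.OpLinearQuantizerConfig(
        mode="linear_symmetric",
        dtype="int4",
        granularity="per_block",
        block_size=32,
    )
)
mlmodel_w4 = cto.coreml.linear_quantize_weights(mlmodel, weight_config)

# Experimental activation quantization with calibration data (gives W8A8 when
# the weights are also quantized to 8 bits). "x" is a placeholder input name.
sample_data = [{"x": np.random.rand(1, 3, 224, 224).astype(np.float32)}]
act_config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.experimental.OpActivationLinearQuantizerConfig(
        mode="linear_symmetric"
    )
)
mlmodel_a8 = cto.coreml.experimental.linear_quantize_activations(
    mlmodel, act_config, sample_data
)
```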
* `coremltools.optimize.torch`
* Updated the existing APIs to cover the features mentioned above
* Added new APIs for data-free compression (`PostTrainingPalettizer`, `PostTrainingQuantizer`); see the sketch after this list
* Added new APIs for calibration-data-based compression (`SKMPalettizer` for the sensitive k-means palettization algorithm, `layerwise_compression` for the GPTQ/SparseGPT quantization/pruning algorithms)
* Updated the APIs and the `coremltools.convert` implementation so that torch models compressed with `ct.optimize.torch` no longer require additional pass-pipeline arguments at conversion time
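For instance, data-free palettization with the new `cto.torch` APIs looks roughly like this (a sketch; the toy model and config values are illustrative):

```python
import torch
from coremltools.optimize.torch.palettization import (
    PostTrainingPalettizer,
    PostTrainingPalettizerConfig,
)

torch_model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()  # toy model

# Data-free 4-bit palettization with grouped channel-wise LUTs.
config = PostTrainingPalettizerConfig.from_dict(
    {"global_config": {"n_bits": 4, "granularity": "per_grouped_channel", "group_size": 16}}
)
palettized_model = PostTrainingPalettizer(torch_model, config).compress()
```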
* iOS18 / macOS15 ops
* compression-related ops: `constexpr_blockwise_shift_scale`, `constexpr_lut_to_dense`, `constexpr_sparse_to_dense`, etc.
* updates to the GRU op
* PyTorch op `scaled_dot_product_attention` (a conversion sketch follows below)
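A model using the PyTorch op converts as usual; with an iOS18 deployment target it can lower to the new SDPA op (a minimal sketch with placeholder shapes):

```python
import torch
import coremltools as ct

class Attention(torch.nn.Module):
    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)

q = torch.rand(1, 8, 16, 32)  # placeholder (batch, heads, seq, dim)
traced = torch.jit.trace(Attention().eval(), (q, q, q))
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=q.shape, name=n) for n in ("q", "k", "v")],
    minimum_deployment_target=ct.target.iOS18,
)
```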
* Experimental `torch.export` conversion support
```python
import torch
import torchvision

import coremltools as ct

# Load a pretrained ViT and build example inputs for torch.export.
torch_model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")
x = torch.rand((1, 3, 224, 224))
example_inputs = (x,)

# Capture the model as an ExportedProgram and convert it directly.
exported_program = torch.export.export(torch_model, example_inputs)
coreml_model = ct.convert(exported_program)
```
* Various other bug fixes, enhancements, cleanups, and optimizations
**Known Issues**
* Conversion will fail when certain palettization modes (e.g. int8 LUT, vector palettization) are used with torch models via `ct.optimize.torch`
* Some of the joint compression modes, when used with the training-time APIs in `ct.optimize.torch`, will result in a torch model that is not converted correctly
* The post-training palettization config for mlpackage models (`ct.optimize.coreml.OpPalettizerConfig`) does not yet support all the arguments available in the `cto.torch.palettization` APIs (e.g. `lut_dtype` to get an int8-dtyped LUT, `cluster_dim` to do vector palettization, `enable_per_channel_scale` to apply a per-channel scale, etc.)
* Applying symmetric quantization with the GPTQ algorithm via `ct.optimize.torch.layerwise_compression.LayerwiseCompressor` will not produce the correct quantization scales, due to a [known bug](https://github.com/apple/coremltools/pull/2242). This may lead to poor accuracy for the quantized model.

Special thanks to our external contributors for this release: teelrabbit, igeni, Cyanosite