For all the new features, see the updated documentation in the [docs-guides](https://apple.github.io/coremltools/docs-guides/source/new-features.html#new-in-core-ml-tools-8).
* New utilities `coremltools.utils.MultiFunctionDescriptor()` and `coremltools.utils.save_multifunction` for creating an `mlprogram` with multiple functions that can share weights. The model loading API has also been updated so that a specific function can be loaded for prediction, as sketched below.
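A minimal sketch of the multifunction workflow (the `.mlpackage` paths and function names below are placeholders):

```python
import coremltools as ct

# Describe which functions to pull from which saved models (paths are placeholders).
desc = ct.utils.MultiFunctionDescriptor()
desc.add_function(
    "model_a.mlpackage",
    src_function_name="main",
    target_function_name="func_a",
)
desc.add_function(
    "model_b.mlpackage",
    src_function_name="main",
    target_function_name="func_b",
)
desc.default_function_name = "func_a"

# Save a single mlprogram containing both functions; identical weights are shared.
ct.utils.save_multifunction(desc, "combined.mlpackage")

# Load a specific function for prediction.
model = ct.models.MLModel("combined.mlpackage", function_name="func_b")
```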
* Stateful Core ML models: the converter can now produce Core ML models with the new State type introduced in iOS18/macOS15 (see the sketch below).
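For example, a torch model that mutates a registered buffer can be converted into a stateful Core ML model roughly like this (a sketch; the toy model is illustrative, and the buffer name must match the `StateType` name):

```python
import torch
import coremltools as ct

class Accumulator(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A registered buffer becomes the model's persistent state.
        self.register_buffer("accumulator", torch.zeros(1))

    def forward(self, x):
        # The in-place buffer update is captured as a state write.
        self.accumulator.add_(x)
        return self.accumulator * x

traced = torch.jit.trace(Accumulator().eval(), torch.zeros(1))
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1,), name="x")],
    states=[ct.StateType(wrapped_type=ct.TensorType(shape=(1,)), name="accumulator")],
    minimum_deployment_target=ct.target.iOS18,
)
```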
* `coremltools.optimize`
* Updates to the model representation (`mlprogram`) pertaining to compression:
* Support for more compression granularities: blockwise quantization and grouped channel-wise palettization
* 4-bit weight quantization (in addition to the previously supported 8-bit quantization)
* 3-bit palettization (in addition to the previously supported 1-, 2-, 4-, 6-, and 8-bit palettization)
* Support for joint compression modes:
* 8-bit look-up tables (LUTs) for palettization
* the ability to combine weight pruning and palettization
* the ability to combine weight pruning and quantization
* API updates:
* `coremltools.optimize.coreml`
* Updated the existing APIs to cover the features mentioned above
* Support for joint compression by applying compression techniques to an already compressed model
* A new API for activation quantization using calibration data, which can take a W16A16 Core ML model and produce a W8A8 model: `ct.optimize.coreml.experimental.linear_quantize_activations` (to be promoted from the experimental to the official namespace in a future release). A sketch of these APIs follows below.
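A sketch of the updated `cto.coreml` APIs, assuming an existing mlprogram model (the model path, input name `"x"`, and calibration data below are placeholders):

```python
import numpy as np
import coremltools as ct
import coremltools.optimize as cto

mlmodel = ct.models.MLModel("model.mlpackage")  # placeholder path

# 4-bit blockwise weight quantization (new granularity in this release).
weight_config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.OpLinearQuantizerConfig(
        mode="linear_symmetric",
        dtype="int4",
        granularity="per_block",
        block_size=32,
    )
)
mlmodel_w4 = cto.coreml.linear_quantize_weights(mlmodel, weight_config)

# Experimental activation quantization with calibration data (gives W8A8 when
# the weights are also quantized to 8 bits). "x" is a placeholder input name.
sample_data = [{"x": np.random.rand(1, 3, 224, 224).astype(np.float32)}]
act_config = cto.coreml.OptimizationConfig(
    global_config=cto.coreml.experimental.OpActivationLinearQuantizerConfig(
        mode="linear_symmetric"
    )
)
mlmodel_a8 = cto.coreml.experimental.linear_quantize_activations(
    mlmodel, act_config, sample_data
)
```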
* `coremltools.optimize.torch`
* Updated the existing APIs to cover the features mentioned above
* Added new APIs for data-free compression (`PostTrainingPalettizer`, `PostTrainingQuantizer`); see the sketch after this list
* Added new APIs for calibration-data-based compression (`SKMPalettizer` for the sensitive k-means palettization algorithm, `layerwise_compression` for the GPTQ/SparseGPT quantization/pruning algorithms)
* Updated the APIs and the `coremltools.convert` implementation so that torch models compressed with `ct.optimize.torch` no longer require additional pass-pipeline arguments at conversion time
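For instance, data-free palettization with the new `cto.torch` APIs looks roughly like this (a sketch; the toy model and config values are illustrative):

```python
import torch
from coremltools.optimize.torch.palettization import (
    PostTrainingPalettizer,
    PostTrainingPalettizerConfig,
)

torch_model = torch.nn.Sequential(torch.nn.Linear(64, 64)).eval()  # toy model

# Data-free 4-bit palettization with grouped channel-wise LUTs.
config = PostTrainingPalettizerConfig.from_dict(
    {"global_config": {"n_bits": 4, "granularity": "per_grouped_channel", "group_size": 16}}
)
palettized_model = PostTrainingPalettizer(torch_model, config).compress()
```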
* iOS18 / macOS15 ops
* compression-related ops: `constexpr_blockwise_shift_scale`, `constexpr_lut_to_dense`, `constexpr_sparse_to_dense`, etc.
* updates to the GRU op
* PyTorch op `scaled_dot_product_attention` (a conversion sketch follows below)
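A model using the PyTorch op converts as usual; with an iOS18 deployment target it can lower to the new SDPA op (a minimal sketch with placeholder shapes):

```python
import torch
import coremltools as ct

class Attention(torch.nn.Module):
    def forward(self, q, k, v):
        return torch.nn.functional.scaled_dot_product_attention(q, k, v)

q = torch.rand(1, 8, 16, 32)  # placeholder (batch, heads, seq, dim)
traced = torch.jit.trace(Attention().eval(), (q, q, q))
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=q.shape, name=n) for n in ("q", "k", "v")],
    minimum_deployment_target=ct.target.iOS18,
)
```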
* Experimental `torch.export` conversion support
```python
import torch
import torchvision

import coremltools as ct

# Load a pretrained ViT and build example inputs for torch.export.
torch_model = torchvision.models.vit_b_16(weights="IMAGENET1K_V1")
x = torch.rand((1, 3, 224, 224))
example_inputs = (x,)

# Capture the model as an ExportedProgram and convert it directly.
exported_program = torch.export.export(torch_model, example_inputs)
coreml_model = ct.convert(exported_program)
```
* Various other bug fixes, enhancements, cleanups, and optimizations
**Known Issues**
* Conversion will fail when certain palettization modes (e.g. int8 LUT, vector palettization) are used with torch models via `ct.optimize.torch`
* Some of the joint compression modes, when used with the training-time APIs in `ct.optimize.torch`, will result in a torch model that is not converted correctly
* The post-training palettization config for mlpackage models (`ct.optimize.coreml.OpPalettizerConfig`) does not yet support all the arguments available in the `cto.torch.palettization` APIs (e.g. `lut_dtype` to get an int8-dtyped LUT, `cluster_dim` to do vector palettization, `enable_per_channel_scale` to apply a per-channel scale, etc.)
* Applying symmetric quantization with the GPTQ algorithm via `ct.optimize.torch.layerwise_compression.LayerwiseCompressor` will not produce the correct quantization scales, due to a [known bug](https://github.com/apple/coremltools/pull/2242). This may lead to poor accuracy for the quantized model.

Special thanks to our external contributors for this release: teelrabbit, igeni, Cyanosite