Tract

Latest version: v0.21.11

Page 1 of 14

0.21.11

* [cli] augment audit capabilities for mm implementation choices
* revisit matmul kernel selection
* improve gather with compressed inputs
* revisit slice bubbling up to unlock optimisations
* fix a bug around flipping subtractions
* support for left q40 input in arm64 f32 accumulating kernels (unlocks q40f32 compression on arm64)

0.21.10

* WIP llm testability (--approx-custom)
* [metal] ggml-ported kernels
* WIP einsum-to-matmul testability
* optimisation around reduce<sum> impacting some modern/exotic normalisation layers
* WIP towards better handling of shared weights (e.g. embeddings duplication)

0.21.9

* [metal] experimental profile
* [cpu] new versatile (mmm/mmmv) kernels combinations for various architectures
* [metal] scaled-masked-softmax detector and impl

0.21.8

* [linalg, compression] introduce mmm kits
* [linalg] (wip) rework f16 on non-f16 machines
* [linalg] element-wise binary operators optimisation
* [core, compression] gather with compressed weights
* [metal] new kernels
* [metal] new memory management
* [nnef] opt-in deterministic tar encoding
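
To illustrate what "deterministic tar encoding" in the last item generally means, here is a minimal Python sketch. It is not tract's implementation: the helper name `deterministic_tar` and the sample file names are hypothetical, and the sketch only assumes that determinism is obtained by sorting entries and pinning metadata (timestamps, owner, mode) so the archive bytes depend only on names and contents.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar whose bytes depend only on names and contents.

    Hypothetical helper for illustration; not part of tract or its NNEF writer.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.USTAR_FORMAT) as archive:
        # Sort names so insertion order of the dict does not affect the output.
        for name in sorted(files):
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # Pin every piece of metadata that would otherwise vary between runs or machines.
            info.mtime = 0
            info.mode = 0o644
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative contents only.
    first = deterministic_tar({"graph.nnef": b"version 1.0;", "weights.dat": b"\x00\x01"})
    second = deterministic_tar({"weights.dat": b"\x00\x01", "graph.nnef": b"version 1.0;"})
    print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

Byte-identical archives make model files easier to cache, diff, and checksum, which is presumably the motivation for offering the behaviour as an opt-in.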

0.21.7

* [metal] (experimental) introduce partial support for Apple Metal
* [core] Potential internal API breaking changes (operator names, comparison ops refactored)
* [data] (experimental) Smarter TDim simplification, handling of Min and Max. TDim assertions for simplifications.
* [data] (experimental) WIP around multiple scenarios (modes) for LLM inference
* Extra examples
* [linalg] (experimental) kernels targeting LLM block-quantized tasks (inc. intel 32x1 q40f32)

0.21.6

* [data] Rework tdim and symbols, introduce inequalities assertions, min and max operators
* [data] Generalize Blob usage in Tensor
* [linalg] Rework reduce implementation, introduce more generic binary ops support (wip)
* [linalg] Introduce multithreaded matrix multiplication runner
* [linalg] Introduce Q4_0 block quantization for weights (wip)
* [linalg] Introduce AMX f16 kernels, Neon Q40F16 kernel (experimental)
* [linalg] wasm f32 4x4 kernel
* [core] Introduce Opaque and OpaqueFact to escape Tensor and TValue formalism
* [core] generalize/improve float precision translator, with translation filter
* [core] Introduce garbage collection in patch application, new compact algo, and rework constant propagation to spare memory
* [core] Rework packed format and packing metadata
* [linalg/core] Introduce multiple packing format for matmul kernels
* [core] Work In Progress refactoring binary, towards more optimized execution strategies
* [nnef] inequalities assertions extension, q4_0 extension
* [tflite] plug in tanh and sigmoid
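
For readers unfamiliar with the Q4_0 block quantization mentioned in this release, the sketch below shows the general idea in Python. It assumes the layout commonly associated with the Q4_0 name in ggml (blocks of 32 values sharing one scale, each value stored as a 4-bit code offset by 8); whether tract matches every detail is an assumption. The function names are hypothetical, the scale is kept as a Python float rather than f16, and the codes are not packed two per byte.

```python
BLOCK_SIZE = 32  # assumed block length, as in ggml's Q4_0


def quantize_block(values: list[float]) -> tuple[float, list[int]]:
    """Quantize one block to a shared scale plus 4-bit codes in 0..15 (illustrative only)."""
    assert len(values) == BLOCK_SIZE
    # The value with the largest magnitude (sign kept) is mapped to code 0, i.e. to -8 * scale.
    extreme = max(values, key=abs)
    scale = extreme / -8.0
    inverse = 1.0 / scale if scale != 0.0 else 0.0
    # Round to the nearest code after shifting by 8, then clamp to the 4-bit range.
    codes = [min(15, int(v * inverse + 8.5)) for v in values]
    return scale, codes


def dequantize_block(scale: float, codes: list[int]) -> list[float]:
    """Recover approximate values from the shared scale and the 4-bit codes."""
    return [(code - 8) * scale for code in codes]


if __name__ == "__main__":
    block = [i / 10 - 1.6 for i in range(BLOCK_SIZE)]
    scale, codes = quantize_block(block)
    restored = dequantize_block(scale, codes)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(f"scale={scale:.4f} worst_error={worst:.4f}")
```

Each weight costs 4 bits plus a small per-block share of the scale, at the price of a reconstruction error bounded by roughly one quantization step per value.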

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.