Simsimd

Latest version: v6.2.1


5.9.0

SimSIMD is expanding and moving __closer to a fully-fledged BLAS library. BLAS level 1 for now, but it's a start!__ SimSIMD will prioritize mixed and low-precision vector math, favoring modern AI workloads. For image & media processing workloads, the new `fma` and `wsum` kernels approach 65 GB/s per core on Intel Sapphire Rapids. That's __100x faster__ than serial code for `u8` inputs with `f32` scaling and accumulation.

This release contains the following element-wise operations:

$$
\text{FMA}_i(A, B, C, \alpha, \beta) = \alpha \cdot A_i \cdot B_i + \beta \cdot C_i
$$

$$
\text{WSum}_i(A, B, \alpha, \beta) = \alpha \cdot A_i + \beta \cdot B_i
$$


In NumPy terms:

```py
import numpy as np

def wsum(A: np.ndarray, B: np.ndarray, Alpha: float, Beta: float) -> np.ndarray:
    # The result is cast back to the shared input dtype.
    assert A.dtype == B.dtype, "Input types must match and affect the output type"
    return (Alpha * A + Beta * B).astype(A.dtype)

def fma(A: np.ndarray, B: np.ndarray, C: np.ndarray, Alpha: float, Beta: float) -> np.ndarray:
    # The result is cast back to the shared input dtype.
    assert A.dtype == B.dtype and A.dtype == C.dtype, "Input types must match and affect the output type"
    return (Alpha * A * B + Beta * C).astype(A.dtype)
```
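As a rough illustration of the `u8` image-processing case mentioned earlier, the NumPy reference can blend two rows of 8-bit values. This is only a sketch of the reference semantics, not a call into the SimSIMD kernels: the variable names are made up for the example, NumPy chooses the intermediate floating-point precision here, and the final `astype` truncates toward zero without saturating, which may differ from how the optimized kernels round.

```python
import numpy as np

def wsum(a: np.ndarray, b: np.ndarray, alpha: float, beta: float) -> np.ndarray:
    # NumPy reference semantics from the release notes, not the SIMD kernel.
    assert a.dtype == b.dtype, "Input types must match"
    return (alpha * a + beta * b).astype(a.dtype)

# Two hypothetical rows of 8-bit pixel intensities.
row_a = np.array([0, 100, 200, 250], dtype=np.uint8)
row_b = np.array([50, 100, 100, 50], dtype=np.uint8)

# Weighted blend: 25% of row_a plus 75% of row_b, computed in floating point
# and then cast back to uint8 (37.5 is truncated to 37).
blended = wsum(row_a, row_b, 0.25, 0.75)
print(blended.dtype, blended)  # uint8 [ 37 100 125 100]
```

The weights were picked so that every result stays inside the `u8` range; the reference makes no attempt to clamp values that would fall outside it.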


This tiny set of operations is enough to implement a wide range of algorithms:

- To scale a vector by a scalar, just call **WSum** with $\alpha$ set to the scale factor and $\beta = 0$.
- To sum two vectors, just call **WSum** with $\alpha = \beta = 1$.
- To average two vectors, just call **WSum** with $\alpha = \beta = 0.5$.
- To multiply vectors element-wise, just call **FMA** with $\alpha = 1$ and $\beta = 0$.
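The four special cases in this list can be checked against the NumPy reference definitions. The sketch below redefines `wsum` and `fma` with the same semantics so it runs on its own. The `zeros` placeholder operand is an assumption of this example, needed only because the reference functions always take every argument.

```python
import numpy as np

def wsum(a, b, alpha, beta):
    # NumPy reference semantics, not the SIMD kernel.
    assert a.dtype == b.dtype
    return (alpha * a + beta * b).astype(a.dtype)

def fma(a, b, c, alpha, beta):
    # NumPy reference semantics, not the SIMD kernel.
    assert a.dtype == b.dtype == c.dtype
    return (alpha * a * b + beta * c).astype(a.dtype)

a = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
zeros = np.zeros_like(a)  # placeholder for the operand whose weight is 0

scaled = wsum(a, zeros, 3.0, 0.0)     # 3 * a        -> [3, 6, 9]
summed = wsum(a, b, 1.0, 1.0)         # a + b        -> [5, 7, 9]
averaged = wsum(a, b, 0.5, 0.5)       # (a + b) / 2  -> [2.5, 3.5, 4.5]
product = fma(a, b, zeros, 1.0, 0.0)  # a * b        -> [4, 10, 18]
```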

Benchmarks

On Intel Sapphire Rapids:

```sh
Run on (16 X 3900 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 2048 KiB (x8)
  L3 Unified 61440 KiB (x1)
```

5.8.0

5.7.3

5.7.2

5.7.1

5.7.0

There are several ongoing efforts to extend the functionality of SimSIMD and this PR prepares some of the groundwork for:

- 🆕 AMD Turin capability level
- 🆕 Intel Sierra Forest capability level

Those are some amazing CPUs, featuring up to 244 cores per socket, with reduced latencies for some very powerful instructions. Moreover, SimSIMD now provides:

- 🆕 Spatial kernels for sub-byte `i4` vectors
- 🆕 Sparse Dot Products
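
The notes do not say how the sparse vectors are represented. Purely as an illustration, assuming each vector is stored as a sorted array of unique indices with a matching array of weights, a sparse dot product can be sketched as a two-pointer merge. The name `sparse_dot` and its parameters are hypothetical and are not the SimSIMD API.

```python
def sparse_dot(idx_a, val_a, idx_b, val_b):
    """Dot product of two sparse vectors given as sorted unique indices plus weights."""
    i = j = 0
    total = 0.0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            # Index present in both vectors: accumulate the product of weights.
            total += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1  # index only in the first vector, contributes nothing
        else:
            j += 1  # index only in the second vector, contributes nothing
    return total

# Shared indices are 4 and 7: 2.0 * 10.0 + 3.0 * 30.0 = 110.0
result = sparse_dot([1, 4, 7], [1.0, 2.0, 3.0], [4, 5, 7], [10.0, 20.0, 30.0])
```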

This PR also:

- [x] Fixes `cdist` for complex inputs
- [x] Enables dynamic dispatch in Swift
- [x] Enables dynamic dispatch in JavaScript
- [x] Ships a new benchmarking suite
