As discussed in the [SciPy integration thread](https://github.com/scipy/scipy/issues/19454), Python libraries default to double-precision floating-point numbers. So in this release, I've extended the spatial distance functions (`cosine`, `sqeuclidean`, and `inner`) to accept `double` arguments, with specialized implementations for AVX-512-capable x86 CPUs and SVE-capable Arm CPUs.
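
For reference, the three measures can be written in plain Python, where every `float` is already a double. This is a minimal sketch of the math only, not SimSIMD's SIMD implementation; the `*_ref` function names are hypothetical, and it assumes `cosine` means cosine *distance* (one minus cosine similarity), as in SciPy.

```python
import math
import random


def inner_ref(a, b):
    """Inner (dot) product of two equal-length sequences of floats."""
    return math.fsum(x * y for x, y in zip(a, b))


def sqeuclidean_ref(a, b):
    """Squared Euclidean distance: sum of squared element-wise differences."""
    return math.fsum((x - y) ** 2 for x, y in zip(a, b))


def cosine_ref(a, b):
    """Cosine distance: 1 - (a . b) / (|a| * |b|). Assumes non-zero vectors."""
    ab = inner_ref(a, b)
    aa = inner_ref(a, a)
    bb = inner_ref(b, b)
    return 1.0 - ab / math.sqrt(aa * bb)


# Two random 1536-dimensional vectors, matching the benchmark dimensions.
random.seed(0)
a = [random.random() for _ in range(1536)]
b = [random.random() for _ in range(1536)]

print("inner:      ", inner_ref(a, b))
print("sqeuclidean:", sqeuclidean_ref(a, b))
print("cosine:     ", cosine_ref(a, b))
```

A pure-Python loop like this is far slower than either SciPy or SimSIMD; it is only meant to pin down what each function name computes.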
Benchmarking SimSIMD vs. SciPy on Intel Sapphire Rapids CPU
- Vector dimensions: 1536
- Vectors count: 1000
- Hardware capabilities: `serial`, `x86_avx2`, `x86_avx512`, `x86_avx2fp16`, `x86_avx512fp16`, `x86_avx512vpopcntdq`, `x86_avx512vnni`
- NumPy BLAS dependency: `openblas64`
- NumPy LAPACK dependency: `dep140640983012528`
Between 2 Vectors, Batch Size: 1
| Datatype | Method | Ops/s | SimSIMD Ops/s | SimSIMD Improvement |
| :------- | :-------------------- | -------------------: | -------------------: | ------------------: |
| `f64` | `scipy.cosine` | 63,612 | 572,605 | 9.00 x |
| `f64` | `scipy.sqeuclidean` | 238,547 | 915,596 | 3.84 x |
| `f64` | `numpy.inner` | 449,499 | 986,522 | 2.19 x |
Between 2 Vectors, Batch Size: 1,000
| Datatype | Method | Ops/s | SimSIMD Ops/s | SimSIMD Improvement |
| :------- | :-------------------- | -------------------: | -------------------: | ------------------: |
| `f64` | `scipy.cosine` | 68,962 | 1,457,172 | 21.13 x |
| `f64` | `scipy.sqeuclidean` | 247,727 | 1,535,547 | 6.20 x |
| `f64` | `numpy.inner` | 463,509 | 1,512,004 | 3.26 x |
Benchmarking SimSIMD vs. SciPy on AWS Graviton 3
- Vector dimensions: 1536
- Vectors count: 1000
- Hardware capabilities: `serial`, `arm_neon`, `arm_sve`
- NumPy BLAS dependency: `openblas64`
- NumPy LAPACK dependency: `openblas64`
Between 2 Vectors, Batch Size: 1
| Datatype | Method | Ops/s | SimSIMD Ops/s | SimSIMD Improvement |
| :------- | :-------------------- | -------------------: | -------------------: | ------------------: |
| `f64` | `scipy.cosine` | 40,729 | 725,382 | 17.81 x |
| `f64` | `scipy.sqeuclidean` | 160,812 | 728,114 | 4.53 x |
| `f64` | `numpy.inner` | 473,443 | 767,374 | 1.62 x |
| `f64` | `scipy.jensenshannon` | 15,684 | 38,528 | 2.46 x |
| `f64` | `scipy.kl_div` | 49,983 | 61,811 | 1.24 x |
Between 2 Vectors, Batch Size: 1,000
| Datatype | Method | Ops/s | SimSIMD Ops/s | SimSIMD Improvement |
| :------- | :-------------------- | -------------------: | -------------------: | ------------------: |
| `f64` | `scipy.cosine` | 41,130 | 1,460,850 | 35.52 x |
| `f64` | `scipy.sqeuclidean` | 162,147 | 1,486,255 | 9.17 x |
| `f64` | `numpy.inner` | 473,856 | 1,580,136 | 3.33 x |
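
The "SimSIMD Improvement" column in the tables above is consistent with dividing SimSIMD's throughput by the baseline's. The sketch below shows one way to measure operations per second and derive that ratio. It is an illustration under stated assumptions, not the actual benchmark script: `ops_per_second` and `improvement` are hypothetical helper names, and the timing loop is deliberately simple.

```python
import time


def ops_per_second(fn, *args, repeats=1000):
    """Call fn(*args) `repeats` times and return calls per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    elapsed = time.perf_counter() - start
    return repeats / elapsed


def improvement(baseline_ops, simsimd_ops):
    """Speedup factor: SimSIMD throughput divided by baseline throughput."""
    return simsimd_ops / baseline_ops


# Reproduce two "SimSIMD Improvement" cells from the reported throughputs.
print(f"{improvement(63_612, 572_605):.2f} x")    # Sapphire Rapids, scipy.cosine, batch size 1
print(f"{improvement(41_130, 1_460_850):.2f} x")  # Graviton 3, scipy.cosine, batch size 1,000
```

To time a real function, pass it with its arguments, for example `ops_per_second(some_distance, a, b)`, where `some_distance`, `a`, and `b` are placeholders for whichever implementation and vectors are being compared.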