**USearch has now been used in two internet-scale companies/projects 🎉**
In one case, it's only 1 Billion vectors. The second is closer to 10 Billion per node. So it feels like a good time to enter the [Big ANN](https://big-ann-benchmarks.com) competition track 🐎 Still, we aim for an even larger **scale** and introduce a few improvements, such as the new `CompiledMetric` interface that allows one to mix C/C++, Python, and Assembly code for different hardware architectures in the same USearch configuration file 🤯 It should come in handy, given the rise in the number of exotic AI hardware vendors 😉
SIMD and JIT
The common idea across [Unum projects](https://github.com/unum-cloud) is to use SIMD for hardware acceleration when we hit the wall.
Writing SIMD can be challenging, but distributing it may be even more difficult. You are compiling for modern hardware and modern libraries, and all of a sudden, someone deploys it on an old machine...
> illegal instruction
A good balance can be achieved by compiling the core of USearch for higher compatibility and leaving plugs for hardware-specific optimizations. That is where Just-in-Time Compilation comes in. USearch allows combining the pre-compiled HNSW structure with JIT-ed distance metrics, which would tune the resulting assembly for the current target architecture. Such functionality was already supported with Numba JIT, but now you have more flexibility to choose different backends and define functions with different signatures:
- `MetricSignature.ArrayArray`, aka `float distance(scalar *, scalar *)`.
- `MetricSignature.ArrayArraySize`, aka `float distance(scalar *, scalar *, size_t)`.
- `MetricSignature.ArraySizeArraySize`, aka `float distance(scalar *, size_t, scalar *, size_t)`.
This enum is a core part of `CompiledMetric` and is passed down to our CPython binding.
py
class CompiledMetric(NamedTuple):
pointer: int
kind: MetricKind
signature: MetricSignature
Numba JIT
Just to remind you, a minimal Numba + USearch example would look like this:
py
from numba import cfunc, types, carray
ndim = 256
signature = types.float32(
types.CPointer(types.float32),
types.CPointer(types.float32))
cfunc(signature)
def inner_product(a, b):
a_array = carray(a, ndim)
b_array = carray(b, ndim)
c = 0.0
for i in range(ndim):
c += a_array[i] * b_array[i]
return 1 - c
index = Index(ndim=ndim, metric=CompiledMetric(
pointer=inner_product.address,
kind=MetricKind.IP,
signature=MetricSignature.ArrayArray,
))
No scary C++ or Assembly involved.
Cppyy
Similarly, you can use Cppyy with Cling to JIT-compile native C or C++ code and pass it to USearch, which may be a good idea if you want to explicitly request loop-unrolling or other low-level optimizations!
py
import cppyy
import cppyy.ll
cppyy.cppdef("""
float inner_product(float *a, float *b) {
float result = 0;
pragma unroll
for (size_t i = 0; i != ndim; ++i)
result += a[i] * b[i];
return 1 - result;
}
""".replace("ndim", str(ndim)))
function = cppyy.gbl.inner_product
index = Index(ndim=ndim, metric=CompiledMetric(
pointer=cppyy.ll.addressof(function),
kind=MetricKind.IP,
signature=MetricSignature.ArrayArraySize,
))
PeachPy
Instead of writing in C or C++, you can go one level down and directly assemble a kernel for x86. Below is an example of constructing the "Inner Product" distance for 8-dimensional f32 vectors for x86 using [PeachPy](https://github.com/Maratyszcza/PeachPy).
py
from peachpy import (
Argument,
ptr,
float_,
const_float_,
)
from peachpy.x86_64 import (
abi,
Function,
uarch,
isa,
GeneralPurposeRegister64,
LOAD,
YMMRegister,
VSUBPS,
VADDPS,
VHADDPS,
VMOVUPS,
VFMADD231PS,
VPERM2F128,
VXORPS,
RETURN,
)
a = Argument(ptr(const_float_), name="a")
b = Argument(ptr(const_float_), name="b")
with Function(
"inner_product", (a, b), float_, target=uarch.default + isa.avx2
) as asm_function:
Request two 64-bit general-purpose registers for addresses
reg_a, reg_b = GeneralPurposeRegister64(), GeneralPurposeRegister64()
LOAD.ARGUMENT(reg_a, a)
LOAD.ARGUMENT(reg_b, b)
Load the vectors
ymm_a = YMMRegister()
ymm_b = YMMRegister()
VMOVUPS(ymm_a, [reg_a])
VMOVUPS(ymm_b, [reg_b])
Prepare the accumulator
ymm_c = YMMRegister()
ymm_one = YMMRegister()
VXORPS(ymm_c, ymm_c, ymm_c)
VXORPS(ymm_one, ymm_one, ymm_one)
Accumulate A and B products into C
VFMADD231PS(ymm_c, ymm_a, ymm_b)
Reduce the contents of a YMM register
ymm_c_permuted = YMMRegister()
VPERM2F128(ymm_c_permuted, ymm_c, ymm_c, 1)
VADDPS(ymm_c, ymm_c, ymm_c_permuted)
VHADDPS(ymm_c, ymm_c, ymm_c)
VHADDPS(ymm_c, ymm_c, ymm_c)
Negate the values from "similarity" to "distance"
VSUBPS(ymm_c, ymm_one, ymm_c)
A common convention is to return floats in XMM registers
RETURN(ymm_c.as_xmm)
python_function = asm_function.finalize(abi.detect()).encode().load()
metric = CompiledMetric(
pointer=python_function.loader.code_address,
kind=MetricKind.IP,
signature=MetricSignature.ArrayArray,
)
index = Index(ndim=ndim, metric=metric)
SimSIMD
Last but not the list, for the most common metrics and vector spaces, I am going to provide pre-compiled optimized CPython Capsules as part of SimSIMD. Let us know which embeddings you use the most - OpenAI, BERT, ViT, or maybe [UForm](https://github.com/unum-cloud/uform)?
Release Notes: [0.19.0](https://github.com/unum-cloud/usearch/compare/v0.18.8...v0.19.0) (2023-06-24)
Add
* NDCG evaluation ([25cb6af](https://github.com/unum-cloud/usearch/commit/25cb6aff8f905d03b361d9e54736d8c24cc14a31))
* PeachPy example ([1bf0bbb](https://github.com/unum-cloud/usearch/commit/1bf0bbb8d796c3e9d0a9b8f12cd70e3645a2cb7d))
* Restore index from disk with only a path ([b7aec4c](https://github.com/unum-cloud/usearch/commit/b7aec4c9c2f8f22d7b6422019e327f2c9b029f99))
* Windows file-mapping API ([794f173](https://github.com/unum-cloud/usearch/commit/794f17378cfeed8b8888c9ad8a7dc07d27b3f620))
Fix
* Re-indexing vectors on restart ([529c137](https://github.com/unum-cloud/usearch/commit/529c137bcf7ffa2b49dcf20ce4f95adb219277ff))
Improve
* Preserve `metric_kind_t` in snapshots ([5ec0b65](https://github.com/unum-cloud/usearch/commit/5ec0b65357d4efd96c4c1d076f13b31e03bdb9e1))
* Support Cppyy JIT compilation ([80f99cd](https://github.com/unum-cloud/usearch/commit/80f99cd13a7e5dcc3066531990d97b320746694a))
* Support signatures for JIT-ed metrics ([c89e145](https://github.com/unum-cloud/usearch/commit/c89e145bfd06c85d14c324b1a1ce1ba64e9b46ca))
Hashes
* Source code (zip): `19c2e9f41134c17ac5db20567b268ee1fce1b4d5dbb42a8b6e5a252ac9e242de`
* Source code (tar.gz): `34e5bd00639504d752f0ab5eb96e62bc34ab18537e6e31823f3e487a4d8139a3`