- 🆕 `sz_checksum(char const *, size_t)` C99 interface
- 🆕 `sz::str().checksum()` C++11 interface
- 🆕 `sz.checksum(str)` Python interface
Database and other systems engineers can now use StringZilla to dynamically dispatch different checksum kernels for AVX2-capable Haswell+ CPUs, AVX-512BW-capable Ice Lake+ CPUs, and Arm NEON CPUs on mobile. The AVX-512 kernels use masked loads extensively, resulting in a 10% improvement even on typical English words averaging 5 bytes in length, and a __20x performance improvement compared to the serial code for longer strings__.
On the technical side, the x86 kernels use the well-known `SAD(text, zeros)` idiom: a Sum of Absolute Differences against a zero vector, which accumulates the unsigned values of individual bytes into 64-bit words. They also traverse the input bidirectionally to saturate a core capable of performing 2 loads per CPU cycle. Moreover, on large inputs, they switch to streaming loads, handling the head and the tail separately, similar to our `memcpy` alternative, which also outperforms LibC on AVX-512-capable machines 😎
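To make the idiom concrete, here is a minimal sketch of the `SAD(text, zeros)` trick combined with bidirectional traversal. It is not the library's kernel: it assumes the checksum is a plain sum of unsigned byte values, uses 128-bit SSE2 instead of AVX2 or AVX-512, omits masked and streaming loads, and falls back to a serial loop on non-x86 targets. The names `checksum_serial` and `checksum_sad_sketch` are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> // SSE2: _mm_sad_epu8, _mm_loadu_si128, _mm_add_epi64
#endif

// Serial baseline (hypothetical name): sums unsigned byte values into 64 bits.
static uint64_t checksum_serial(char const *text, size_t length) {
    uint64_t sum = 0;
    for (size_t i = 0; i != length; ++i) sum += (unsigned char)text[i];
    return sum;
}

// Illustrative sketch (hypothetical name), not StringZilla's actual kernel.
static uint64_t checksum_sad_sketch(char const *text, size_t length) {
#if defined(__SSE2__)
    __m128i const zeros = _mm_setzero_si128();
    __m128i head_sums = _mm_setzero_si128();
    __m128i tail_sums = _mm_setzero_si128();
    size_t head = 0, tail = length;
    // Bidirectional traversal: one 16-byte load from each end per iteration,
    // feeding two independent accumulators.
    while (tail - head >= 32) {
        __m128i head_vec = _mm_loadu_si128((__m128i const *)(text + head));
        __m128i tail_vec = _mm_loadu_si128((__m128i const *)(text + tail - 16));
        // |byte - 0| == byte, so SAD against zeros yields two 64-bit partial
        // sums per vector, one for each group of 8 bytes.
        head_sums = _mm_add_epi64(head_sums, _mm_sad_epu8(head_vec, zeros));
        tail_sums = _mm_add_epi64(tail_sums, _mm_sad_epu8(tail_vec, zeros));
        head += 16;
        tail -= 16;
    }
    // Reduce the two 64-bit lanes through memory instead of a 64-bit extract
    // intrinsic, which is not available on every compiler and target.
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, _mm_add_epi64(head_sums, tail_sums));
    // Fewer than 32 bytes remain in the middle; finish them serially.
    return lanes[0] + lanes[1] + checksum_serial(text + head, tail - head);
#else
    return checksum_serial(text, length);
#endif
}
```

Keeping separate accumulators for the head and the tail leaves the two load streams independent of each other, which is the point of traversing from both ends. The 64-bit lanes cannot realistically overflow, since each `_mm_sad_epu8` call adds at most 8 × 255 to a lane.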
Minor
- Add: Checksums in Python (1b77de9)
- Add: Checksum tests (c2b997c)
- Add: Checksum kernels (a99337b)
Patch
- Docs: Simpler Python doc-strings (ad5fa2c)
- Fix: `sz_checksum` visibility (9bec0eb)
- Fix: Missing `_mm_cvtsi128_si64x` in Clang (c8c6c7c)