Many data scientists embark on their journey by implementing K-Means clustering, much like app developers starting with a calculator. But despite K-Means' popularity, most implementations overlook the power of SIMD on modern CPUs. Efficient vector math is a balancing act: single- and double-precision floating-point is accurate but computationally expensive, while `float16`, `bfloat16`, and smaller types are fast but can break down under uneven distributions or when accumulating centroids for large clusters. So, what's Unum's solution? **Mixed precision!**
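The accumulation failure is easy to demonstrate. A minimal NumPy sketch (illustrative only, not tied to any Unum code): a running sum kept in `float16` stalls once it reaches 2048, because at that magnitude the spacing between representable `float16` values is 2, so adding 1 rounds back to the same number.

```python
import numpy as np

ones = np.ones(100_000, dtype=np.float16)

# Naive float16 accumulator: once the running sum hits 2048,
# adding 1.0 rounds back to 2048 (ULP spacing there is 2),
# so the sum stalls far below the true value.
acc = np.float16(0.0)
for v in ones:
    acc = np.float16(acc + v)
print(float(acc))  # 2048.0, not 100000.0

# Accumulating in float64 gives the exact answer.
print(float(ones.sum(dtype=np.float64)))  # 100000.0
```

This is exactly the failure mode that bites centroid computation for large clusters, and why the update step wants a wider accumulator.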
Thanks to strong community support and sponsorship from sutoiku ([LinkedIn](https://www.linkedin.com/company/sutoiku), [Website](https://stoic.com/)), we're introducing a high-performance K-Means implementation! It computes distances in any numeric type but switches to `float64` for centroid updates, a combination that boosts performance and enables billion-scale clustering on a single machine.
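To make the idea concrete, here is a toy NumPy sketch of mixed-precision Lloyd's iterations, assuming `float16` inputs for the distance kernel and `float64` for the centroid update. The function name `kmeans_mixed` and every detail of the code are illustrative, not the actual library API:

```python
import numpy as np

def kmeans_mixed(points, k, iters=10, seed=0):
    """Toy mixed-precision K-Means sketch (not Unum's implementation):
    distances use half-precision inputs, centroid means use float64."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct input points.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(np.float64)
    labels = np.zeros(len(points), dtype=np.intp)
    for _ in range(iters):
        # Distance kernel on float16 data: half the memory traffic,
        # and on real hardware twice the SIMD lanes per register.
        diff = points.astype(np.float16)[:, None, :] - centroids.astype(np.float16)[None, :, :]
        # Squared distances, accumulated in float32 to avoid float16 overflow.
        d2 = np.einsum('nkd,nkd->nk', diff, diff, dtype=np.float32)
        labels = d2.argmin(axis=1)
        # Centroid update in float64: long accumulations need the headroom.
        for c in range(k):
            members = points[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0, dtype=np.float64)
    return centroids, labels
```

The design point is the asymmetry: each distance is a short reduction where `float16` error is tolerable, while a centroid mean may sum millions of values and therefore accumulates in `float64`.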