* Optimize the processing of contiguous int16 arrays with AVX, for a ~2.3x speedup compared to 0.3.0
0.3.0
Added
* Distribute source
Changes
* Add support for ARM (without NEON optimizations for now) on Linux and macOS
* Update supported numpy version range to >=1.21,<2
0.2.1
Changes
* Add support for AVX512. It is used only if the CPU reports support for it.
* Compile Linux builds with clang instead of gcc, as this seems to yield tiny performance improvements
0.2.0
Changes
* Add support for Python 3.12
* Significantly speed up the processing of 1-dimensional strided arrays
* Slightly speed up the processing of ndarrays with at least 16 items
0.1.1
Changes
* Slightly speed up the processing of 2D arrays
* Speed up the processing of arrays with ndim > 2
* Speed up the processing of F-contiguous ndarrays