This release brings performance improvements, NumPy 2 compatibility, and initial support for [Cubed](https://cubed-dev.github.io/cubed/)!
What's Changed
* Initial minimal working Cubed example for "map-reduce" by tomwhite in https://github.com/xarray-contrib/flox/pull/352
* Fix benchmarks by dcherian in https://github.com/xarray-contrib/flox/pull/358
* Optimize bitmask finding for chunk size 1 and single chunk cases by dcherian in https://github.com/xarray-contrib/flox/pull/360
* Add cubed notebook for hourly climatology example using "map-reduce" method by tomwhite in https://github.com/xarray-contrib/flox/pull/356
* Manually fuse reindexing intermediates with blockwise reduction for cohorts by dcherian in https://github.com/xarray-contrib/flox/pull/300
* Use threadpool for finding labels in chunk by dcherian in https://github.com/xarray-contrib/flox/pull/327
* Optimize `min_count` when `expected_groups` is not provided by dcherian in https://github.com/xarray-contrib/flox/pull/236
* Import `normalize_axis_index` from `numpy.lib` on `numpy>=2` by keewis in https://github.com/xarray-contrib/flox/pull/364
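
For downstream code that needs the same NumPy 2 compatibility, the `normalize_axis_index` change can be sketched roughly as below. This is an illustration, not the code from the PR: it assumes the public location is `numpy.lib.array_utils` on `numpy>=2` and falls back to `numpy.core.multiarray` on older versions.

```python
import numpy as np

try:
    # Assumed public location on numpy>=2.
    from numpy.lib.array_utils import normalize_axis_index
except ImportError:
    # Assumed fallback for older NumPy, where the helper lives in a private module.
    from numpy.core.multiarray import normalize_axis_index

arr = np.zeros((2, 3, 4))

# Convert a possibly negative axis into its non-negative equivalent.
last_axis = normalize_axis_index(-1, arr.ndim)
first_axis = normalize_axis_index(0, arr.ndim)
```

With a 3-dimensional array, an axis of `-1` normalizes to `2`; an out-of-range axis raises an error instead of wrapping silently.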
New Contributors
* tomwhite made their first contribution in https://github.com/xarray-contrib/flox/pull/352
**Full Changelog**: https://github.com/xarray-contrib/flox/compare/v0.9.6...v0.9.7