* Improved decompression performance by ~70% on aarch64 and ~10% on x86_64.
* Added support for consuming any `BetterBufRead` implementation during decompression, rather than only `&[u8]`.
* Changed the API for `wrapped::PageDecompressor` and `standalone::ChunkDecompressor` to own `src`, since these parts of the file need to be read in order and contiguously.
* Updated docs, including real-world benchmarks on air quality, taxi, and r/place datasets.
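
The ownership change to `src` can be pictured with a minimal sketch. It uses only the standard library, and `OwningDecompressor`, `next_u32`, and `into_src` are hypothetical names chosen for illustration, not this crate's actual API. It assumes a toy format of little-endian `u32` values. The point it shows is that a section which must be read contiguously and in order is simplest to model as a type that owns the reader and hands it back when done.

```rust
use std::io::{BufRead, Cursor, Read};

// Hypothetical stand-in for a decompressor that owns its source.
// Not the real API: the type, its methods, and the byte format are assumptions.
struct OwningDecompressor<R: BufRead> {
    src: R,
}

impl<R: BufRead> OwningDecompressor<R> {
    fn new(src: R) -> Self {
        Self { src }
    }

    // Reads the next little-endian u32, advancing the owned source.
    fn next_u32(&mut self) -> std::io::Result<u32> {
        let mut buf = [0u8; 4];
        self.src.read_exact(&mut buf)?;
        Ok(u32::from_le_bytes(buf))
    }

    // Returns the source so the caller can continue reading after this section.
    fn into_src(self) -> R {
        self.src
    }
}

fn main() -> std::io::Result<()> {
    // Two u32 values followed by one trailing byte that belongs to a later section.
    let bytes: Vec<u8> = vec![1, 0, 0, 0, 2, 0, 0, 0, 0xFF];
    let mut dec = OwningDecompressor::new(Cursor::new(bytes));
    let a = dec.next_u32()?;
    let b = dec.next_u32()?;

    // Recover the reader, positioned just past what the decompressor consumed.
    let mut rest = Vec::new();
    dec.into_src().read_to_end(&mut rest)?;
    println!("{a} {b} {rest:?}");
    Ok(())
}
```

Because the sketch's type owns the reader, nothing else can advance it mid-section, and the caller only regains access through `into_src` once the section is finished.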