**Status:** July 2022
The 4.0.0 release introduces some major performance enhancements, and a number of larger changes to the heuristics used in the codec to find a more effective cost:quality trade off. The core compressor library is between 1.2x (4x4 blocks) and 1.6x (6x6 blocks) faster than the previous 3.7 release. The core decompressor library is 1.25x faster than the 3.7 release.
* **General:**
* **Change:** The `-array` option for specifying the number of image planes for ASTC 3D volumetric block compression been renamed to `-zdim`.
* **Change:** The build root package directory is now `bin` instead of `astcenc`, allowing the CMake install step to write binaries into
`/usr/local/bin` if the user wishes to do so.
* **Feature:** A new `-ssw` option for specifying the shader sampling swizzle has been added as convenience alternative to the `-cw` option. This is needed to correct error weighting during compression if not all components are read in the shader. For example, to extract and compress two components from an RGBA input image, weighting the two components equally when sampling through `texture().ra` in the shader, use `-esw ggga -ssw ra`. In this example `-ssw ra` is equivalent to the alternative `-cw 1 0 0 1` encoding.
* **Feature:** The `-a` alpha weighting option has been re-enabled in the backend, and now again applies alpha scaling to the RGB error metrics when encoding. This is based on the maximum alpha in each block, not the individual texel alpha values used in the earlier implementation.
* **Feature:** The command line tool now has `-repeats <count>` for testing, which will iterate around compression and decompression `count` times. Reported performance metrics also now separate compression and decompression scores.
* **Feature:** The core codec is now warning clean up to `/W4` for both MSVC `cl.exe` and `clangcl.exe` compilers.
* **Feature:** The core codec now supports arm64 for both MSVC `cl.exe` and `clangcl.exe` compilers.
* **Feature:** `NO_INVARIANCE` builds will enable the `-ffp-contract=fast` option for all targets when using Clang or GCC. In addition AVX2 targets will also set the `-mfma` option. This reduces image quality by up to 0.2dB (normally much less), but improves performance by up to 5-20%.
* **Optimization:** Angular endpoint min/max weight selection is restricted to weight `QUANT_11` or lower. Higher quantization levels assume default 0-1 range, which is less accurate but much faster.
* **Optimization:** Maximum weight quantization for later trials is selected based on the weight quantization of the best encoding from the 1 plane 1 partition trial. This significantly reduces the search space for the later trials with more planes or partitions.
* **Optimization:** Small data tables now use in-register SIMD permutes rather than gathers (AVX2) or unrolled scalar lookups (SSE/NEON). This can be a significant optimization for paths that are load unit limited.
* **Optimization:** Decompressed image block writes in the decompressor now use a vectorized approach to writing each row of texels in the block, including to ability to exploit masked stores if the target supports them.
* **Optimization:** Weight scrambling has been moved into the physical layer; the rest of the codec now uses linear order weights.
* **Optimization:** Weight packing has been moved into the physical layer; the rest of the codec now uses unpacked weights in the 0-64 range.
* **Optimization:** Consistently vectorize the creation of unquantized weight grids when they are needed.
* **Optimization:** Remove redundant per-decimation mode copies of endpoint and weight structures, which were really read-only duplicates.
* **Optimization:** Early-out the same endpoint mode color calculation if it cannot be applied.
* **Optimization:** Numerous type size reductions applied to arrays to reduce both context working buffer size usage and stack usage.
Binary release sha256 checksums
7899f73ee2820b0e7d73b38fda6feb58948a30b511dd123ec4b7c89f3aae531a astcenc-4.0.0-linux-x64.zip
9697d0ad63910cd4f1fd8be51203e9c1873aa6d219ac70540ad0f37e19dd3e63 astcenc-4.0.0-macos-aarch64.zip
26866ad79d80a6153f1746516004e5a95df07c55c6866e44ed63bb7953c7532e astcenc-4.0.0-macos-x64.zip
b83a7d0c88d7e720828bf91655ad4acccee95b2d8086835ef26879b905777172 astcenc-4.0.0-windows-x64.zip