Highlights
- Memory use improvements:
  - Gradient checkpointing for training with `mx.checkpoint`
  - Better graph execution order
  - Buffer donation
Core
- Gradient checkpointing with `mx.checkpoint`
- CPU-only QR factorization with `mx.linalg.qr`
- Release Python GIL during `mx.eval`
- Depth-based graph execution order
- Lazy loading arrays from files
- Buffer donation for reduced memory use
- `mx.diag`, `mx.diagonal`
- Breaking: `array.shape` is now a Python tuple
- GPU support for `int64` and `uint64` reductions
- vmap over reductions and arg reductions:
  - `sum`, `prod`, `max`, `min`, `all`, `any`
  - `argmax`, `argmin`
NN
- Softshrink activation
Bugfixes
- Fix comparisons with `inf` and fix `mx.isinf`
- Fix a bug with the RoPE cache
- Handle empty matmul on the CPU
- Negative shape checking for `mx.full`
- Correctly propagate `NaN` in some binary ops:
  - `mx.logaddexp`, `mx.maximum`, `mx.minimum`
- Fix non-contiguous binary ops on arrays with more than 4 dimensions
- Fix `mx.log1p` with `inf` input
- Fix SGD to apply weight decay even with 0 momentum