Highlights
[BETA] Transforms and augmentations
![sphx_glr_plot_transforms_getting_started_004](https://github.com/pytorch/vision/assets/1190450/fc42eabe-d3fe-40c1-8365-2177e389521b)
Major speedups
The new transforms in `torchvision.transforms.v2` support image classification, segmentation, detection, and video tasks. They are now [10%-40% faster](https://github.com/pytorch/vision/issues/7497#issuecomment-1557478635) than before! Most of this gain comes from 2X-4X speedups in `v2.Resize()`, which now natively supports `uint8` tensors in bilinear and bicubic modes. Output results are also now closer to PIL's! Check out our [performance recommendations](https://pytorch.org/vision/stable/transforms.html#performance-considerations) to learn more.
Additionally, `torchvision` now ships with `libjpeg-turbo` instead of `libjpeg`, which should significantly speed up the JPEG decoding utilities ([`read_image`](https://pytorch.org/vision/stable/generated/torchvision.io.read_image.html#torchvision.io.read_image), [`decode_jpeg`](https://pytorch.org/vision/stable/generated/torchvision.io.decode_jpeg.html#torchvision.io.decode_jpeg)), and avoid compatibility issues with PIL.
CutMix and MixUp
Long-awaited support for the `CutMix` and `MixUp` augmentations is now here! Check [our tutorial](https://pytorch.org/vision/stable/auto_examples/transforms/plot_cutmix_mixup.html#sphx-glr-auto-examples-transforms-plot-cutmix-mixup-py) to learn how to use them.
Towards stable V2 transforms
In the [previous release 0.15](https://github.com/pytorch/vision/releases/tag/v0.15.1) we BETA-released a new set of transforms in `torchvision.transforms.v2` with native support for tasks like segmentation, detection, or videos. We have now stabilized the design decisions of these transforms and made further improvements in terms of speedups, usability, new transforms support, etc.
We're keeping the `torchvision.transforms.v2` and `torchvision.tv_tensors` namespaces as BETA until 0.17 as a precaution, but we do not expect disruptive API changes in the future.
Whether you’re new to Torchvision transforms, or you’re already experienced with them, we encourage you to start with [Getting started with transforms v2](https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_getting_started.html#sphx-glr-auto-examples-transforms-plot-transforms-getting-started-py) to learn more about what can be done with the new v2 transforms.
Browse our [main docs](https://pytorch.org/vision/stable/transforms.html#) for general information and performance tips. The available transforms and functionals are listed in the [API reference](https://pytorch.org/vision/stable/transforms.html#v2-api-ref). Additional information and tutorials can also be found in our [example gallery](https://pytorch.org/vision/stable/auto_examples/index.html#gallery), e.g. [Transforms v2: End-to-end object detection/segmentation example](https://pytorch.org/vision/stable/auto_examples/transforms/plot_transforms_e2e.html#sphx-glr-auto-examples-transforms-plot-transforms-e2e-py) or [How to write your own v2 transforms](https://pytorch.org/vision/stable/auto_examples/transforms/plot_custom_transforms.html#sphx-glr-auto-examples-transforms-plot-custom-transforms-py).
[BETA] MPS support
The `nms` and RoI kernels (`roi_align`, `roi_pool`, `ps_roi_align`, `ps_roi_pool`) now support MPS. Thanks to [Li-Huai (Allan) Lin](https://github.com/qqaatw) for this contribution!
---------
Detailed Changes
Deprecations / Breaking changes
All changes below happened in the `transforms.v2` and `datapoints` (now `tv_tensors`) namespaces, which were BETA and protected with a warning. **We do not expect other disruptive changes to these APIs moving forward!**
[transforms.v2] `to_grayscale()` is not deprecated anymore (7707)
[transforms.v2] Renaming: `torchvision.datapoints.Datapoint` -> `torchvision.tv_tensors.TVTensor` (7904, 7894)
[transforms.v2] Renaming: `BoundingBox` -> `BoundingBoxes` (7778)
[transforms.v2] Renaming: `BoundingBoxes.spatial_size` -> `BoundingBoxes.canvas_size` (7734)
[transforms.v2] All public methods on `TVTensor` classes (previously: `Datapoint` classes) were removed
[transforms.v2] `transforms.v2.utils` is now private. (7863)
[transforms.v2] Remove `wrap_like` class method and add `tv_tensors.wrap()` function (7832)
New Features
[transforms.v2] Add support for `MixUp` and `CutMix` (7731, 7784)
[transforms.v2] Add `PermuteChannels` transform (7624)
[transforms.v2] Add `ToPureTensor` transform (7823)
[ops] Add MPS kernels for `nms` and `roi` ops (7643)
Improvements
[io] Added support for CMYK images in `decode_jpeg` (7741)
[io] Package torchvision with `libjpeg-turbo` instead of `libjpeg` (7672, 7840)
[models] Downloaded weights are now sha256-validated (7219)
[transforms.v2] Massive `Resize` speed-up by adding native `uint8` support for bilinear and bicubic modes (7557, 7668)
[transforms.v2] Enforce pickleability for v2 transforms and wrapped datasets (7860)
[transforms.v2] Allow catch-all "others" key in `fill` dicts. (7779)
[transforms.v2] Allow passthrough for `Resize` (7521)
[transforms.v2] Add `scale` option to `ToDtype`. Remove `ConvertDtype`. (7759, 7862)
[transforms.v2] Improve UX for `Compose` (7758)
[transforms.v2] Allow users to choose whether to return `TVTensor` subclasses or pure `Tensor` (7825)
[transforms.v2] Remove import-time warning for v2 namespaces (7853, 7897)
[transforms.v2] Speedup `hsv2rgb` (7754)
[models] Add `filter` parameters to `list_models()` (7718)
[models] Assert `RAFT` input resolution is 128 x 128 or higher (7339)
[ops] Replaced `gpuAtomicAdd` by `fastAtomicAdd` (7596)
[utils] Add GPU support for `draw_segmentation_masks` (7684)
[ops] Add deterministic, pure-Python `roi_align` implementation (7587)
[tv_tensors] Make `TVTensors` deepcopyable (7701)
[datasets] Only return small set of targets by default from dataset wrapper (7488)
[references] Added support for v2 transforms and `tensors` / `tv_tensors` backends (7732, 7511, 7869, 7665, 7629, 7743, 7724, 7742)
[doc] A lot of documentation improvements (7503, 7843, 7845, 7836, 7830, 7826, 7484, 7795, 7480, 7772, 7847, 7695, 7655, 7906, 7889, 7883, 7881, 7867, 7755, 7870, 7849, 7854, 7858, 7621, 7857, 7864, 7487, 7859, 7877, 7536, 7886, 7679, 7793, 7514, 7789, 7688, 7576, 7600, 7580, 7567, 7459, 7516, 7851, 7730, 7565, 7777)
Bug Fixes
[datasets] Fix `split=None` in `MovingMNIST` (7449)
[io] Fix heap buffer overflow in `decode_png` (7691)
[io] Fix blurry screen in video decoder (7552)
[models] Fix weight download URLs for some models (7898)
[models] Fix `ShuffleNet` ONNX export (7686)
[models] Fix detection models with PyTorch 2.0 (7592, 7448)
[ops] Fix segfault in `DeformConv2d` when `mask` is None (7632)
[transforms.v2] Stricter `SanitizeBoundingBoxes` `labels_getter` heuristic (7880)
[transforms.v2] Make sure `RandomPhotometricDistort` transforms all images the same (7442)
[transforms.v2] Fix `v2.Lambda`’s transformed types (7566)
[transforms.v2] Don't call `round()` on float images for `Resize` (7669)
[transforms.v2] Let `SanitizeBoundingBoxes` preserve output type (7446)
[transforms.v2] Fixed int type support for sigma in `GaussianBlur` (7887)
[transforms.v2] Fixed issue with jitted `AutoAugment` transforms (7839)
[transforms] Fix `Resize` pass-through logic (7519)
[utils] Fix color in `draw_segmentation_masks` (7520)
Others
[tests] Various test improvements / fixes (7693, 7816, 7477, 7783, 7716, 7355, 7879, 7874, 7882, 7447, 7856, 7892, 7902, 7884, 7562, 7713, 7708, 7712, 7703, 7641, 7855, 7842, 7717, 7905, 7553, 7678, 7908, 7812, 7646, 7841, 7768, 7828, 7820, 7550, 7546, 7833, 7583, 7810, 7625, 7651)
[CI] Various CI improvements (7485, 7417, 7526, 7834, 7622, 7611, 7872, 7628, 7499, 7616, 7475, 7639, 7498, 7467, 7466, 7441, 7524, 7648, 7640, 7551, 7479, 7634, 7645, 7578, 7572, 7571, 7591, 7470, 7574, 7569, 7435, 7635, 7590, 7589, 7582, 7656, 7900, 7815, 7555, 7694, 7558, 7533, 7547, 7505, 7502, 7540, 7573)
[Code Quality] Various code quality improvements (7559, 7673, 7677, 7771, 7770, 7710, 7709, 7687, 7454, 7464, 7527, 7462, 7662, 7593, 7797, 7805, 7786, 7831, 7829, 7846, 7806, 7814, 7606, 7613, 7608, 7597, 7792, 7781, 7685, 7702, 7500, 7804, 7747, 7835, 7726, 7796)
Contributors
We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following people have contributed patches for this release:
Adam J. Stewart, Aditya Oke, Andrey Talman, Camilo De La Torre, Christoph Reich, Danylo Baibak, David Chiu, David Garcia, Dennis M. Pöpperl, Dhuige, Duc Mguyen, Edward Z. Yang, Eric Sauser, Fansure Grin, Huy Do, Illia Vysochyn, Johannes, Kai Wana, Kobrin Eli, kurtamohler, Li-Huai (Allan) Lin, Liron Ilouz, Masahiro Hiramori, Mateusz Guzek, Max Chuprov, Minh-Long Luu (刘明龙), Minliang Lin, mpearce25, Nicolas Granger, Nicolas Hug, Nikita Shulga, Omkar Salpekar, Paul Mulders, Philip Meier, ptrblck, puhuk, Radek Bartoň, Richard Barnes, Riza Velioglu, Sahil Goyal, Shu, Sim Sun, SvenDS9, Tommaso Bianconcini, Vadim Zubov, vfdev-5