This release improves support for mobile, with new mobile-friendly detection models based on SSD and SSDlite, CPU kernels for quantized NMS and quantized RoIAlign, pre-compiled binaries for iOS available in CocoaPods, and an iOS demo app. It also improves image IO by providing JPEG decoding on the GPU, among other improvements.
## Highlights
### [BETA] New models for detection
[SSD](https://arxiv.org/abs/1512.02325) and [SSDlite](https://arxiv.org/abs/1801.04381) are two popular object detection architectures that are fast and provide good results for low-resolution images. In this release, we provide implementations for the original SSD model with a VGG16 backbone and for its mobile-friendly variant SSDlite with a MobileNetV3-Large backbone. The models were pre-trained on COCO train2017 and can be used as follows:
```python
import torch
import torchvision

# Original SSD variant
x = [torch.rand(3, 300, 300), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssd300_vgg16(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

# Mobile-friendly SSDlite variant
x = [torch.rand(3, 320, 320), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.ssdlite320_mobilenet_v3_large(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
```
The following accuracies can be obtained on COCO val2017 (full results available in #3403 and #3757):
Model | mAP | mAP50 | mAP75
-- | -- | -- | --
SSD300 VGG16 | 25.1 | 41.5 | 26.2
SSDlite320 MobileNetV3-Large | 21.3 | 34.3 | 22.1
### [STABLE] Quantized kernels for object detection
The forward pass of the `nms` and `roi_align` operators now supports tensors with a quantized dtype, which can help lower the memory footprint of object detection models, particularly in mobile environments.
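As a rough sketch of what this enables (the scales and zero points below are arbitrary, and the exact constraints on the inputs may vary), calling these operators on quantized tensors looks the same as the regular float path:

```python
import torch
from torchvision.ops import nms, roi_align

# Build some valid (x1, y1, x2, y2) boxes and quantize them.
xy = torch.rand(10, 2) * 100
wh = torch.rand(10, 2) * 50
boxes = torch.cat([xy, xy + wh], dim=1)
scores = torch.rand(10)
q_boxes = torch.quantize_per_tensor(boxes, scale=1.0, zero_point=0, dtype=torch.quint8)
q_scores = torch.quantize_per_tensor(scores, scale=0.01, zero_point=0, dtype=torch.quint8)

# nms dispatches to the quantized CPU kernel; it returns regular int64 indices.
keep = nms(q_boxes, q_scores, iou_threshold=0.5)

# roi_align similarly accepts a quantized feature map and quantized rois
# of the form (batch_index, x1, y1, x2, y2).
x = torch.rand(1, 3, 32, 32)
rois = torch.tensor([[0.0, 0.0, 0.0, 16.0, 16.0]])
q_x = torch.quantize_per_tensor(x, scale=0.01, zero_point=0, dtype=torch.quint8)
q_rois = torch.quantize_per_tensor(rois, scale=0.1, zero_point=0, dtype=torch.quint8)
out = roi_align(q_x, q_rois, output_size=(4, 4))
```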
### [BETA] JPEG decoding on the GPU
Decoding JPEGs is now possible on GPUs with the use of [nvjpeg](https://developer.nvidia.com/nvjpeg), which should be readily available in your CUDA setup. The decoding time of a single image should be about 2 to 3 times faster than with libjpeg on the CPU. While the resulting tensor will be stored on the GPU device, the input raw tensor still needs to reside on the host (CPU), because the first stages of the decoding process take place on the host:
```python
from torchvision.io.image import read_file, decode_jpeg

data = read_file('path_to_image.jpg')   # raw data is on CPU
img = decode_jpeg(data, device='cuda')  # decoded image is on GPU
```
### [BETA] iOS support
TorchVision 0.10 now provides pre-compiled iOS binaries for its C++ operators, which means you can run Faster R-CNN and Mask R-CNN on iOS. An example app showing how to build a program leveraging those ops can be found [here](https://github.com/pytorch/vision/tree/master/ios/VisionTestApp).
### [STABLE] Speed optimizations for Tensor transforms
The `resize` and `flip` transforms have been optimized, and their runtime on CPU is up to 5x faster. The corresponding PRs were sent to PyTorch in https://github.com/pytorch/pytorch/pull/51653, https://github.com/pytorch/pytorch/pull/54500 and https://github.com/pytorch/pytorch/pull/56713.
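For reference, these optimizations apply to the tensor backend of the transforms; a minimal sketch of exercising them on a plain CPU tensor:

```python
import torch
import torchvision.transforms.functional as F

# A toy uint8 image tensor in (C, H, W) layout.
img = torch.randint(0, 256, (3, 500, 400), dtype=torch.uint8)

resized = F.resize(img, [256, 256])  # optimized tensor resize
flipped = F.hflip(resized)           # optimized tensor flip
```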
### [STABLE] Documentation improvements
Significant improvements were made to the documentation. In particular, a new gallery of examples is available: see [here](https://pytorch.org/vision/master/auto_examples/index.html) for the latest version (the stable version is not yet released at the time of writing). These examples visually illustrate how each transform acts on an image, and also properly document and illustrate the output of the segmentation models.
The example gallery will be extended in the future to provide more comprehensive examples and serve as a reference for common torchvision tasks.
## Backwards Incompatible Changes
* [transforms] Ensure input type of `normalize` is float. (3621)
* [models] Use PyTorch `smooth_l1_loss` and remove private custom implementation (3539)
## New Features
* Added iOS binaries and test app (3582) (3629) (3806)
* [datasets] Added KITTI dataset (3640)
* [utils] Added utility to draw segmentation masks (3330, 3824)
* [models] Added the SSD & SSDlite object detection models (3403, 3757, 3766, 3855, 3896, 3818, 3799)
* [transforms] Added `antialias` option to `transforms.functional.resize` (3761, 3810, 3842)
* [transforms] Add new `max_size` parameter to `Resize` (3494)
* [io] Support for decoding jpegs on GPU with `nvjpeg` (3792)
* [ci, rocm] Add ROCm to builds (3840) (3604) (3575)
* [ops, models.quantization] Add quantized version of NMS (3601)
* [ops, models.quantization] Add quantized version of RoIAlign (3624, 3904)
## Improvements
* [build] Various build improvements (3618) (3622) (3399) (3794) (3561)
* [ci] Various CI improvements (3647) (3609) (3635) (3599) (3778) (3636) (3809) (3625) (3764) (3679) (3869) (3871) (3444) (3445) (3480) (3768) (3919) (3641) (3900)
* [datasets] Improve error handling in `make_dataset` (3496)
* [datasets] Remove caching from MNIST and variants (3420)
* [datasets] Make `DatasetFolder.find_classes` public (3628)
* [datasets] Separate extraction and decompression logic in `datasets.utils.extract_archive` (3443)
* [datasets, tests] Improve dataset test coverage and infrastructure (3450) (3457) (3454) (3447) (3489) (3661) (3458) (3705) (3411) (3461) (3465) (3543) (3550) (3665) (3464) (3595) (3466) (3468) (3467) (3486) (3736) (3730) (3731) (3477) (3589) (3503) (3423) (3492) (3578) (3605) (3448) (3864) (3544)
* [datasets, tests] Fix lazy importing for dataset tests (3481)
* [datasets, tests] Fix `test_extract(zip|tar|tar_xz|gzip)` on windows (3542)
* [datasets, tests] Fix `kwargs` forwarding in fake data utility functions (3459)
* [datasets, tests] Properly fix dataset test that passes by accident (3434)
* [documentation] Improve the documentation infrastructure (3868) (3724) (3834) (3689) (3700) (3513) (3671) (3490) (3660) (3594)
* [documentation] Various documentation improvements (3793) (3715) (3727) (3838) (3701) (3923) (3643) (3537) (3691) (3453) (3437) (3732) (3683) (3853) (3684) (3576) (3739) (3530) (3586) (3744) (3645) (3694) (3584) (3615) (3693) (3706) (3646) (3780) (3704) (3774) (3634) (3591) (3807) (3663)
* [documentation, ci] Improve the CI infrastructure for documentation (3734) (3837) (3796) (3711)
* [io] Remove deprecated function calls (3859) (3858)
* [documentation, io] Improve IO docs and expose `ImageReadMode` in `torchvision.io` (3812)
* [onnx, models] Replace `reshape` with `flatten` in MobileNetV2 (3462)
* [ops, tests] Added test for `aligned=True` (3540)
* [ops, tests] Add onnx test for `batched_nms` (3483)
* [tests] Various test improvements (3548) (3422) (3435) (3860) (3479) (3721) (3872) (3908) (2916) (3917) (3920) (3579)
* [transforms] add `__repr__` for `transforms.RandomErasing` (3491)
* [transforms, documentation] Add documentation for AutoAugment (3529)
* [transforms, documentation] Add illustrations of transforms with sphinx-gallery (3652)
* [datasets] Remove pandas dependency for CelebA dataset (3656, 3698)
* [documentation] Add docs for missing datasets (3536)
* [referencescripts] Make reference scripts compatible with `submitit` (3785)
* [referencescripts] Updated `all_gather()` to make use of `all_gather_object()` from PyTorch (3857)
* [datasets] Added dataset download support in fbcode (3823) (3826)
## Code quality
* Remove inconsistent FB copyright headers (3741)
* Keep consistency in classes `ConvBNActivation` (3750)
* Removed unused imports (3738, 3740, 3639)
* Fixed `floor_divide` deprecation warnings seen in pytest output (3672)
* Unify onnx and JIT `resize` implementations (3654)
* Cleaned-up imports in test files related to datasets (3720)
* [documentation] Remove old css file (3839)
* [ci] Fix inconsistent version pinning across yaml files (3790)
* [datasets] Remove redundant `path.join` in `Places365` (3545)
* [datasets] Remove imprecise error handling in `PhotoTour` dataset (3488)
* [datasets, tests] Remove obsolete `test_datasets_transforms.py` (3867)
* [models] Make protected params of MobileNetV3 public (3828)
* [models] Make target argument in `transform.py` truly optional (3866)
* [models] Add references on the MobileNetV3 implementation (3850)
* [models] Refactored `set_cell_anchors()` in `AnchorGenerator` (3755)
* [ops] Minor cleanup of `roi_align_forward_kernel_impl` (3619)
* [ops] Replace deprecated `AutoNonVariableTypeMode` with `AutoDispatchBelowADInplaceOrView`. (3786, 3897)
* [tests] Port tests to use pytest (3852, 3845, 3697, 3907, 3749)
* [ops, tests] Simplify `get_script_fn` (3541)
* [tests] Use `torch.testing.assert_close` in our test suite (3886) (3885) (3883) (3882) (3881) (3887) (3880) (3878) (3877) (3875) (3888) (3874) (3884) (3876) (3879) (3873)
* [tests] Clean up test accept behaviour (3759)
* [tests] Remove unused `masks` variable in `test_image.py` (3910)
* [transforms] Use ternary if in `resize` (3533)
* [transforms] Replace deprecated call to `ByteTensor` with `from_numpy` (3813)
* [transforms] Remove unnecessary casting in `adjust_gamma` (3472)
## Bugfixes
* [ci] set empty cxx flags as default (3474)
* [android][test_app] Cleanup duplicate dependency (3428)
* Remove leftover exception (3717)
* Corrected spelling in a `TypeError` (3659)
* Add missing device info. (3651)
* Moving tensors to the right device (3870)
* Proper error message (3725)
* [ci, io] Pin JPEG version to resolve the size_t issue on windows (3787)
* [datasets] Make LSUN OS agnostic (3455)
* [datasets] Update `squeezenet` urls (3581)
* [datasets] Add `.item()` to the `target` variable in `fakedataset.py` (3587)
* [datasets] Fix VOC datasets for 2007 (3572)
* [datasets] Add custom user agent for download_url (3498)
* [datasets] Fix LSUN dataset tests flakiness (3703)
* [datasets] Fix (Fashion|K)MNIST download and MNIST download test (3557)
* [datasets] Fix check for exceeded quota on Google Drive (3710)
* [datasets] Fix redirect behavior of datasets.utils.download_url (3564)
* [datasets] Update EMNIST url (3567)
* [datasets] Redirect datasets to correct urls (3574)
* [datasets] Prevent potential bug in `DatasetFolder.make_dataset` (3733)
* [datasets, tests] Fix redirection in download tests (3568)
* [documentation] Correct the size of returned tensor in comments of `ps_roi_pool.py` and `ps_roi_align.py` (3849)
* [io] Fix ternary operator deciding whether to store an image as grayscale or RGB (3553)
* [io] Fixed audio-video synchronisation problem in `read_video()` when using `pts` as unit (3791)
* [models] Fix bug on detection backbones when `trainable_layers == 0` (3906)
* [models] Removed caching of anchors from `AnchorGenerator` (3745)
* [models] Update weights of classification models with new serialization format to allow proper unpickling (3620, 3851)
* [onnx, ops] Fix `roi_align` ONNX export (3355)
* [referencescripts] Only sync CUDA if CUDA is available (3674)
* [referencescripts] Add checkpoints used for preemption. (3789)
* [transforms] Fix `to_tensor` for `accimage` backend (3439)
* [transforms] Make `crop` work the same for PIL and Tensor (3770)
* [transforms, models, tests] Fix some tests in fbcode (3686)
* [transforms, tests] Fix `test_random_autocontrast` flakiness (3699)
* [utils] Fix the spacing of labels on `draw_bounding_boxes` (3895)
* [utils, tests] Fix `test_draw_boxes` (3631)
## Deprecation
* [transforms] Deprecate `_transforms_video` and `_functional_video` in favor of `transforms` (3441)
## Performance
* [ops] Improve performance of `batched_nms` when number of boxes is large (3426)
* [transforms] Speed up `equalize` transform by using `bincount` instead of `histc` (3493)
## Contributors
We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and by providing feedback and suggestions. The following people have contributed patches for this release:
Aditya Oke, Akshay Kumar, Alessandro Melis, Avijit Dasgupta, Bruno Korbar, Caroline Chen, chengjuzhou, Edgar Andrés Margffoy Tuay, Eli Uriegas, Francisco Massa, Guillem Orellana Trullols, harishsdev, Ivan Kobzarev, Jaesun Park, James Thewlis, Jeff Daily, Jeff Yang, Jithendra Paruchuri, Jon Janzen, KAI ZHAO, Ksenija Stanojevic, Lewis Patten, Matti Picus, moto, Mustafa Bal, Nicolas Hug, Nikhil Kumar, Nikita Shulga, Philip Meier, Prabhat Roy, Sanket Thakur, scott-vsi, Sofiane Abbar, t-rutten, urmi22, Vasilis Vryniotis, vfdev, Yuchen Huang, Zhengyang Feng, Zhiqiang Wang
Thank you!