Torchvision

Latest version: v0.21.0


89.404

TorchScript support for torchvision.ops

torchvision ops are now natively supported by TorchScript. This includes operators such as `nms`, `roi_align` and `roi_pool`. For the ops that support backpropagation, autograd works in both eager and TorchScript modes.

New operators

Deformable Convolution (1586) (1660) (1637)

As described in Deformable Convolutional Networks (https://arxiv.org/abs/1703.06211), torchvision now supports deformable convolutions. The module takes both the input and the offsets, and can be used as follows:

```python
import torch
from torchvision import ops

module = ops.DeformConv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
x = torch.rand(1, 1, 10, 10)

# the number of offset channels should be a multiple of
# 2 * module.weight.size(2) * module.weight.size(3), i.e. 2 * kernel_h * kernel_w
offset = torch.rand(1, 2 * 3 * 3, 10, 10)

# the output requires both the input and the offsets
out = module(x, offset)
```

If needed, the user can create their own wrapper module that imposes constraints on the offset. Here is an example, using a single convolution layer to compute the offset:

```python
import torch.nn as nn
from torchvision import ops


class BasicDeformConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1,
                 dilation=1, groups=1, offset_groups=1):
        super().__init__()
        # 2 offsets (dy, dx) per kernel position, for each offset group
        offset_channels = 2 * kernel_size * kernel_size
        self.conv2d_offset = nn.Conv2d(
            in_channels,
            offset_channels * offset_groups,
            kernel_size=3,
            stride=stride,
            padding=dilation,
            dilation=dilation,
        )
        self.conv2d = ops.DeformConv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            # "same"-style padding (odd kernel_size assumed) so the output
            # spatial size matches the offset map computed above
            padding=dilation * (kernel_size - 1) // 2,
            dilation=dilation,
            groups=groups,
            bias=False,
        )

    def forward(self, x):
        offset = self.conv2d_offset(x)
        return self.conv2d(x, offset)
```

Position-sensitive RoI Pool / Align (1410)

Adds the Position-Sensitive Region of Interest (RoI) Align and Pool operators, as used in Light-Head R-CNN (https://arxiv.org/abs/1711.07264). They are available as `ops.ps_roi_align` and `ops.ps_roi_pool`, with the module equivalents `ops.PSRoIAlign` and `ops.PSRoIPool`, and have the same interface as `RoIAlign` / `RoIPool`.

New Features

TorchScript support

* Bugfix in BalancedPositiveNegativeSampler introduced during torchscript support (1670)
* Make R-CNN models less verbose in script mode (1671)
* Minor torchscript fixes for Mask R-CNN (1639)
* remove BC-breaking changes (1560)
* Make maskrcnn scriptable (1407)
* Add Script Support for Video Resnet Models (1393)
* fix ASPPPooling (1575)
* Test that torchhub models are scriptable (1242)
* Make GoogLeNet & InceptionNet scriptable (1349)
* Make fcn_resnet Scriptable (1352)
* Make Densenet Scriptable (1342)
* make resnext scriptable (1343)
* make shufflenet and resnet scriptable (1270)

ONNX

* Enable KeypointRCNN test (1673)
* enable mask rcnn test (1613)
* Changes to Enable KeypointRCNN ONNX Export (1593)
* Disable Profiling in Failing Test (1585)
* Enable ONNX Test for FasterRcnn (1555)
* Support Exporting Mask Rcnn to ONNX (1461)
* Export Faster R-CNN to ONNX (1401)
* Support Exporting RPN to ONNX (1329)
* Support Exporting MultiScaleRoiAlign to ONNX (1324)
* Support Exporting GeneralizedRCNNTransform to ONNX (1325)

Quantization

* Update quantized shufflenet weights (1715)
* Add commands to run quantized model with pretrained weights (1547)
* Quantizable googlenet, inceptionv3 and shufflenetv2 models (1503)
* Quantizable resnet and mobilenet models (1471)
* Remove model download from test_quantized_models (1526)

Improvements

Bugfixes

* Bugfix on GroupedBatchSampler for corner case where there are not enough examples in a category to form a batch (1677)
* Fix rpn memory leak and dataType errors. (1657)
* Fix torchvision install due to zipped egg (1536)

Transforms

* Make shear operation area preserving (1529)
* PILLOW_VERSION deprecation updates (1501)
* Adds optional fill colour to rotate (1280)

Ops

* Add Deformable Convolution operation. (1586) (1660) (1637)
* Fix inconsistent NMS implementation between CPU and CUDA (1556)
* Speed up nms_cuda (1704)
* Implementation for Position-sensitive ROI Pool/Align (1410)
* Remove cpp extensions in favor of torch ops (1348)
* Make custom ops differentiable (1314)
* Fix Windows build in Torchvision Custom op Registration (1320)
* Revert "Register Torchvision Ops as Custom Ops (1267)" (1316)
* Register Torchvision Ops as Custom Ops (1267)
* Use Tensor.data_ptr instead of .data (1262)
* Fix header includes for cpu (1644)

Datasets

* fixed test for windows by closing the created temporary files (1662)
* VideoClips windows fixes (1661)
* Fix VOC on Windows (1641)
* update dead LSUN link (1626)
* DatasetFolder should follow links when searching for data (1580)
* add .tgz support to extract_archive (1650)
* expose audio_channels as a parameter to kinetics dataset (1559)
* Implemented integrity check (md5 hash) after dataset download (1456)
* Move VideoClips dummy dataset to top level for pickling (1649)
* Remove download for ImageNet (1457)
* add tar.xz archive handler (1361)
* Fix DeprecationWarning for collections.Iterable import in LSUN (1417)
* Support empty target_type for CelebA dataset (1351)
* VOC2007 support test set (1340)
* Fix EMNIST download URL (1297) (1318)
* Refactored clip_sampler (1562)

Documentation

* Fix documentation for NMS (1614)
* More examples of functional transforms (1402)
* Fixed doc of crop functionals (1388)
* Added Training Sample code for fasterrcnn_resnet50_fpn (1695)
* Fix rpn.py typo (1276)
* Update README with minimum required version of PyTorch (1272)
* fix alignment of README (1396)
* fixed typo in DatasetFolder and ImageFolder (1284)

Models

* Bugfix for MNASNet (1224)
* Fix anchor dtype in AnchorGenerator (1341)

Utils

* Adding File object option to utils.save_image (1301)
* Fix make_grid: support any number of channels in tensor (1300)
* Fix bug of changing input tensor in utils.save_image (1244)

Reference scripts

* add a README for training object detection models (1612)
* Adding args for names of train and val directories (1544)
* Fix broken bitwise operation in Similarity Reference loss (1604)
* Fix issue 1530 by starting ann_id at 1 in convert_to_coco_api (1531)
* Add commands for model training (1203)
* adding documentation for automatic mixed precision training (1533)
* Fix reference training script for Mask R-CNN for PyTorch 1.2 (during evaluation after epoch, mask datatype became bool, pycocotools expects uint8) (1413)
* Fix a small bug in resume (1628)
* Better explain lr and batch size in references/detection/train.py (1233)
* update default parameters in references/detection (1611)
* Removed code redundancy/refactored in video_classification (1549)
* Fix comment in default arguments in references/detection (1243)

Tests

* Correctness test implemented with old test architecture (1511)
* Simplify and organize test_ops. (1551)
* Replace asserts with assertEqual (1488)(1499)(1497)(1496)(1498)(1494)(1487)(1495)
* Add expected result tests (1377)
* Add TorchHub tests to torchvision (1319)
* Scriptability checks for Tensor Transforms (1690)
* Add tests for results in script vs eager mode (1430)
* Test for checking non mutating behaviour of tensor transforms (1656)
* Disable download tests for Python2 (1269)
* Fix randomresized params flaky (1282)

CI

* Disable C++ models from being compiled without explicit request (1535)
* Fix discrepancy in regenerate.py (1583)
* soumith -> pytorch for docker images (1577)
* [wip] try vs2019 toolchain (1509)
* Make CI use PyTorch nightly (1492)
* Try enabling Windows CUDA CI (1486)
* Fix CUDA builds on Windows (1485)
* Try fix Windows CircleCI (1433)
* Fix CUDA CI (1464)
* Change approach for rebase to master (1427)
* Temporary fix for CI (1411)
* Use PyTorch 1.3 for CI (1467)
* Use links from S3 to install CUDA (1472)
* Enable CUDA 9.2 builds for Windows (1381)
* Fix nightly builds (1374)
* Fix Windows CI after 1301 (1368)
* Retry `anaconda login` for Windows builds (1366)
* Fix nightly wheels builds for Windows (1358)
* Fix CI for py2.7 cu100 wheels (1354)
* Fix Windows CI (1347)
* Windows build scripts (1241)
* Make CircleCI checkout merge commit (1344)
* use native python code generation logic (1321)
* Add CircleCI (v2) (1298)

88.882

87.582

87.404

Object Detection
We provide two variants of Faster R-CNN with a MobileNetV3 backbone, pre-trained on COCO train2017. They can be obtained as follows:

```python
import torch
import torchvision

# Fast Low Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_320_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)

# Highly Accurate High Resolution Model
x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)]
m_detector = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
m_detector.eval()
predictions = m_detector(x)
```

They yield the following accuracies on COCO val2017 (full results available in 3265):

| Model | mAP | mAP50 | mAP75 |
| --- | --- | --- | --- |
| Faster R-CNN MobileNetV3-Large 320 FPN | 22.8 | 38.0 | 23.2 |
| Faster R-CNN MobileNetV3-Large FPN | 32.8 | 52.5 | 34.3 |

Semantic Segmentation
We also provide pre-trained models for semantic segmentation. The models have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.
```python
import torch
import torchvision

# Fast Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.lraspp_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)

# Highly Accurate Mobile Model
x = torch.rand(1, 3, 520, 520)
m_segmenter = torchvision.models.segmentation.deeplabv3_mobilenet_v3_large(pretrained=True)
m_segmenter.eval()
predictions = m_segmenter(x)
```

The pre-trained models give the following results on the subset of COCO val2017 that contains the same 20 categories as Pascal VOC (full results in 3276):

| Model | mean IoU | global pixelwise accuracy |
| --- | --- | --- |
| Lite R-ASPP with Dilated MobileNetV3 Large Backbone | 57.9 | 91.2 |
| DeepLabV3 with Dilated MobileNetV3 Large Backbone | 60.3 | 91.2 |

Addition of the AutoAugment method

[AutoAugment](https://arxiv.org/pdf/1805.09501.pdf) is a common data augmentation technique that can improve the accuracy of image classification models. Although the augmentation policies are tied to the dataset they were learned on, empirical studies show that the ImageNet policies provide significant improvements when applied to other datasets.

In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed-and-matched with existing transforms:

```python
from PIL import Image
from torchvision import transforms

image = Image.open("image.jpg")  # hypothetical path; any PIL image works

t = transforms.AutoAugment()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.AutoAugment(),
    transforms.ToTensor()])
```

Improved Image IO and on-the-fly image type conversions

All the read and decode methods of the `io.image` package have been updated to:

* Add support for Palette, Grayscale Alpha and RGB Alpha image types during PNG decoding.
* Allow on-the-fly conversion of an image from one type to another during read.

```python
from torchvision.io.image import read_image, ImageReadMode

# keeps original type, channels unchanged
x1 = read_image("image.png")

# converts to grayscale, channels = 1
x2 = read_image("image.png", mode=ImageReadMode.GRAY)

# converts to grayscale with alpha transparency, channels = 2
x3 = read_image("image.png", mode=ImageReadMode.GRAY_ALPHA)

# converts to RGB, channels = 3
x4 = read_image("image.png", mode=ImageReadMode.RGB)

# converts to RGB with alpha transparency, channels = 4
x5 = read_image("image.png", mode=ImageReadMode.RGB_ALPHA)
```

Python 3.9 and CUDA 11.1
This release adds official support for Python 3.9 and CUDA 11.1 (3341, 3418)

Backwards Incompatible Changes

* [Ops] Change default `eps` value of `FrozenBN` to better align with `nn.BatchNorm` (2933)
* [Ops] Remove deprecated _new_empty_tensor. (3156)
* [Transforms] `ColorJitter` gets its random params by calling `get_params()` (3001)
* [Transforms] Change rounding of transforms on integer tensors (2964)
* [Utils] Remove `normalize` from `save_image` (3324)

New Features

* [Datasets] Add WiderFace dataset (2883)
* [Models] Add MobileNetV3 architecture:
  * Classification Models: (3354, 3252, 3182, 3242, 3177)
  * Object Detection Models: (3265, 3253, 3223, 3243, 3244, 3248)
  * Segmentation Models: (3276)
  * Quantized Models: (3366, 3323)
* [Models] Improve speed/accuracy of FasterRCNN by introducing a score threshold on RPN (3205)
* [Mobile] Add Android gradle project with demo test app (2897)
* [Transforms] Implemented AutoAugment, along with required new transforms + Policies (3123)
* [Ops] Added Autocast support to all operators: 2938, 2926, 2922, 2928, 2905, 2906, 2907, 2898
* [Ops] Add modulation input for DeformConv2D (2791)
* [IO] Improved `io.image` with on-the-fly image type conversions: (3193, 3069, 3024, 2988, 2984)
* [IO] Add option to write audio to video file (2304)
* [Utils] Added a utility to draw bounding boxes (2785, 3296, 3075)

Improvements

Datasets

* Concatenate small tensors in video datasets to reduce the use of shared file descriptor (1795)
* Improve testing for datasets (3336, 3337, 3402, 3412, 3413, 3415, 3416, 3345, 3376, 3346, 3338)
* Check if dataset file is located on Google Drive before downloading it (3245)
* Improve Coco implementation (3417)
* Make download_url follow redirects (3236)
* `make_dataset` as `staticmethod` of `DatasetFolder` (3215)
* Add a warning if any clip can't be obtained from a video in `VideoClips`. (2513)

Models

* Improve error message in `AnchorGenerator` (2960)
* Disable pretrained backbone downloading if pretrained is True in segmentation models (3325)
* Support for image with no annotations in RetinaNet (3032)
* Change RoIHeads reshape to support empty batches. (3031)
* Fixed typing exception throwing issues with JIT (3029)
* Replace deprecated `functional.sigmoid` with `torch.sigmoid` in RetinaNet (3307)
* Assert that inputs are floating point in Faster R-CNN normalize method (3266)
* Speedup RetinaNet's postprocessing (2828)

Ops

* Added eps in the `__repr__` of FrozenBN (2852)
* Added `__repr__` to `MultiScaleRoIAlign` (2840)
* Exposing LevelMapper params in `MultiScaleRoIAlign` (3151)
* Enable autocast for all operators and let them use the dispatcher (2926, 2922, 2928, 2898)

Transforms

* `adjust_hue` now accepts tensors with one channel (3222)
* Add `fill` color support for tensor affine transforms (2904)
* Remove torchscript workaround for `center_crop` (3118)
* Improved error message for `RandomCrop` (2816)

IO

* Enable importing `read_file` and the other methods directly from torchvision.io (2918)
* accept python bytes in `_read_video_from_memory()` (3347)
* Enable rtmp timeout in decoder (3076)
* Specify tls cert file to decoder through config (3289, 3374)
* Add UUID in LOG() in decoder (3080)

References

* Add weight averaging and storing methods in references utils (3352)
* Adding Preset Transforms in reference scripts (3317)
* Load variables when `--resume /path/to/checkpoint --test-only` (3285)
* Updated video classification ref example with new transforms (2935)

Misc

* Various documentation improvements (3039, 3271, 2820, 2808, 3131, 3062, 3061, 3000, 3299, 3400, 2899, 2901, 2908, 2851, 2909, 3005, 2821, 2957, 3360, 3019, 3124, 3217, 2879, 3234, 3180, 3425, 2979, 2935, 3298, 3268, 3203, 3290, 3295, 3200, 2663, 3153, 3147, 3232)
* The documentation infrastructure was improved, in particular the docs are now built on every PR and uploaded to CircleCI (3259, 3378, 3408, 3373, 3290)
* Avoid some deprecation warnings from PyTorch (3348)
* Ensure operators are added in C++ (2798, 3091, 3391)
* Fixed compilation warnings on C++ codebase (3390)
* CI Improvements (3401, 3329, 2990, 2978, 3189, 3230, 3254, 2844, 2872, 2825, 3144, 3137, 2827, 2848, 2914, 3419, 2895, 2837)
* Installation improvements (3302, 2969, 3113, 3202)
* CMake improvements (2801, 2805, 3212, 3381)

Mobile

* Add Torch Selective macros in all C++ Ops for better support on mobile (3218)

Code Quality, testing

* [BC-breaking] Modernized C++ codebase & made it mobile-friendly (25% faster to compile): 2885, 2891, 2892, 2893, 2905, 2906, 2907, 2938, 2944, 2945, 3011, 3020, 3097, 3105, 3134, 3135, 3143, 3146, 3154, 3156, 3163, 3218, 3308, 3311, 3312, 3326, 3350, 3390
* Cleaned up Python codebase & made it more Pythonic: 3263, 3239, 3059, 3055, 3045, 3382, 3159, 3171
* Improve type annotations (3288, 3045, 2862, 2858, 2857, 2863, 2865, 2856, 2860, 2864, 2875, 2859, 2854, 2861, 3174, 3059)
* Code refactoring and static analysis improvements (3379, 3335, 3229, 3204, 3095)
* Miscellaneous test improvements (2966, 2965, 3018, 3035, 2961, 2806, 2812, 2815, 2834, 2874, 3099, 3092, 3160, 3103, 2971, 3023, 2803, 3136, 3319, 3310, 3287, 3033, 2983, 3386, 3369, 3116, 2985, 3320)

Bug Fixes

* [DATASETS] Fixes EMNIST split and label issues (2673)
* [DATASETS] Fix overflow in STL10 fold reading (3353)
* [MODELS] Fix incorrectly frozen BN on ResNet FPN backbone (3396)
* [MODELS] Fix scriptability support in Inception V3 (2976)
* [MODELS] Changed default value of eps in FrozenBatchNorm to match BatchNorm (2940, 2933)
* [MODELS] Fixed warning in `models.detection.transforms.resize_image_and_masks`. (3237)
* [MODELS] Fix trainable_layers on RetinaNet (3234)
* [MODELS] Fix ShuffleNetV2 ONNX model export issue. (3158)
* [UTILS] Fixes no grad and range bugs in utils. (3269)
* [UTILS] make_grid uses a more correct normalization (2967)
* [OPS] fix GET_THREADS() for ROCm with DeformConv (2997)
* [OPS] Fix NMS and IoU overflows for fp16 (3383, 3382)
* [OPS] Fix ops registration on windows (3380)
* [OPS] Fix initialisation bug on FeaturePyramidNetwork (2954)
* [IO] Replace hardcoded error code with ENODATA (3277)
* [REFERENCES] Fix repeated UserWarning and add more flexibility to reference code for segmentation tasks (2886)
* [TRANSFORMS] Fix default fill value in RandomRotation (3303)
* [TRANSFORMS] Correct aspect ratio sampling in transforms.RandomErasing (3344)
* [TRANSFORMS] Fix `CenterCrop` when the crop size is greater than the image size (3333)
* [TRANSFORMS] Functional to_tensor returns float tensor of default dtype (3398)
* [TRANSFORMS] Add explicit check for number of channels (3013)
* [TRANSFORMS] `pil_to_tensor` with accimage backend now returns uint8 (3109)
* [TRANSFORMS] Fix potential overflow in `convert_image_dtype` (3107)
* [TRANSFORMS] Check num of channels on `adjust*_` transformations (3069)

Deprecations

* [TRANSFORMS] Introduced InterpolationModes and deprecated arguments: `resample` and `fillcolor` (2952, 3055)

84.112

83.712
