Torchvision

Latest version: v0.21.0


0.8.0

This release brings new additions to torchvision that improve support for model deployment. Most notably, transforms in torchvision are now TorchScript-compatible and can thus be serialized together with your model for simpler deployment. Additionally, we provide native image IO with TorchScript support, and a new video reading API (released as Beta) which is more flexible than `torchvision.io.read_video`.

Highlights

Transforms now support Tensor, batch computation, GPU and TorchScript

torchvision transforms now inherit from `nn.Module` and can be torchscripted and applied to torch Tensor inputs as well as to PIL images. They also support Tensors with a batch dimension and work seamlessly on CPU/GPU devices:
```python
import torch
import torchvision.transforms as T

# to fix the random seed, use torch.manual_seed
# instead of random.seed
torch.manual_seed(12)

transforms = torch.nn.Sequential(
    T.RandomCrop(224),
    T.RandomHorizontalFlip(p=0.3),
    T.ConvertImageDtype(torch.float),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
)
scripted_transforms = torch.jit.script(transforms)
# Note: we can similarly use T.Compose to define transforms
# transforms = T.Compose([...]) and
# scripted_transforms = torch.jit.script(torch.nn.Sequential(*transforms.transforms))

tensor_image = torch.randint(0, 256, size=(3, 256, 256), dtype=torch.uint8)
# works directly on Tensors
out_image1 = transforms(tensor_image)
# on the GPU (only if one is available)
if torch.cuda.is_available():
    out_image1_cuda = transforms(tensor_image.cuda())
# with batches
batched_image = torch.randint(0, 256, size=(4, 3, 256, 256), dtype=torch.uint8)
out_image_batched = transforms(batched_image)
# and has torchscript support
out_image2 = scripted_transforms(tensor_image)
```

These improvements enable the following new features:

* support for GPU acceleration
* batched transformations e.g. as needed for videos
* transform multi-band torch tensor images (with more than 3-4 channels)
* torchscript transforms together with your model for deployment

**Note: Exceptions to TorchScript support include `Compose`, `RandomChoice`, `RandomOrder`, `Lambda` and transforms applied to PIL images, such as `ToPILImage`.**

Native image IO for JPEG and PNG formats

torchvision 0.8.0 introduces native image reading and writing operations for JPEG and PNG formats. These operators support TorchScript and return `CxHxW` tensors in `uint8` format, and can thus now be part of your model for deployment in C++ environments.

```python
import torch
from torchvision.io import read_image

# tensor_image is a CxHxW uint8 Tensor
tensor_image = read_image('path_to_image.jpeg')

# or equivalently
from torchvision.io.image import read_file, decode_image
# raw_data is a 1d uint8 Tensor with the raw bytes
raw_data = read_file('path_to_image.jpeg')
tensor_image = decode_image(raw_data)

# all operators are torchscriptable and can be
# serialized together with your model torchscript code
scripted_read_image = torch.jit.script(read_image)
```

New detection model

This release adds a pretrained model for RetinaNet with a ResNet50 backbone from [Focal Loss for Dense Object Detection](https://arxiv.org/abs/1708.02002), with the following accuracies on COCO val2017:

```
IoU metric: bbox
Average Precision (AP) [ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.364
Average Precision (AP) [ IoU=0.50 | area= all | maxDets=100 ] = 0.558
Average Precision (AP) [ IoU=0.75 | area= all | maxDets=100 ] = 0.383
Average Precision (AP) [ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
Average Precision (AP) [ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.400
Average Precision (AP) [ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.490
Average Recall (AR) [ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.315
Average Recall (AR) [ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.506
Average Recall (AR) [ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.558
Average Recall (AR) [ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.386
Average Recall (AR) [ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.595
Average Recall (AR) [ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.699
```

[BETA] New Video Reader API

This release introduces a new video reading abstraction, which gives more fine-grained control over how to iterate over videos. It supports both video frames and audio, and implements an iterator interface so that it can be combined with the rest of the Python ecosystem, such as `itertools`.

```python
from torchvision.io import VideoReader

# stream indicates if reading from audio or video
reader = VideoReader('path_to_video.mp4', stream='video')
# can change the stream after construction
# via reader.set_current_stream

# to read all frames in a video starting at 2 seconds
for frame in reader.seek(2):
    # frame is a dict with "data" and "pts" metadata
    print(frame["data"], frame["pts"])

# because reader is an iterator you can combine it with
# itertools
from itertools import takewhile, islice
# read 10 frames starting from 2 seconds
for frame in islice(reader.seek(2), 10):
    pass

# or to return all frames between 2 and 5 seconds
for frame in takewhile(lambda x: x["pts"] < 5, reader.seek(2)):
    pass
```

**Note: In order to use the Video Reader API, you need to compile torchvision from source and make sure that you have ffmpeg installed on your system.**
**Note: the VideoReader API is currently released as beta and its API can change following user feedback.**

Backwards Incompatible Changes

* [Transforms] The random seed should now be set with `torch.manual_seed` instead of `random.seed` (2292)
* [Transforms] The `value` argument of `RandomErasing.get_params` was previously `value=0` and is now `value=None`, which is interpreted as Gaussian random noise (2386)
* [Transforms] `RandomPerspective` and `F.perspective` changed the default interpolation to `BILINEAR` instead of `BICUBIC` (2558, 2561)
* [Transforms] Fixed an inconsistency in the `affine` transformation when the center is defined as half the image size + 0.5 (2468)

New Features

* [Ops] Added focal loss (2784)
* [Ops] Added bounding boxes conversion function (2710, 2737)
* [Ops] Added Generalized IOU (2642)
* [Models] Added RetinaNet object detection model (2784)
* [Datasets] Added Places365 dataset (2610, 2625)
* [Transforms] Added GaussianBlur transform (2658)
* [Transforms] Added TorchScript, batch, GPU and tensor support for transforms (2769, 2767, 2749, 2755, 2485, 2721, 2645, 2694, 2584, 2661, 2566, 2345, 2342, 2356, 2368, 2373, 2496, 2553, 2495, 2561, 2518, 2478, 2459, 2444, 2396, 2401, 2394, 2586, 2371, 2477, 2456, 2628, 2569, 2639, 2620, 2595, 2456, 2403, 2729)
* [Transforms] Added example notebook for tensor transforms (2730)
* [IO] Added JPEG/PNG encoding/decoding ops
  * JPEG (2388, 2471, 2696, 2725)
  * PNG (2382, 2726, 2398, 2457, 2735)
  * `decode_image` (2680, 2695, 2718, 2764, 2766)
* [IO] Added file reading / writing ops (2728, 2765, 2768)
* [IO] [BETA] Added new VideoReader API (2683, 2781, 2778, 2802, 2596, 2612, 2734, 2770)
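To give a feel for what two of the new ops compute, here is a pure-Python sketch of the `xywh` to `xyxy` box conversion and of Generalized IoU for a single pair of boxes in `(x1, y1, x2, y2)` format. The helper names are hypothetical; the actual torchvision ops (`torchvision.ops.box_convert` and `torchvision.ops.generalized_box_iou`) work on batched tensors, and `generalized_box_iou` returns the full pairwise matrix.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, w, h) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def area(box):
    """Area of an (x1, y1, x2, y2) box; 0 if the box is empty or inverted."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def generalized_iou(a, b):
    """GIoU = IoU - (|C| - |A union B|) / |C|, C = smallest box enclosing A and B."""
    # Intersection of the two boxes.
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    iou = inter / union
    # Smallest axis-aligned box enclosing both.
    enclosing = area((min(a[0], b[0]), min(a[1], b[1]),
                      max(a[2], b[2]), max(a[3], b[3])))
    return iou - (enclosing - union) / enclosing


print(xywh_to_xyxy((10, 20, 30, 40)))                  # (10, 20, 40, 60)
print(generalized_iou((0, 0, 2, 2), (0, 0, 2, 2)))     # 1.0
print(generalized_iou((0, 0, 1, 1), (2, 0, 3, 1)))     # -0.3333333333333333
```

Unlike plain IoU, which is 0 for every pair of disjoint boxes, GIoU keeps decreasing as disjoint boxes move apart, which is what makes it usable as a box-regression loss.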

Improvements

Datasets

* Added error message if Google Drive download quota is exceeded (2321)
* Optimized LSUN initialization time by only pulling keys from db (2544)
* Use more precise return type for gzip.open() (2792)
* Added UCF101 dataset tests (2548)
* Added download tests on a schedule (2665, 2675, 2699, 2706, 2747, 2731)
* Added typehints for datasets (2487, 2521, 2522, 2523, 2524, 2526, 2528, 2529, 2525, 2527, 2530, 2533, 2534, 2535, 2536, 2532, 2538, 2537, 2539, 2531, 2540, 2667)

Models

* Removed hard coded value in DeepLabV3 (2793)
* Changed the anchor generator default argument to an equivalent one (2722)
* Moved model construction in `resnet_fpn_backbone` to after the docstring (2482)
* Partially enabled type hints for models (2668)

Ops

* Moved RoIs shape check to C++ (2794)
* Use autocast built-in cast-helper functions (2646)
* Added type annotations for `torchvision.ops` (2331, 2462)

References

* [References] Removed redundant transfer of targets to device in detection evaluation (2503)
* [References] Removed obsolete import in segmentation. (2399)

Misc

* [Transforms] Added support for negative padding in `pad` (2744)
* [IO] Added type hints for `torchvision.io` (2543)
* [ONNX] Export `ROIAlign` with `aligned=True` (2613)

Internal

* [Binaries] Added CUDA 11 binary builds (2671)
* [Binaries] Added DEBUG=1 option to build torchvision (2603)
* [Binaries] Unpin ninja version (2358)
* Warn if torchvision imported from repo root (2759)
* Added compatibility checks for C++ extensions (2467)
* Added probot (2448)
* Added ipynb to git attributes file (2772)
* CI improvements (2328, 2346, 2374, 2437, 2465, 2579, 2577, 2633, 2640, 2727, 2754, 2674, 2678)
* CMakeList improvements (2739, 2684, 2626, 2585, 2587)
* Documentation improvements (2659, 2615, 2614, 2542, 2685, 2507, 2760, 2550, 2656, 2723, 2601, 2654, 2757, 2592, 2606)

Bug Fixes

* [Ops] Fixed crash in deformable convolutions (2604)
* [Ops] Added empty batch support for `DeformConv2d` (2782)
* [Transforms] Enforced contiguous output in `to_tensor` (2483)
* [Transforms] Fixed fill parameter for PIL pad (2515)
* [Models] Fixed deprecation warning in `nonzero` for R-CNN models (2705)
* [IO] Explicitly cast to `size_t` in video decoder (2389)
* [ONNX] Fixed dynamic resize in Mask R-CNN (2488)
* [C++ API] Fixed function signatures for `torch::nn::Functional` (2463)

Deprecations

* [Transforms] Deprecated the dedicated `functional_tensor` implementations `F_t.center_crop`, `F_t.five_crop` and `F_t.ten_crop`, as they can be implemented in terms of `crop` (2568)
* [Transforms] Deprecated explicit usage of `F_pil` and `F_t` functions, users should instead use the general functional API (2664)

0.7.0

Highlights

Mixed precision support for all models
torchvision models now support mixed-precision training via the new `torch.cuda.amp` package. Using mixed precision is easy: just wrap the model and the loss inside a `torch.cuda.amp.autocast` context manager. Here is an example with Faster R-CNN:

```python
import torch
import torchvision

device = torch.device('cuda')

model = torchvision.models.detection.fasterrcnn_resnet50_fpn()
model.to(device)

input = [torch.rand(3, 300, 400, device=device)]
boxes = torch.rand((5, 4), dtype=torch.float32, device=device)
boxes[:, 2:] += boxes[:, :2]
target = [{"boxes": boxes,
           "labels": torch.zeros(5, dtype=torch.int64, device=device),
           "image_id": 4,
           "area": torch.zeros(5, dtype=torch.float32, device=device),
           "iscrowd": torch.zeros((5,), dtype=torch.int64, device=device)}]

# use automatic mixed precision
with torch.cuda.amp.autocast():
    loss_dict = model(input, target)
    losses = sum(loss for loss in loss_dict.values())
# perform backward outside of autocast context manager
losses.backward()
```
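In practice, float16 training usually pairs `autocast` with `torch.cuda.amp.GradScaler`, which scales the loss so that small gradients do not underflow. A minimal sketch of that loop, assuming PyTorch >= 1.6; the linear model, optimizer and random data are hypothetical placeholders, and AMP is simply disabled when no GPU is present so the snippet also runs on CPU.

```python
import torch

use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

# Hypothetical tiny model and data standing in for a real detection model.
model = torch.nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 4, device=device)

for _ in range(3):
    optimizer.zero_grad()
    # Forward pass and loss under autocast (a no-op when use_amp is False).
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    # Backward on the scaled loss, outside autocast; step() unscales the
    # gradients before the optimizer update and update() adjusts the scale.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()

print(loss.item())
```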

New pre-trained segmentation models

This release adds pre-trained weights for the ResNet50 variants of Fully-Convolutional Networks (FCN) and DeepLabV3.
They are available under `torchvision.models.segmentation`, and can be obtained as follows:
```python
import torchvision

torchvision.models.segmentation.fcn_resnet50(pretrained=True)
torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True)
```

They obtain the following accuracies:
Network | mean IoU | global pixelwise acc
-- | -- | --

0.6.1

Highlights

* Bump pinned PyTorch version to [`v1.5.1`](https://github.com/pytorch/pytorch/releases/tag/v1.5.1)

0.6.0

This release is the first one that officially drops support for Python 2.
It contains a number of improvements and bugfixes.

Highlights

Faster/Mask/Keypoint R-CNN support negative samples

It is now possible to feed training images to Faster / Mask / Keypoint R-CNN that do not contain any positive annotations.
This enables increasing the number of negative samples during training. For those images, the annotation tensors should have a size of 0 in the number-of-objects dimension, as follows:
```python
import torch

image_height, image_width = 300, 400  # example image size

target = {"boxes": torch.zeros((0, 4), dtype=torch.float32),
          "labels": torch.zeros(0, dtype=torch.int64),
          "image_id": 4,
          "area": torch.zeros(0, dtype=torch.float32),
          "masks": torch.zeros((0, image_height, image_width), dtype=torch.uint8),
          # keypoints are [num_objects, num_keypoints, 3]
          "keypoints": torch.zeros((0, 17, 3), dtype=torch.float32),
          "iscrowd": torch.zeros((0,), dtype=torch.int64)}
```

Aligned flag for RoIAlign

`RoIAlign` now supports the `aligned` flag. When `aligned=True`, the box coordinates are shifted by -0.5 pixel, which aligns them more precisely with the two neighboring pixel indices.
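The effect of the flag on where RoIAlign samples can be sketched in pure Python. This simplified version assumes a single 1x1 output bin with one sampling point (the box centre) and omits the real kernel's out-of-range handling and the legacy minimum box size; the helper names are hypothetical. In torchvision the flag is passed as `torchvision.ops.roi_align(..., aligned=True)`.

```python
def bilinear(img, y, x):
    """Bilinearly sample a 2-D list of floats at continuous pixel coordinates (y, x)."""
    h, w = len(img), len(img[0])
    y, x = max(y, 0.0), max(x, 0.0)
    y0, x0 = int(y), int(x)
    # Clamp to the last row/column, as the RoIAlign kernel does at the border.
    if y0 >= h - 1:
        y0 = y1 = h - 1
        y = float(y0)
    else:
        y1 = y0 + 1
    if x0 >= w - 1:
        x0 = x1 = w - 1
        x = float(x0)
    else:
        x1 = x0 + 1
    ly, lx = y - y0, x - x0
    top = img[y0][x0] * (1 - lx) + img[y0][x1] * lx
    bottom = img[y1][x0] * (1 - lx) + img[y1][x1] * lx
    return top * (1 - ly) + bottom * ly


def roi_align_1x1(img, box, aligned):
    """RoIAlign with a 1x1 output and one sampling point at the box centre."""
    offset = 0.5 if aligned else 0.0
    x1, y1, x2, y2 = (c - offset for c in box)
    return bilinear(img, (y1 + y2) / 2, (x1 + x2) / 2)


# Each pixel's value equals its column index, so a sample reads off x directly.
ramp = [[float(col) for col in range(4)] for _ in range(4)]
print(roi_align_1x1(ramp, (1.0, 1.0, 3.0, 3.0), aligned=False))  # 2.0
print(roi_align_1x1(ramp, (1.0, 1.0, 3.0, 3.0), aligned=True))   # 1.5
```

For the same box, the legacy mode samples at x = 2.0, while `aligned=True` shifts the sample half a pixel to 1.5, matching the convention that pixel `i` covers the continuous interval `[i, i + 1)`.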

Refactored abstractions for C++ video decoder

This change is transparent to Python users, but the whole C++ backend for video reading (which, for now, is only enabled when torchvision is compiled from source) has been refactored into more modular abstractions.
The core abstractions are in https://github.com/pytorch/vision/tree/master/torchvision/csrc/cpu/decoder, and by leveraging them, the video reader functions exposed to Python can be written in [a much more concise way](https://github.com/pytorch/vision/tree/master/torchvision/csrc/cpu/video_reader).

Backwards Incompatible Changes

* Dropping Python2 support (1761, 1792, 1984, 1976, 2037, 2033, 2017)
* [Models] Fix inception quantized pre-trained model (1954, 1969, 1975)
* ONNX support for Mask R-CNN and Keypoint R-CNN has been temporarily dropped, but will be fixed in upcoming releases

New Features

* [Transforms] Add Perspective fill option (1973)
* [Ops] `aligned` flag in ROIAlign (1908)
* [IO] Update video reader to use new decoder (1978)
* [IO] torchscriptable functions for video io (1653, 1794)
* [Models] Support negative samples in Faster R-CNN, Mask R-CNN and Keypoint R-CNN (1911, 2069)

Improvements

Datasets

* STL10: don't check integrity twice when download=True (1787)
* Improve code readability and docstring of video datasets (2020)
* [DOC] Fixed typo in Cityscapes docs (1851)

Transforms

* Allow passing list to the input argument 'scale' of RandomResizedCrop (1997) (2008)
* F.normalize unsqueeze mean & std only for 1-d arrays (2002)
* Improved error messages for transforms.functional.normalize(). (1915)
* generalize number of bands calculation in to_tensor (1781)
* Replace 2 transpose ops with 1 permute in ToTensor (2018)
* Fixed Pillow version check for Pillow >= 10 (2039)
* [DOC]: Improve transforms.Normalize docs (1784, 1858)
* [DOC] Fixed missing new line in transforms.Crop docstring (1922)

Ops

* Check boxes shape in RoIPool / Align (1968)
* [ONNX] Export new_empty_tensor (1733)
* Fix Tensor::data<> deprecation. (2028)
* Fix deprecation warnings (2055)

Models

* Add warning and note docs for scipy (1842) (1966)
* Added `__repr__` attribute to GeneralizedRCNNTransform (1834)
* Replace mean on dimensions 2,3 by `adaptive_avg_pool2d` in mobilenet (1838)
* Add init_weights keyword argument to Inception3 (1832)
* Add device to torch.tensor. (1979)
* ONNX export for variable input sizes in Faster R-CNN (1840)
* [JIT] Cleanup torchscript constant annotations (1721, 1923, 1907, 1727)
* [JIT] use // now that it is supported (1658)
* [JIT] add torch.jit.script to ImageList (1919)
* [DOC] Improved docs for Faster R-CNN (1886, 1868, 1768, 1763)
* [DOC] add comments for the modified implementation of ResNet (1983)
* [DOC] Add comments to AnchorGenerator (1941)
* [DOC] Add comment in GoogleNet (1932)

Documentation

* Document int8 quantization model (1951)
* Update Doc with ONNX support (1752)
* Update README to reflect strict dependency on torch==1.4.0 (1767)
* Update sphinx theme (2031)
* Document origin of preprocessing mean / std (1965)
* Fix docstring formatting issues (2049)

Reference scripts

* Add return statement in evaluate function of detection reference script (2029)
* [DOC] Add default training parameters to classification reference README (1998)
* [DOC] Add README to references/segmentation (1864)

Tests

* Improve stability of test_nms_cuda (2044)
* [ONNX] Disable model tests since export of interpolate script module is broken (1989)
* Skip inception v3 in test/test_quantized_models (1885)
* [LINT] Small indentation fix (1831)

Misc

* Remove unintentional -O0 option in setup.py (1770)
* Create CODE_OF_CONDUCT.md
* Update issue templates (1913, 1914)
* master version bump 0.5 → 0.6
* replace torch 1.5.0 items flagged with deprecation warnings (fix 1906) (1918)
* CUDA_SUFFIX → PYTORCH_VERSION_SUFFIX

CI

* Remove av from the binary requirements (2006)
* ci: Add cu102 to CI and packaging, remove cu100 (1980)
* .circleci: Switch to use token for conda uploads (1960)
* Improvements to CI infra (2051, 2032, 2046, 1735, 2048, 1789, 1731, 1961)
* typing only needed for python 3.5 and previous (1778)
* Move C++ and Python linter to CircleCI (2056, 2057)

Bug Fixes

Datasets

* bug fix on downloading voc2007 test dataset (1991)
* fix lsun docstring example (1935)
* Fixed wrong EMNIST `classes` attribute, issue 1716 (1736)
* Force object annotation to be a list in VOC (1790)

Models

* Fix for AnchorGenerator when a device switch happens (1745)
* [JIT] fix len error (1981)
* [JIT] fix googlenet no aux logits (1949)
* [JIT] Fix quantized googlenet (1974)

Transforms

* Fix for rotate fill with Images of type F (1828)
* Fix fill in rotate (1760)

Ops

* Fix bug in DeformConv2d for batch sizes > 32 (2027, 2040)
* Fix for roi_align ONNX export (1988)
* Fix torchscript issue in ConvTranspose2d (1917)
* Fix interpolate when no scale_factor is passed (1785)
* Fix Windows build by renaming Python init functions (1779)
* fix for loading models with num_batches_tracked in frozen bn (1728)

Deprecations

* The `'pts'` value of the `pts_unit` argument of `read_video` and `read_video_timestamps` is deprecated, and will be replaced in the next release with seconds.

0.5.0

This release brings several new additions to torchvision that improve support for deployment. Most notably, all models in torchvision are TorchScript-compatible and can be exported to ONNX. Additionally, a few classification models have quantized weights.

**Note: this is the last version of torchvision that officially supports Python 2.**

Breaking changes

Updated KeypointRCNN pre-trained weights

The pre-trained weights for keypointrcnn_resnet50_fpn have been updated and now correspond to the results reported in the documentation. The previous weights corresponded to an intermediate training checkpoint. (1609)

Corrected the implementation for MNASNet

The previous implementation contained a bug which affected all MNASNet variants other than mnasnet1_0: the first few layers also needed to be scaled by the width multiplier, like all the other layers. We now provide a new checkpoint for mnasnet0_5, which gives 32.17 top-1 error. (1224)

Highlights

TorchScript support for all models

All models in torchvision have native support for torchscript, for both training and testing. This includes complex models such as DeepLabV3, Mask R-CNN and Keypoint R-CNN.
Using torchscript with torchvision models is easy:
```python
import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# convert to torchscript
model_script = torch.jit.script(model)
model_script.eval()

# compute predictions
predictions = model_script([torch.rand(3, 300, 300)])
```

**Warning: the return type for the scripted version of Faster R-CNN, Mask R-CNN and Keypoint R-CNN is different from its eager counterpart: it always returns a tuple of `(losses, detections)`. This discrepancy will be addressed in a future release.**

ONNX

All models in torchvision can now be exported to ONNX for deployment. This includes models such as Mask R-CNN.
```python
import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
inputs = [torch.rand(3, 300, 300)]
predictions = model(inputs)

# convert to ONNX; the list of images is wrapped in a tuple so that it is
# passed to the model as a single positional argument
torch.onnx.export(model, (inputs,), "model.onnx",
                  do_constant_folding=True,
                  opset_version=11)  # opset_version 11 required for Mask R-CNN
```

**Warning: for Faster R-CNN / Mask R-CNN / Keypoint R-CNN, the exported model depends on the input shape used during export. As such, once the model has been exported to ONNX, make sure that all images fed to it have the same shape as the one used for the export. This behavior will be made more general in a future release.**

Quantized models

torchvision now provides quantized models for ResNet, ResNeXt, MobileNetV2, GoogleNet, InceptionV3 and ShuffleNetV2, as well as reference scripts for quantizing your own model in references/classification/train_quantization.py (https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py). A pre-trained quantized model can be obtained with a few lines of code:
```python
import torch
import torchvision

model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
model.eval()

# run the quantized model; float inputs are quantized inside the model
out = model(torch.rand(1, 3, 224, 224))
```
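For intuition about what quantized weights and activations store: PyTorch's per-tensor affine quantization maps a float `x` to an integer code `q = clamp(round(x / scale) + zero_point, qmin, qmax)` and back via `(q - zero_point) * scale`. A minimal pure-Python sketch of that mapping for an unsigned 8-bit range; the helper names and the scale and zero point are arbitrary example values, not parameters taken from the torchvision models.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine quantization of a float to an unsigned 8-bit integer code."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map an integer code back to the float value it represents."""
    return (q - zero_point) * scale


scale, zero_point = 0.1, 128  # example parameters, chosen by hand
for x in (-1.0, 0.0, 0.26, 3.0, 20.0):
    q = quantize(x, scale, zero_point)
    print(x, q, dequantize(q, scale, zero_point))
```

Values inside the representable range come back within `scale / 2` of the original, while values outside it (such as `20.0` here) are clipped to the nearest end of the range, which is why choosing the scale from observed activation ranges matters.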

We provide pre-trained quantized weights for the following models:

| Model | Acc1 | Acc5 |
| --- | --- | --- |

0.4.2

This minor release introduces an optimized `video_reader` backend for torchvision. It is implemented in C++, and uses FFmpeg internally.

The new `video_reader` backend can be up to 6 times faster compared to the `pyav` backend.
- When decoding all video/audio frames in the video, the new `video_reader` is 1.2x - 6x faster depending on the codec and video length.
- When decoding a fixed number of video frames (e.g. [4, 8, 16, 32, 64, 128]), `video_reader` is about as fast as `pyav` for small values (i.e. [4, 8, 16]) and up to 3x faster for large values (e.g. [32, 64, 128]).

Using the optimized video backend

Switching to the new backend can be done via the `torchvision.set_video_backend('video_reader')` function. By default, we use a backend built on top of [PyAV](https://github.com/mikeboers/PyAV).

Due to packaging issues with FFmpeg, to use the `video_reader` backend you first need to have `ffmpeg` available on the system, and then compile torchvision from source using the instructions from https://github.com/pytorch/vision#installation

Deprecations
In torchvision 0.4.0, the `read_video` and `read_video_timestamps` functions used `pts` relative to the video stream. This could lead to misaligned video and audio being returned in some cases.

torchvision now allows specifying a `pts_unit` argument in those functions. The default value is `'pts'` (with the same behavior as before), and the user can now specify `pts_unit='sec'`, which produces consistently aligned results for both video and audio. The `'pts'` value is deprecated for now, and kept for backwards compatibility.

In the next release, the default value of `pts_unit` will change to `'sec'`, so that calling `read_video` without specifying `pts_unit` returns consistently aligned audio-video results. This will require users to update their `VideoClips` checkpoints, which used to store the information in `pts` by default.
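Why seconds align across streams while raw `pts` values do not: each stream counts `pts` in its own time base (seconds per tick), so a `pts` value only becomes comparable across streams after multiplying by that stream's time base. A minimal pure-Python sketch; the 1/90000 and 1/44100 time bases are assumed example values, and the helpers are hypothetical illustrations of the conversion rather than torchvision functions.

```python
from fractions import Fraction


def pts_to_seconds(pts, time_base):
    """Convert a stream-relative presentation timestamp to seconds."""
    return pts * time_base


def seconds_to_pts(seconds, time_base):
    """Convert seconds to the nearest pts of a stream with the given time base."""
    return round(Fraction(seconds) / time_base)


# Assumed example time bases: a 90 kHz video clock and a 44.1 kHz audio clock.
video_tb = Fraction(1, 90000)
audio_tb = Fraction(1, 44100)

# The same raw pts value means different times in the two streams...
print(pts_to_seconds(90000, video_tb))  # 1
print(pts_to_seconds(90000, audio_tb))  # 100/49

# ...while a time in seconds maps to a different pts in each stream.
print(seconds_to_pts(2, video_tb))      # 180000
print(seconds_to_pts(2, audio_tb))      # 88200
```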

Changelog
- [video reader] inception commit (1303) 31fad34
- Expose frame-rate and cache to video datasets (1356) 85ffd93
- Expose num_workers in VideoClips (1359) 02a8c0a
- Fix randomresized params flaky (1282) 7c9bbf5
- Video transforms (1353) 64917bc
- add _backend argument to init() of class VideoClips (1363) 7874374
- Video clips workers (1369) 0982395
- modified code of io.read_video and io.read_video_timestamps to interpret pts values in seconds (1331) 17e355f
- add metadata to video dataset classes. bug fix. more robustness (1376) 49b01e3
- move sampler into TV core. Update UniformClipSampler (1408) f0d3daa
- remove hardcoded video extension in kinetics400 dataset (1418) 929c81d
- Fix hmdb51 and ucf101 typo (1420) b13931a
- fix a bug related to audio_end_pts (1431) 1258bb7
- expose more io api (1423) e48b958
- Make video transforms private (1429) 79daca1
- extend video reader to support fast video probing (1437) ed5b2dc
- Better handle corrupted videos (1463) da89dad
- Temporary fix to remove ffmpeg from build time (1475) ed04dee
- fix a bug when video decoding fails and empty frames are returned (1506) 2804c12
- extend DistributedSampler to support group_size (1512) 355e9d2
- Unify video backend (1514) 97b53f9
- Unify video metadata in VideoClips (1527) 7d509c5
- Fixed compute_clips docstring (1543) b438d32
