Torchvision

Latest version: v0.20.1

Page 14 of 23

Network | box AP | mask AP | keypoint AP
-- | -- | -- | --
Faster R-CNN ResNet-50 FPN | 37.0 |   |  
Mask R-CNN ResNet-50 FPN | 37.9 | 34.6 |  
Keypoint R-CNN ResNet-50 FPN | 54.6 |   | 65.0

The implementations of the models for object detection, instance segmentation and keypoint detection are fast, especially during training.

The results in the following table were measured on 8 V100 GPUs with CUDA 10.0 and cuDNN 7.4. During training, we use a batch size of 2 per GPU, and during testing a batch size of 1.

For test time, we report the time for model evaluation and post-processing (including pasting the masks into the image), but not the time for computing precision-recall.

Network | train time (s / it) | test time (s / it) | memory (GB)
-- | -- | -- | --
Faster R-CNN ResNet-50 FPN | 0.2288 | 0.0590 | 5.2
Mask R-CNN ResNet-50 FPN | 0.2728 | 0.0903 | 5.4
Keypoint R-CNN ResNet-50 FPN | 0.3789 | 0.1242 | 6.8
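
The per-iteration timings above can be converted into approximate throughput. The sketch below assumes that one training iteration processes one batch of 2 images on each of the 8 GPUs in parallel, and that the test timing is for a single GPU with a batch size of 1; the release notes do not state this explicitly, so treat the resulting figures as rough estimates. The helper name `images_per_second` is illustrative only.

```python
# Timings copied from the table above, in seconds per iteration.
TRAIN_TIME = {
    "Faster R-CNN ResNet-50 FPN": 0.2288,
    "Mask R-CNN ResNet-50 FPN": 0.2728,
    "Keypoint R-CNN ResNet-50 FPN": 0.3789,
}
TEST_TIME = {
    "Faster R-CNN ResNet-50 FPN": 0.0590,
    "Mask R-CNN ResNet-50 FPN": 0.0903,
    "Keypoint R-CNN ResNet-50 FPN": 0.1242,
}

NUM_GPUS = 8             # from the text
TRAIN_BATCH_PER_GPU = 2  # from the text
TEST_BATCH_PER_GPU = 1   # from the text


def images_per_second(seconds_per_iteration, batch_per_gpu, num_gpus=1):
    """Images processed per second, assuming all GPUs step in parallel."""
    return batch_per_gpu * num_gpus / seconds_per_iteration


for name in TRAIN_TIME:
    # Assumption: a training iteration covers all 8 GPUs at once.
    train = images_per_second(TRAIN_TIME[name], TRAIN_BATCH_PER_GPU, NUM_GPUS)
    # Assumption: the test timing is for one GPU.
    test = images_per_second(TEST_TIME[name], TEST_BATCH_PER_GPU)
    print(f"{name}: ~{train:.1f} img/s training, ~{test:.1f} img/s inference")
```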


You can load and use pre-trained detection and segmentation models with a few lines of code:

```python
import PIL.Image
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
# set it to evaluation mode, as the model behaves differently
# during training and during evaluation
model.eval()

image = PIL.Image.open('/path/to/an/image.jpg')
image_tensor = torchvision.transforms.functional.to_tensor(image)

# pass a list of (potentially different sized) tensors
# to the model, in 0-1 range. The model will take care of
# batching them together and normalizing
output = model([image_tensor])
# output is a list of dict, containing the postprocessed predictions
```
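
A common next step is to keep only the confident detections. The following is a minimal sketch, not part of the release notes: it assumes each prediction dict carries `boxes`, `labels`, `scores` and, for Mask R-CNN, soft `masks` of shape `[N, 1, H, W]` with values in the 0-1 range. The helper `filter_predictions`, both thresholds, and the `fake_prediction` stand-in are hypothetical and chosen for illustration, so the example runs without downloading weights.

```python
import torch


def filter_predictions(prediction, score_threshold=0.5, mask_threshold=0.5):
    """Keep detections whose score reaches the threshold; binarize masks."""
    keep = prediction["scores"] >= score_threshold
    filtered = {
        "boxes": prediction["boxes"][keep],
        "labels": prediction["labels"][keep],
        "scores": prediction["scores"][keep],
    }
    if "masks" in prediction:
        # Soft masks become boolean masks at the chosen threshold.
        filtered["masks"] = prediction["masks"][keep] > mask_threshold
    return filtered


# Stand-in for one element of the model output (e.g. `output[0]`),
# built by hand with the assumed keys and shapes.
fake_prediction = {
    "boxes": torch.tensor([[10.0, 20.0, 110.0, 220.0],
                           [5.0, 5.0, 50.0, 50.0]]),
    "labels": torch.tensor([1, 18]),
    "scores": torch.tensor([0.98, 0.12]),
    "masks": torch.rand(2, 1, 240, 320),
}

confident = filter_predictions(fake_prediction)
print(confident["labels"].tolist(), tuple(confident["masks"].shape))
```

With a real model, the same helper would be applied to each dict in the returned list.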


Pixelwise Semantic Segmentation models

**Warning: The API is currently experimental and might change in future versions of torchvision**

The 0.3 release also contains models for dense pixelwise prediction on images.
It adds FCN and DeepLabV3 segmentation models, using ResNet50 and ResNet101 backbones.
Pre-trained weights for the ResNet101 backbone are available, and have been trained on a subset of COCO train2017, which contains the same 20 categories as those from Pascal VOC.

The pre-trained models give the following results on the subset of COCO val2017 that contains the same 20 categories as those present in Pascal VOC:

Network | mean IoU | global pixelwise acc
-- | -- | --
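
For readers unfamiliar with the two metrics in the table header, the sketch below computes them from a confusion matrix using their standard definitions. It is an illustration on a hand-made toy example, not the evaluation code used for the release; in particular, skipping classes that never occur is an assumption, and all function names are hypothetical.

```python
def confusion_matrix(targets, predictions, num_classes):
    """matrix[t][p] counts pixels of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(targets, predictions):
        matrix[t][p] += 1
    return matrix


def global_pixel_accuracy(matrix):
    """Fraction of all pixels that were assigned the correct class."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def mean_iou(matrix):
    """Average over classes of intersection / union."""
    n = len(matrix)
    ious = []
    for c in range(n):
        true_positive = matrix[c][c]
        ground_truth = sum(matrix[c])
        predicted = sum(matrix[r][c] for r in range(n))
        union = ground_truth + predicted - true_positive
        if union > 0:  # assumption: skip classes absent from both
            ious.append(true_positive / union)
    return sum(ious) / len(ious)


# Toy example: 8 "pixels", 3 classes, flattened label maps.
targets = [0, 0, 0, 0, 1, 1, 2, 2]
predictions = [0, 0, 0, 1, 1, 1, 2, 0]
matrix = confusion_matrix(targets, predictions, num_classes=3)
print(global_pixel_accuracy(matrix), mean_iou(matrix))
```

Mean IoU weights every class equally, so it penalizes errors on rare classes more than global pixelwise accuracy does.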

We would like to thank [_Ross Girshick_](https://github.com/rbgirshick), [_Piotr Dollar_](https://github.com/pdollar), [_Vaibhav Aggarwal_](https://github.com/vaibhava0), [_Francisco Massa_](https://github.com/fmassa) and [_Hu Ye_](https://github.com/xiaohu2015) for their past research and contributions to this work.

New pre-trained weights

SWAG weights

The ViT and RegNet model variants offer new pre-trained [_SWAG_](https://arxiv.org/abs/2201.08371) (Supervised Weakly from hashtAGs) weights. One of the biggest of these models achieves a whopping 88.6% accuracy on ImageNet-1K. We currently offer two versions of the weights: 1) fine-tuned end-to-end weights on ImageNet-1K (highest accuracy) and 2) frozen trunk weights with a linear classifier fit on ImageNet-1K (great for transfer learning). Below we see the detailed accuracies of each model variant:


Model Weights | Acc1 | Acc5
-- | -- | --
