Mask R-CNN ResNet-50 FPN | 37.9 | 34.6 |
Keypoint R-CNN ResNet-50 FPN | 54.6 | | 65.0
The implementations of the models for object detection, instance segmentation and keypoint detection are fast, especially during training.
The following table reports results measured on 8 V100 GPUs with CUDA 10.0 and cuDNN 7.4. During training, we use a batch size of 2 per GPU; during testing, a batch size of 1.
The test time covers model evaluation and post-processing (including pasting the masks into the image), but not the computation of precision-recall.
Network | train time (s / it) | test time (s / it) | memory (GB)
-- | -- | -- | --
Faster R-CNN ResNet-50 FPN | 0.2288 | 0.0590 | 5.2
Mask R-CNN ResNet-50 FPN | 0.2728 | 0.0903 | 5.4
Keypoint R-CNN ResNet-50 FPN | 0.3789 | 0.1242 | 6.8
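To make the per-iteration timings easier to compare, the sketch below converts them into approximate images per second. It assumes that one iteration processes the stated per-GPU batch on each of the 8 GPUs, for testing as well as training; the release notes do not state this explicitly, so treat the numbers as rough estimates. The helper name `images_per_second` is made up for illustration.

```python
# Timings copied from the table above: (train s/it, test s/it).
TIMINGS = {
    "Faster R-CNN ResNet-50 FPN": (0.2288, 0.0590),
    "Mask R-CNN ResNet-50 FPN": (0.2728, 0.0903),
    "Keypoint R-CNN ResNet-50 FPN": (0.3789, 0.1242),
}

NUM_GPUS = 8
TRAIN_BATCH_PER_GPU = 2
TEST_BATCH_PER_GPU = 1


def images_per_second(seconds_per_iteration, batch_per_gpu, num_gpus=NUM_GPUS):
    """Approximate throughput for one row of the table.

    Assumption: each iteration handles `batch_per_gpu` images on every GPU.
    """
    return batch_per_gpu * num_gpus / seconds_per_iteration


throughput = {
    name: (
        images_per_second(train_time, TRAIN_BATCH_PER_GPU),
        images_per_second(test_time, TEST_BATCH_PER_GPU),
    )
    for name, (train_time, test_time) in TIMINGS.items()
}

for name, (train_ips, test_ips) in throughput.items():
    print(f"{name}: train {train_ips:.1f} img/s, test {test_ips:.1f} img/s")
```

Under these assumptions, Faster R-CNN trains at roughly 70 images per second across the 8 GPUs, and the other two models are slower in proportion to their longer iteration times.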
You can load and use pre-trained detection and segmentation models with a few lines of code:
```python
import PIL.Image
import torchvision

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
# set it to evaluation mode, as the model behaves differently
# during training and during evaluation
model.eval()

image = PIL.Image.open('/path/to/an/image.jpg')
image_tensor = torchvision.transforms.functional.to_tensor(image)

# pass a list of (potentially different sized) tensors
# to the model, in 0-1 range. The model will take care of
# batching them together and normalizing
output = model([image_tensor])
# output is a list of dict, containing the postprocessed predictions
```
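A common next step is to keep only the confident detections. In torchvision's detection models, each dict in the output holds per-detection entries under `boxes`, `labels` and `scores` (plus `masks` for Mask R-CNN), all ordered consistently. The sketch below filters one such dict by score; the helper `keep_confident`, the 0.5 threshold and the example values are illustrative assumptions, and plain lists stand in for tensors so the snippet runs without downloading weights.

```python
def keep_confident(prediction, threshold=0.5):
    """Keep only the detections whose score is at least `threshold`.

    `prediction` is one dict from the model's output list. Every value is
    assumed to be indexable per detection, in the same order as "scores".
    """
    scores = prediction["scores"]
    keep = [i for i, score in enumerate(scores) if float(score) >= threshold]
    return {key: [values[i] for i in keep] for key, values in prediction.items()}


# Hypothetical stand-in for one entry of `output`, using plain lists
# instead of tensors; the numbers are made up for illustration.
example_prediction = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 60.0]],
    "labels": [1, 18],
    "scores": [0.98, 0.12],
}

confident = keep_confident(example_prediction, threshold=0.5)
print(confident["labels"])
```

With real model output the values are tensors, so a boolean mask such as `prediction["scores"] >= threshold` applied to each value is the more idiomatic way to do the same filtering.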
Pixelwise Semantic Segmentation models
**Warning: The API is currently experimental and might change in future versions of torchvision**
The 0.3 release also contains models for dense pixelwise prediction on images.
It adds FCN and DeepLabV3 segmentation models, using ResNet50 and ResNet101 backbones.
Pre-trained weights for the ResNet101 backbone are available; they were trained on a subset of COCO train2017 that contains the same 20 categories as Pascal VOC.
The pre-trained models give the following results on the subset of COCO val2017 that contains the same 20 categories as Pascal VOC:
Network | mean IoU | global pixelwise acc
-- | -- | --