This release includes several significant new features, bug fixes, and tutorials.
New additions include the following:
Models
- [Multi-Scale Vision Transformers (MViT)](https://arxiv.org/abs/2104.11227) along with its [model builders](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/models/vision_transformers.py#L97) and associated [pre-trained models in the model zoo](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/models/hub/vision_transformers.py).
MViT is a new state-of-the-art vision transformer model that beats the existing baselines while requiring fewer compute resources.
- [Audio Visual SlowFast Model](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/models/audio_visual_slowfast.py) - This enables you to work with the audio and video modalities simultaneously.
- [Video action detection ResNet model](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/models/resnet.py#L831) and associated [pre-trained models in the model zoo](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/models/hub/resnet.py#L73).
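The pre-trained models in the zoo can be pulled in through `torch.hub`. A minimal sketch, assuming a network connection and using `slow_r50` as one representative zoo entry (swap in another published model name as needed):

```python
import torch

# Load a pre-trained video model from the PyTorchVideo model zoo via torch.hub
# (downloads the weights on first use; "slow_r50" is one published zoo entry).
model = torch.hub.load(
    "facebookresearch/pytorchvideo", model="slow_r50", pretrained=True
)
model = model.eval()

# Run a dummy clip through it: (batch, channels, time, height, width).
clip = torch.randn(1, 3, 8, 256, 256)
with torch.no_grad():
    preds = model(clip)  # Kinetics-400 logits
```

The same pattern applies to the MViT and detection ResNet checkpoints once their hub entry points are published.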
Transforms
- [AugMix](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/transforms/augmix.py)
- [MixUp and CutMix](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/transforms/mix.py)
- [and more](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/transforms/transforms.py)
Simply adding the AugMix or MixUp transforms to your existing training recipes should boost your model's baseline accuracy.
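To make the idea concrete, here is a minimal NumPy sketch of the MixUp technique itself (not the library's `pytorchvideo.transforms` API): each clip and its one-hot label are blended with a randomly paired clip from the same batch, using a mixing weight drawn from a Beta distribution. All names here are illustrative.

```python
import numpy as np

def mixup(clips, labels, alpha=0.4, rng=None):
    """Blend each clip (and its one-hot label) with a randomly paired
    clip from the same batch, weighted by a Beta(alpha, alpha) sample."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing coefficient in (0, 1)
    perm = rng.permutation(len(clips))  # random pairing within the batch
    mixed_clips = lam * clips + (1 - lam) * clips[perm]
    mixed_labels = lam * labels + (1 - lam) * labels[perm]
    return mixed_clips, mixed_labels

# Toy batch: 4 "clips" of shape (channels, time, height, width), one-hot labels.
clips = np.random.rand(4, 3, 8, 16, 16).astype(np.float32)
labels = np.eye(4, dtype=np.float32)
mixed_clips, mixed_labels = mixup(clips, labels)
```

Because the labels are mixed too, the model trains against soft targets, which is where the regularization benefit comes from.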
Datasets
- [AVA dataset](https://github.com/facebookresearch/pytorchvideo/blob/master/pytorchvideo/data/ava.py) and its associated [benchmarks](https://github.com/facebookresearch/pytorchvideo/blob/master/docs/source/model_zoo.md) and [tutorials](https://pytorchvideo.org/docs/tutorial_torchhub_detection_inference)