Torch Vision Models and Pretrained Weights for Classification Tasks

The torchvision.models subpackage contains definitions of models for addressing different tasks, including: image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection, video classification, and optical flow.

For classification tasks, the following models are available, with or without pre-trained weights:

Convolutional Neural Network - CNN

EfficientNetV2 - EfficientNetV2: Smaller Models and Faster Training - 2021

Authors: Mingxing Tan, Quoc V. Le

Architecture

The maximum inference image size is limited to 480.

Model accuracy, inference time and maximum frame rate

  • EfficientNetV2-S - 83.9% - 24ms infer time - 42 fps max
  • EfficientNetV2-M - 85.1% - 57ms infer time - 17 fps max
  • EfficientNetV2-L - 85.7% - 98ms infer time - 10 fps max

MobileNetV3 - Searching for MobileNetV3 - 2019

Authors: Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam

Other CNNs

ViT - Vision Transformer
