Torchvision Models and Pretrained Weights for Classification Tasks

The torchvision.models subpackage contains definitions of models for addressing different tasks, including: image classification, pixelwise semantic segmentation, object detection, instance segmentation, person keypoint detection, video classification, and optical flow.

For classification tasks, the following models are available, with or without pre-trained weights:

Convolutional Neural Networks (CNN)

EfficientNetV2 - EfficientNetV2: Smaller Models and Faster Training - 2021

Authors: Mingxing Tan, Quoc V. Le.
Architecture

The maximum inference image size is restricted to 480.

Model accuracy, inference time and maximum frame rate

  • EfficientNetV2-S - 83.9% accuracy - 24 ms inference time - 42 fps max
  • EfficientNetV2-M - 85.1% accuracy - 57 ms inference time - 17 fps max
  • EfficientNetV2-L - 85.7% accuracy - 98 ms inference time - 10 fps max
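The maximum frame rates in the list above are close to 1000 divided by the inference time in milliseconds. The sketch below works through that arithmetic under two assumptions that the source does not state: frames are processed one at a time, and inference is the only per-frame cost. The function name `max_fps` is illustrative.

```python
# Inference times in milliseconds, copied from the list above.
INFER_TIME_MS = {
    "EfficientNetV2-S": 24,
    "EfficientNetV2-M": 57,
    "EfficientNetV2-L": 98,
}


def max_fps(infer_time_ms: float) -> float:
    """Upper bound on frames per second when each frame takes infer_time_ms."""
    if infer_time_ms <= 0:
        raise ValueError("inference time must be positive")
    return 1000.0 / infer_time_ms


for name, ms in INFER_TIME_MS.items():
    print(f"{name}: {max_fps(ms):.1f} fps")
# EfficientNetV2-S: 41.7 fps
# EfficientNetV2-M: 17.5 fps
# EfficientNetV2-L: 10.2 fps
```

The listed values of 42, 17 and 10 fps agree with these results to within rounding. Any preprocessing or data-transfer time would lower the achievable rate further.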

MobileNetV3 - Searching for MobileNetV3 - 2019

Authors: Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam

Other CNNs

ViT - Vision Transformer
