Computer Vision Models

The alwaysAI model catalog provides a set of pre-trained machine learning models that, combined with the alwaysAI edgeIQ APIs, let developers quickly prototype a computer vision application without first creating and training a custom model. These pre-trained models cover image classification and object detection. To use a model in your application, select it with the alwaysAI CLI tool when configuring the application. Learn more about adding and changing models here.

Model Accuracy and Inference Times

Where available, accuracy information for a model is shown in mean Average Precision (mAP). Two mAP values are given, based on how often an object is correctly predicted by the first prediction (top-1) or within the top five predictions (top-5) returned by the inference engine. For full information on mAP and how it is calculated, see mAP (mean Average Precision) for Object Detection – Jonathan Hui – Medium.
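To make the top-1 and top-5 terms concrete, here is a minimal sketch of how such hit rates could be computed from ranked predictions. The function name top_k_accuracy and the sample data are hypothetical and are not part of the edgeIQ APIs. The sketch only counts whether the true label appears among the first k predictions; it does not implement the full mAP calculation.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   true_labels: Sequence[str],
                   k: int) -> float:
    """Fraction of samples whose true label is among the first k predictions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical ranked outputs (most confident first) for three images.
predictions = [
    ["cat", "dog", "fox", "wolf", "lynx"],
    ["car", "bus", "truck", "train", "boat"],
    ["sofa", "chair", "bed", "table", "bench"],
]
labels = ["cat", "bus", "lamp"]

top1 = top_k_accuracy(predictions, labels, k=1)  # only the first image is a hit
top5 = top_k_accuracy(predictions, labels, k=5)  # the first and second images are hits
```

With this sample data, one of three images is correct at top-1 and two of three at top-5, which shows why a top-5 figure is never lower than the top-1 figure for the same model.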

Inference times measure how long the inference engine takes to process an image and return predictions. They are given in seconds, with benchmarking done on the Inforce 6640. Information on the Inforce 6640 can be found here: The product-ready SBC powered by the latest Snapdragon™ 820 processor, designed for the most advanced embedded applications.
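If you want to compare the published figures with your own hardware, a measurement along the following lines may help. This is a sketch under stated assumptions: time_inference and fake_infer are hypothetical names, and fake_infer is a stand-in that you would replace with your real model call. It is not the benchmarking procedure used for the catalog.

```python
import time
from statistics import mean
from typing import Any, Callable, List


def time_inference(infer: Callable[[Any], Any], image: Any,
                   runs: int = 10, warmup: int = 2) -> List[float]:
    """Return per-run wall-clock inference times in seconds."""
    for _ in range(warmup):
        infer(image)  # warm-up runs are executed but not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        infer(image)
        timings.append(time.perf_counter() - start)
    return timings


def fake_infer(image: Any) -> List[str]:
    """Hypothetical stand-in for a real model call."""
    time.sleep(0.01)
    return ["label"]


timings = time_inference(fake_infer, image=None, runs=5)
average_seconds = mean(timings)
```

The warm-up runs are there because the first calls to a model often include one-time setup costs that would otherwise skew the average.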

Classification Models

The goal of image classification models is to predict a set of labels that characterize an input image. These models are typically used in batch processing mode. An example of an edge application using classification is an automated medical microscope with an embedded camera that determines whether a sample slide shows a disease. In this implementation, the classification model would be trained to recognize the disease.
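As an illustration of how a classifier's output becomes a label, the sketch below converts raw per-class scores into probabilities with softmax and picks the most likely label. It assumes the model emits one raw score per class, which is common but not confirmed for any specific catalog model. The function names, labels, and scores are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores: List[float], labels: List[str],
             top_n: int = 1) -> List[Tuple[str, float]]:
    """Return the top_n (label, probability) pairs, most likely first."""
    probabilities = softmax(scores)
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical two-class output for the microscope example.
class_labels = ["healthy", "diseased"]
raw_scores = [0.3, 2.1]
best_label, best_probability = classify(raw_scores, class_labels)[0]
```

Here the higher raw score belongs to "diseased", so that label is returned with a probability of roughly 0.86.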

Object Detection Models

Object detection builds on image classification by also localizing each object in an image. The image is now characterized by:

  • Bounding box (x, y)-coordinates for each object

  • An associated class label for each bounding box

Detection models can be used in either real-time or batch processing mode. An example of an edge application using object detection is a smart security camera that detects and counts people in a given area.
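The people-counting example can be sketched as follows. The Detection class and count_label function are hypothetical illustrations of the bounding box and class label pairing described above. They are not the edgeIQ result types, and the 0.5 confidence threshold is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """One detected object: a class label, a confidence, and a bounding box."""
    label: str
    confidence: float
    start_x: int
    start_y: int
    end_x: int
    end_y: int


def count_label(detections: List[Detection], label: str,
                min_confidence: float = 0.5) -> int:
    """Count detections of the given label at or above the confidence threshold."""
    return sum(
        1 for d in detections
        if d.label == label and d.confidence >= min_confidence
    )


# Hypothetical detections for a single camera frame.
detections = [
    Detection("person", 0.92, 10, 20, 110, 300),
    Detection("person", 0.41, 200, 40, 260, 280),  # below the threshold
    Detection("dog", 0.88, 300, 150, 420, 310),
]
people = count_label(detections, "person")
```

Only one of the two "person" detections clears the threshold, so the count is 1. Raising or lowering min_confidence trades missed detections against false positives.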

Datasets

The models in the catalog are based on one of three datasets: ImageNet (image classification), COCO (object detection), and Pascal VOC (object detection).

ImageNet Dataset

Models based on the ImageNet dataset can classify an image into 1,000 separate categories and are trained on a dataset consisting of more than 1.2 million images. A full list of ImageNet object categories can be found here.

COCO Dataset

The Common Objects in Context (COCO) dataset is a large-scale object detection dataset consisting of 330,000 images. Models in the catalog can identify between 90 and 100 unique object categories, depending on their training. More information on the COCO dataset can be found here.

Pascal VOC

Pascal Visual Object Classes (VOC) is an object detection dataset consisting of 11,530 images. Models trained on it can identify 20 unique object classes (person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/monitor). More information on the VOC dataset can be found here.