Sam Nasr: Azure Custom Vision Object Detection

Friday, September 17, 2021

Computer Vision and Custom Vision are two of the services offered by Azure Cognitive Services.

Computer Vision: Analyze content in images.

OCR: Optical Character Recognition
Image Analysis: Extracts visual features from images (objects, faces).
Spatial Analysis: Analyzes the presence and movement of people in a video feed and produces events that other systems can respond to.

Custom Vision: Customize image recognition to fit your business needs.

Image Classification: Applies one or more labels to an image.
Object Detection: Returns the coordinates in the image where each applied label can be found.

When using the Object Detection Prediction API, the response returned from Azure is a JSON document that maps to the following C# classes.

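// Top-level object returned by the Prediction API.
public class PredictionResponse
{
    public string id { get; set; }
    public string project { get; set; }
    public string iteration { get; set; }
    public string created { get; set; }
    public Prediction[] predictions { get; set; }
}

// One detected object: its tag, its confidence, and its location.
public class Prediction
{
    public string tagId { get; set; }
    public string tagName { get; set; }
    // Sent as a JSON number, so a numeric type avoids string parsing.
    public double probability { get; set; }
    public BoundingBox boundingBox { get; set; }
}

// Location of the detected object. The values are JSON numbers
// expressed as fractions of the image width and height.
public class BoundingBox
{
    public double left { get; set; }
    public double top { get; set; }
    public double width { get; set; }
    public double height { get; set; }
}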
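To make the shape of the response concrete, here is a minimal sketch that reads a hand-written sample payload with System.Text.Json and keeps only the predictions above a confidence threshold. The payload values (IDs, tag name, numbers), the PredictionReader class, and the ConfidentTags method are all made up for illustration; the sketch also assumes the numeric fields may arrive either as JSON numbers or as strings, and handles both.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.Json;

public static class PredictionReader
{
    // Illustrative payload only: every value here is invented.
    public const string SampleJson = @"{
      ""id"": ""sample-id"",
      ""project"": ""sample-project"",
      ""iteration"": ""sample-iteration"",
      ""created"": ""2021-09-17T00:00:00Z"",
      ""predictions"": [
        { ""tagId"": ""t1"", ""tagName"": ""fork"", ""probability"": 0.92,
          ""boundingBox"": { ""left"": 0.10, ""top"": 0.20, ""width"": 0.30, ""height"": 0.40 } },
        { ""tagId"": ""t1"", ""tagName"": ""fork"", ""probability"": 0.05,
          ""boundingBox"": { ""left"": 0.50, ""top"": 0.50, ""width"": 0.10, ""height"": 0.10 } }
      ]
    }";

    // Accepts a value encoded either as a JSON number or as a string.
    private static double ReadNumber(JsonElement e) =>
        e.ValueKind == JsonValueKind.String
            ? double.Parse(e.GetString() ?? string.Empty, CultureInfo.InvariantCulture)
            : e.GetDouble();

    // Returns the tag names of all predictions at or above minProbability.
    public static List<string> ConfidentTags(string json, double minProbability)
    {
        var tags = new List<string>();
        using JsonDocument doc = JsonDocument.Parse(json);
        foreach (JsonElement p in doc.RootElement.GetProperty("predictions").EnumerateArray())
        {
            if (ReadNumber(p.GetProperty("probability")) >= minProbability)
            {
                tags.Add(p.GetProperty("tagName").GetString() ?? string.Empty);
            }
        }
        return tags;
    }
}
```

Calling PredictionReader.ConfidentTags(PredictionReader.SampleJson, 0.5) keeps the first prediction and drops the second. The threshold is a choice you make in your own application; 0.5 here is only a placeholder.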

Each BoundingBox object in the response is represented graphically by a red box, as shown in the sample image below.
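To draw such a box yourself, the bounding box values need to be converted to pixels. The sketch below assumes that left, top, width, and height are fractions of the image dimensions (values between 0 and 1); the PixelBox struct and the BoundingBoxMath.ToPixels helper are hypothetical names used only for this example.

```csharp
using System;

// Hypothetical value type holding a rectangle in pixel units.
public readonly struct PixelBox
{
    public PixelBox(int x, int y, int width, int height)
    {
        X = x;
        Y = y;
        Width = width;
        Height = height;
    }

    public int X { get; }
    public int Y { get; }
    public int Width { get; }
    public int Height { get; }
}

public static class BoundingBoxMath
{
    // Scales fractional box values by the image size, assuming each
    // input is a fraction (0..1) of the image width or height.
    public static PixelBox ToPixels(
        double left, double top, double width, double height,
        int imageWidth, int imageHeight)
    {
        int x = (int)Math.Round(left * imageWidth);
        int y = (int)Math.Round(top * imageHeight);
        int w = (int)Math.Round(width * imageWidth);
        int h = (int)Math.Round(height * imageHeight);
        return new PixelBox(x, y, w, h);
    }
}
```

For an 800 x 600 image, BoundingBoxMath.ToPixels(0.25, 0.5, 0.5, 0.25, 800, 600) gives a box at (200, 300) that is 400 pixels wide and 150 pixels tall.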

In addition, listed below are some gotchas to watch out for when working with Object Detection:

Be sure to use the same login for https://customvision.ai as the one used for the Azure portal.

Use the same "Directory" in CustomVision.ai and the Azure portal. This setting can be found in the top right corner of both the Azure portal and CustomVision.ai.

When training the model, you must use a minimum of 15 images for every tag. More images with different lighting, angles, and backgrounds will produce better results.

The image types used for training must be .JPG, .PNG, or .BMP, and each file must be smaller than 4 MB.
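The last two gotchas are easy to check before uploading anything. Here is a rough sketch of such a pre-upload check, using the limits listed above. The TrainingImageRules class and its members are hypothetical names, and treating 4 MB as 4 x 1024 x 1024 bytes is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TrainingImageRules
{
    // Limits taken from the list above.
    public const int MinImagesPerTag = 15;

    // Assumption: "4 MB" is interpreted as 4 * 1024 * 1024 bytes.
    public const long MaxBytes = 4L * 1024 * 1024;

    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".png", ".bmp" };

    // True when the file has an allowed extension and is under the size limit.
    public static bool IsAcceptable(string fileName, long sizeInBytes) =>
        AllowedExtensions.Contains(Path.GetExtension(fileName)) && sizeInBytes < MaxBytes;

    // Returns the tags that still have fewer than MinImagesPerTag images, sorted by name.
    public static List<string> TagsNeedingMoreImages(IReadOnlyDictionary<string, int> imagesPerTag) =>
        imagesPerTag
            .Where(kv => kv.Value < MinImagesPerTag)
            .Select(kv => kv.Key)
            .OrderBy(name => name, StringComparer.Ordinal)
            .ToList();
}
```

For example, TrainingImageRules.IsAcceptable("fork.JPG", 1024) passes, while a .gif file or a 5 MB .png is rejected. The check looks only at the file name and size; it does not verify that the file content is actually a valid image.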
