Data Collection

This page outlines how to run the data collection starter application and provides helpful tips for dataset generation.

Data Generation Starter Application

Generating data is the first step in model building, and it can be accomplished in many ways. You can find more information about the different ways to gather data in the Data Capture Guidelines section below. The Data Generation starter app simplifies data collection and allows you to capture data directly from any platform that you can run alwaysAI on, including the edge device and camera pairing on which you will be inferencing. This guide will walk you through the following steps:

  1. Set Up the Project on alwaysAI

  2. Get the Sample App Code

  3. Configure the Local Directory

  4. Build and Start the Application

  5. (Optional) Configuration Options

1. Set Up the Project on alwaysAI

If you’d prefer to configure your app via the CLI, proceed to the next step.

Otherwise, navigate to https://alwaysai.co/ and log in, then proceed to your Dashboard.

i. Click ‘create new project’.

ii. Select ‘start from scratch’ and give your project a name. Then click ‘next’ to proceed.

iii. Do not add a model. Finish creating your project, then click the link to your new project on the subsequent page.

See this page for the complete documentation on setting up projects in the dashboard.

2. Get the Sample App Code

i. Either clone or download the repository found on GitHub into a local directory where you’d like your app to run. This will be your project directory.
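For example, using Git (substitute the repository URL shown on GitHub for the <repository-url> placeholder, and the resulting directory name for <repository-directory>):

$ git clone <repository-url>
$ cd <repository-directory>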

3. Configure the Local Directory

i. On your command line, navigate into the directory where the project code is located on your local computer.

ii. If you did not create your project on the dashboard, enter aai app configure in the terminal and follow the prompts to create a new project. If you did create your project via the dashboard, you can enter aai app configure and use the arrow keys to select your project, or simply use the aai app configure --project <hash> command provided on your project page in the Dashboard.

iii. The tool will automatically detect the alwaysai.app.json and app.py files and write an alwaysai.project.json file.

4. Build and Start the Application

Enter the following on the command line:

$ aai app install

Then enter:

$ aai app start

Open a browser to localhost:5000 to start collecting data! Once you’re done, click the red square in the browser window to stop the app.

Your data will be in your local directory, in a folder called samples.

5. Configuration Options

To set the sample rate of the data collection, find the “SAMPLE_RATE” variable near the top of the app.py file. The value of this variable will determine the number of frames captured per second. For example:

SAMPLE_RATE = 5

denotes that 5 frames per second will be captured, whereas

SAMPLE_RATE = .25

denotes that ¼ of a frame per second, or 1 frame every 4 seconds, will be captured.

Note: You will not be able to capture more frames per second than your hardware is capable of capturing.
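For reference, this kind of rate limiting can be implemented with a simple time-based gate. The sketch below is illustrative only and is not the starter app's actual implementation; read_frame and save_sample are hypothetical callables supplied by the caller:

import time

SAMPLE_RATE = 5  # frames to keep per second; 0.25 keeps one frame every 4 seconds

def sample_frames(read_frame, save_sample, sample_rate=SAMPLE_RATE):
    """Keep at most sample_rate frames per second from a stream."""
    interval = 1.0 / sample_rate   # minimum seconds between kept frames
    last_kept = float("-inf")      # so the first frame is always kept
    while (frame := read_frame()) is not None:  # read_frame returns None at end of stream
        now = time.monotonic()
        if now - last_kept >= interval:
            save_sample(frame)     # persist the frame, e.g. into the samples folder
            last_kept = now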

This is a simple data collection app; for more advanced sampling, we recommend third-party tools such as FFmpeg.

Generate Images from Video

Sometimes you will have data from other sources that is in video format. Currently, you must process the video into images before uploading it to Dataset Management. We recommend installing FFmpeg for this pre-processing (find installation details at the end of this document). Here are some examples of actions you can take to make annotation smoother.

Sample Images from a Video

A video is just a large number of images displayed in order, which means the difference between consecutive frames is small. Training a robust model requires many example images, but it doesn't help if they are all the same, or very similar. To ensure variance in the training dataset, choose a sampling rate (e.g. 1 image per second, 1 image per 5 seconds) that captures the target items in multiple positions without producing overly similar frames.
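For example, with FFmpeg (file names are placeholders), the -r output option sets the sampling rate; 1/5 keeps one frame every 5 seconds:

$ ffmpeg -i movie_name.mov -r 1/5 image_name_%04d.png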

Split a Video into Multiple Parts

When collecting data, you may find that the objects you wish to detect are interspersed randomly throughout the video. Since you are not interested in the parts that don't contain any objects you want to identify, you can split the video into parts that contain objects and parts that don't, and keep only the parts you want.
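As an illustration, FFmpeg can cut out a segment without re-encoding; the timestamps and file names here are placeholders:

$ ffmpeg -i full_video.mov -ss 00:01:00 -to 00:02:30 -c copy objects_only.mov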

Concatenate Videos

When collecting data, you will likely end up with multiple videos. For large datasets this can get unwieldy, so concatenating them keeps things simple.
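For example, FFmpeg's concat demuxer can join clips that share the same codec and resolution. Assuming a text file named parts.txt that lists the clips:

$ cat parts.txt
file 'clip_1.mov'
file 'clip_2.mov'

$ ffmpeg -f concat -safe 0 -i parts.txt -c copy combined.mov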

See the list of tools at the end of this document for additional details on processing and sampling videos.

Additional Guides

FFmpeg

We recommend you download and install FFmpeg to assist in generating your dataset. It is great for generating sample images from a wide variety of video formats, or for changing the format of a video. There are detailed installation instructions on the FFmpeg website, and many sites offer command instructions and examples, e.g. https://www.labnol.org/internet/useful-ffmpeg-commands/28490/

A sample command that gives you high-quality samples at 2 frames per second is:

$ ffmpeg -i movie_name.mov -r 2 -q:v 1 image_name_%4d.png

Data Capture Guidelines

Capturing data is the first step in the model training process, and one of the most important. A model is only as good as the dataset it is trained on; as the saying goes: garbage in, garbage out. Keeping that in mind, we have compiled a list of considerations that will help you to ensure the data you capture gives you the best chance at an accurate, robust model.

There are three ways to generate a dataset:

  1. Collect the dataset yourself

  2. Acquire data from outside sources

  3. Use a digitally generated dataset

The scope of this document is the first method; however, the considerations we discuss apply to the other ways of generating a dataset as well.

Data Source

You can collect your dataset in either image or video format. The main, overarching theme to keep in mind is: collect as you will inference. In computer vision, “inference” is the term we use for applying a trained model to an input to infer an outcome. If your target application will be analyzing random images from the internet, your dataset should be images pulled from the internet. If your target application will be running on security camera footage collected from a camera in a high corner of a building lobby, then your data should come from a similar, or preferably the same, video camera. Model training is done on images, and inference is technically done on images as well, even when analyzing a video stream; the point is to train on data that resembles what the model will see in the real world. Once your video data is collected, you can easily sample the videos to create images for training. While you may be able to find a ready-made dataset, or generate one using images or video collected by someone else, having full control of the source of your data lets you ensure a better quality dataset.

Class/Label Balance

Label balance refers to having a roughly equivalent number of example images for each class, or label, that you are training your model to recognize. If there is a large discrepancy in the number of images across classes, e.g. you are training a model to recognize bottles and cans and have 2000 images of bottles but only 50 images of cans, the model will not be balanced. This can produce a disparity in accuracy or precision across classes and will generally be detrimental to your model.
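A quick way to catch imbalance early is to count labels across your annotations before training. Below is a minimal sketch, assuming Pascal VOC-style XML annotation files in an annotations/ directory (the annotation format and directory layout are assumptions; adapt them to your own dataset):

from collections import Counter
from pathlib import Path
import xml.etree.ElementTree as ET

def label_counts(annotation_dir="annotations"):
    """Tally bounding-box labels across Pascal VOC XML files."""
    counts = Counter()
    for xml_file in Path(annotation_dir).glob("*.xml"):
        for obj in ET.parse(xml_file).getroot().iter("object"):
            counts[obj.findtext("name")] += 1
    return counts

print(label_counts())  # e.g. Counter({'bottle': 2000, 'can': 50}) flags an imbalance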

Lighting

The optimal lighting for your dataset depends on the lighting of your target application. To return to the security camera example, the lighting will vary greatly depending on whether your camera is inside or outside, and whether it runs during the night, the day, or both. A camera inside may have consistent lighting throughout the day, and even the night, whereas a camera outside is subject to changes in lighting due to things like weather and time of day. None of these is an impediment; however, you'll want to take them into consideration, and ideally include examples of all the lighting conditions your model will be exposed to in your training data.

Angle

The angle that objects are viewed from can drastically change their shape. An umbrella from above or below is an octagon, but from the side it is a crescent with a line. When collecting data, consider the angle you will be inferencing from, whether your camera will be high or low, as well as the direction that your targets will be crossing the frame, if it is relevant.

Distance

When humans look at a scene, our brains perform a lot of processing to interpret whether an object is close to us or far away. The main factor is size: the closer an object is to us, the bigger it appears. We need to take that into consideration when training a computer vision model. To teach the model to recognize an object consistently regardless of how close it is to the camera, we need images of the object from a wide range of distances in our dataset. That means we need images where our target class takes up most of the frame, as well as images where it takes up very little of the frame. Try to capture the target object from a variety of distances, especially if the object will be moving towards or away from the camera in the target application.

Resolution

Resolution will play a role in model quality if there is a large discrepancy between the resolution of the training images and the inferencing images. For example, if the training images are high-definition, the model will have trouble finding the same shapes in grainy, low-resolution images. Typically, resolution is close enough across most devices, but it is good to keep in mind.

Scale

Most likely, the framework on which you train the model will re-scale all images so that they are consistent for training. However, if the raw images that you gather for training have a wide range of scales, this re-scaling will affect all of them differently, which will have a negative impact on your model. Try to use training images that are at roughly the same scale.

Occlusion

Your occlusion tolerance is something you have to decide on when gathering data. How much of an object do you want to be visible before your model detects it? 50%? 80%? 20%? Keep in mind, if you want a partially visible object to be detected by your model, you need a large number of examples of the object being occluded in your dataset. In addition, the more an object is occluded, the fewer defining features can be detected; as a result, using images containing occluded objects as training data may introduce false positives or reduce the accuracy of inferences.

Weather

If you are going to be inferencing in a location that has weather, i.e. outside, try to account for that when gathering data. If all the data you collect is from a bright sunny day, what happens when it rains or snows? Clouds will reduce light, rain will add an artifact over the entire inference area that needs to be accounted for, and snow will completely change the background. Think about whether you will be inferencing in various weather conditions, and try to incorporate that as best you can into your data collection. You may not be able to make it rain when you are capturing data, but you may be able to simulate reduced light by using images from times of day with varying amounts of light, like early morning or evening.

Background

The background of your training dataset can drastically change how your target classes are recognized. If you collect all your data in a controlled environment, say against a white background, it will be easy for the model to recognize the objects you are training on, but this won't translate to an accurate model in the real world. In the real world, your object may be camouflaged by the background, or half blend in and half stand out, or any number of other situations. To generate a robust model that is accurate in many situations, vary the background of the training dataset as much as possible.

Foreground

What happens if your target class is in the background and there are other things in the foreground? It will affect the focus of your hardware, the clarity of your object, and the overall visibility of your target. This is a very likely situation when you deploy your model in the real world. There is no guarantee that what you are training for will be front and center in your image, so try to include images that have things other than your target class as the foreground.