Training an Object Detection Model

With a subscription at the individual level or above you will have access to the alwaysAI model training tools. The tools are built into the CLI, so once you are up and running with the platform there is no need to install anything else. This guide provides details on the specific CLI commands to run model training.


Dataset Help

Before starting this process, make sure the dataset you plan to use for training is annotated and ready to go. If you need tips on generating a dataset, please see the documentation on the data generation starter app, or our documentation outlining dataset capture guidelines.

A dataset consists of two folders: ‘Annotations’, which holds all annotation files, and ‘JPEGImages’, which holds all of the corresponding images. These two folders are zipped together to create one dataset file. The annotations must be in the Pascal VOC XML format. This format is described here.
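Before training, it can help to confirm that a dataset file follows this layout. The following is a minimal Python sketch, not part of the alwaysAI CLI. The function name `unmatched_files` is hypothetical, and the check assumes the usual Pascal VOC convention that each image and its annotation share a file name (for example `a.jpg` and `a.xml`).

```python
import zipfile
from pathlib import PurePosixPath


def unmatched_files(zip_path):
    """Return (images_without_annotation, annotations_without_image).

    Both values are sorted lists of file stems. Assumes matching stems
    between 'JPEGImages' and 'Annotations' (an assumption, see lead-in).
    """
    image_stems, annotation_stems = set(), set()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)
            # The folder may appear anywhere in the archive path.
            if "JPEGImages" in path.parts:
                image_stems.add(path.stem)
            elif "Annotations" in path.parts and path.suffix.lower() == ".xml":
                annotation_stems.add(path.stem)
    return (
        sorted(image_stems - annotation_stems),
        sorted(annotation_stems - image_stems),
    )
```

If both returned lists are empty, every image has an annotation and every annotation has an image.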

Example Usage

Model training requires a fair amount of configuration. We have simplified the process so that all of this configuration is done from within the command. This can result in lengthy commands, so the rest of this document breaks down the structure flag by flag.

Here is an example of a command to train a model on two datasets, dataset1 and dataset2, both of which are .zip files. The labels the model will be trained on are “license_plate” and “vehicle”, and the model will be trained for 200 steps with a batch size of 8, continuing from version 1.0. This command also overrides the default train/validation split, using a ratio of 0.8 instead of the default 0.7, and trains using a Jupyter Notebook instead of the CLI:

$ aai dataset train dataset1.zip dataset2.zip --labels license_plate vehicle --numSteps 200 --id alwaysai/id --batchSize 8 --continue-from-version 1.0 --trainValRatio 0.8 --jupyter

Base Subcommand

This is the base alwaysAI command for training. The arguments and flags described below are added onto it.

$ aai dataset train ...

Specify the Dataset(s)

This is required as the second half of the base command. It specifies the dataset(s) to be used for training. It consists of the path to each dataset to be used, separated by spaces. If more than one dataset is used, the training tool will merge them before running in order to validate the format and disambiguate naming.

$ ... <path/to/> <path/to/> ...

Define the Label(s)

The labels flag is where you tell the training tool what labels you are training the model to recognize. Each label that is identified in your dataset must be included here.

$ ... --labels <label1> <label2> <label3> ...
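To find out which labels a dataset contains, you can read them from the annotation files. The sketch below is an illustration, not an alwaysAI tool. It relies on the Pascal VOC convention that each labeled object is an `<object>` element with a `<name>` child. The function name and the sample annotation are made up for the example.

```python
import xml.etree.ElementTree as ET


def labels_in_annotation(xml_text):
    """Return the set of object names found in one Pascal VOC annotation."""
    root = ET.fromstring(xml_text)
    labels = set()
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if name:
            labels.add(name.strip())
    return labels


# Hypothetical annotation used only to demonstrate the function.
SAMPLE = """
<annotation>
  <filename>car_001.jpg</filename>
  <object>
    <name>vehicle</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>200</xmax><ymax>150</ymax></bndbox>
  </object>
  <object>
    <name>license_plate</name>
    <bndbox><xmin>80</xmin><ymin>120</ymin><xmax>130</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>
"""

print(sorted(labels_in_annotation(SAMPLE)))
```

Taking the union of these sets over every file in ‘Annotations’ gives the full list to pass to --labels.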

Set the Number of Steps to Train

The number of steps is an integer specifying how many steps to train the model for. Note that the number of steps is not the number of epochs: a step processes one batch of images, while an epoch is one full pass over the dataset. Also note that if you use the --continue-from-version flag, you need to enter the total number of steps you want the model to be trained for, not just the steps for that iteration. You can read more detail in the document on model training or in the FAQ, but in short, the calculation is numSteps = (number of images / batch size) * (desired epochs).

$ ... --numSteps <steps> ...
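The calculation can be sketched in Python as follows. The function names are hypothetical. Rounding the steps per epoch up to a whole number is an assumption made here so that a final partial batch still counts as a step; the formula in this guide does not specify rounding.

```python
import math


def num_steps(num_images, batch_size, epochs):
    """Steps needed to cover `epochs` passes over `num_images` images."""
    # Assumption: a final partial batch still counts as one step.
    steps_per_epoch = math.ceil(num_images / batch_size)
    return steps_per_epoch * epochs


def total_steps_when_continuing(previous_steps, additional_steps):
    """Value to pass as numSteps when continuing a previous training run."""
    return previous_steps + additional_steps


print(num_steps(1000, 8, 10))                  # 125 steps per epoch * 10 epochs
print(total_steps_when_continuing(200, 300))   # total, not just the new steps
```

For example, continuing a model that was already trained for 200 steps for another 300 steps means passing 500 as the number of steps.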

Define the Model ID

The Model ID gives a unique identifier to your model. It consists of your alwaysAI username plus whatever you want to name your model. Find your username on your profile page.

$ ... --id <username/modelname> ...

The model will be stored in the model cache under this ID, and from there you can add it to a starter app or any alwaysAI application the same way you would any other model. See our post-training guide for more details.

Configure Training Hardware

Model training can be run on your CPU or, if you have one, on your GPU. GPUs are much better at parallel processing and can perform the deep learning tasks required for training a model much faster, so if you have a GPU you will most likely want to use it. To do that, first make sure your system has a compatible GPU and is set up to use it. Then set the hardware flag to GPU.

$ ... --hardware [GPU, CPU] ...

Continue from Previous Training

If you feel your model needs more training, you have the ability to continue from where you left off. Identify a version from your cache directory or continue from the most recent checkpoint. Make sure to update numSteps to be the total steps when using this flag.

$ ... --continue-from-version <version> ...

Set the Batch Size

Batch size, specified as an integer, determines the number of training images the model processes before updating the model parameters. The limiting factor for batch size is the amount of memory on your training machine. Because GPUs are better at parallel processing, you will be able to handle larger batch sizes if you are using a GPU.

$ ... --batchSize <size> ...

Set the Train/Validation Ratio

This number sets the ratio of training images to validation images. This takes the total number of images in your dataset and divides them into images used for training the model, and images used for validating that training. This number must be between 0 and 1. The default value is 0.7, meaning 70% of the images are dedicated for training, and 30% for validation. Setting this number to 0.8 means 80% for training and 20% for validation, etc.

$ ... --trainValRatio <ratio> ...
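To illustrate what the ratio means, here is a minimal Python sketch of a train/validation split. This is not the training tool's actual implementation. The shuffling, the seed, and the rounding are assumptions chosen for the example, and `split_dataset` is a hypothetical name.

```python
import random


def split_dataset(image_names, ratio=0.7, seed=0):
    """Split image names into (training, validation) lists by `ratio`."""
    if not 0 < ratio < 1:
        raise ValueError("ratio must be between 0 and 1")
    names = sorted(image_names)
    # Assumption: shuffle with a fixed seed so the split is repeatable.
    random.Random(seed).shuffle(names)
    cut = round(len(names) * ratio)
    return names[:cut], names[cut:]


images = [f"img_{i}.jpg" for i in range(10)]
train, val = split_dataset(images, ratio=0.8)
print(len(train), len(val))
```

With 10 images and a ratio of 0.8, 8 images go to training and 2 to validation.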

Run Training Through Jupyter Notebooks

This flag changes the way that training is run. If this flag is included, training will be run using a Jupyter Notebook instead of the CLI. Please see the Jupyter Notebook documentation for more information.

$ ... --jupyter ...
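Because the full command can get long, it may be convenient to assemble it in a script. The sketch below only builds and prints the command string and does not run it. It uses the flags described in this guide. The helper name `build_train_command` and its parameter names are hypothetical, and placing the dataset paths before the flags is an assumption.

```python
import shlex


def build_train_command(datasets, labels, num_steps, model_id,
                        hardware=None, continue_from_version=None,
                        batch_size=None, train_val_ratio=None,
                        jupyter=False):
    """Return the training command as a list of arguments."""
    args = ["aai", "dataset", "train", *datasets]
    args += ["--labels", *labels]
    args += ["--numSteps", str(num_steps), "--id", model_id]
    # Optional flags are added only when a value is given.
    if hardware is not None:
        args += ["--hardware", hardware]
    if continue_from_version is not None:
        args += ["--continue-from-version", str(continue_from_version)]
    if batch_size is not None:
        args += ["--batchSize", str(batch_size)]
    if train_val_ratio is not None:
        args += ["--trainValRatio", str(train_val_ratio)]
    if jupyter:
        args.append("--jupyter")
    return args


command = build_train_command(
    ["dataset1.zip", "dataset2.zip"],
    ["license_plate", "vehicle"],
    num_steps=200,
    model_id="alwaysai/id",
    batch_size=8,
    train_val_ratio=0.8,
    jupyter=True,
)
print(shlex.join(command))
```

Keeping the arguments as a list until the last moment avoids quoting mistakes when a dataset path contains spaces.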