This page answers frequently asked questions in four categories.

Please visit our Discord channel or send an email to support@alwaysai.co to ask questions that are not answered on this page. Please find helpful tutorials and additional reading material on our Blog page. General documentation, including the FAQ on alwaysAI, can be found here.


How many steps should I train?

As a rule of thumb, a model should train for at least 20 epochs, but with our current tool we are finding more success with 60 to 100 epochs or more. The number of epochs is determined by both the batch size you choose and the number of steps. So, to determine the number of steps, first decide on your batch size, and then calculate how many steps it will take to reach at least 20 epochs. You can see how to perform this calculation below.

What is an epoch?

An epoch is one complete pass through every image in the training dataset.

What is batch size?

The batch size is the number of images processed per step. Batch size is largely dependent on how much memory you have available for training: the more memory you can use, the larger the possible batch size.

How do I calculate how many steps to run?

For every step, the model trains on the number of images defined in batchSize; a batch size of 8 equates to 8 images per step. We need to train on every image at least 20 times, which is 20 epochs (see the definition of epoch above). So if you have 1000 images and a batch size of 1, every 1000 steps is an epoch, and 20,000 steps = 20 epochs. If your batch size is 8, every 125 steps is an epoch, and 2,500 steps = 20 epochs.

(number of images/batch size) * (desired epochs) = numSteps
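As a sketch, the formula above can be expressed in a few lines of Python (the image counts and batch sizes are illustrative):

```python
def num_steps(num_images, batch_size, desired_epochs):
    """Compute the training steps needed to reach the desired number of epochs."""
    steps_per_epoch = num_images / batch_size  # one epoch = one pass over all images
    return int(steps_per_epoch * desired_epochs)

# 1000 images, batch size 1, 20 epochs -> 20,000 steps
print(num_steps(1000, 1, 20))  # 20000
# 1000 images, batch size 8, 20 epochs -> 2,500 steps
print(num_steps(1000, 8, 20))  # 2500
```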

Note: You can break these steps up, or run additional steps, by utilizing the --continue-from-version flag in the training command (see the model training command documentation for additional details)

What does ‘overfitting’ mean?

When the model performs very well on the training dataset, but not on data that hasn’t been seen before, the model is overfit. This means that even though the performance metrics may appear very good for the training dataset, the model cannot be generalized to new data. For instance, if you have a model that you want to train to detect sporting equipment, and for the label ‘ball’ your dataset included only green tennis balls, even if the precision and recall were very high and loss is very low, the model probably won’t generalize to basketballs or baseballs, or maybe even non-green tennis balls. This is an extreme example; if you train any dataset too much, any model will learn that dataset so well that it doesn’t understand that new data may also be instances of the desired labels.

What is ‘loss’?

There are numerous algorithms to measure loss, and this measurement will be different for different machine learning tasks. In general, loss measures how far off the model was in correctly learning the task, and as such it is always a value that we want to minimize. There are two types of loss, training loss and validation loss. Training loss is measured by how accurately the model predicts using the training data. Validation loss measures how accurately the model predicted on validation data, which is annotated data that the model was not trained on.

What are ‘precision’ and ‘recall’?

Precision describes how many of the detected objects are what we actually wanted to detect. It is calculated by dividing the number of correctly identified objects, the true positives, by the total identified objects (both the true and false positives).

Recall describes how many of the objects of interest we managed to detect. It is calculated by dividing the true positives by the sum of the true positives and the false negatives (the objects of interest that were missed).

Say we have a model that is supposed to detect dogs, and in a picture there are three dogs and two cats. If the model detects all entities in the picture as dogs, it would have low precision, because only 3 of the 5 objects were what we wanted to detect. It would have high recall, however, because it managed to detect all the dogs. We want our model to have both high precision and high recall. This would mean we want a model to correctly identify dogs as dogs, and not identify any cats as dogs.
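The dog-detector example above can be worked through numerically. This is a minimal sketch of the standard precision and recall formulas, not part of the alwaysAI tooling:

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Dog detector: 3 dogs detected correctly (TP), 2 cats detected
# as dogs (FP), and no dogs missed (FN = 0).
p, r = precision_recall(true_positives=3, false_positives=2, false_negatives=0)
print(p, r)  # 0.6 1.0  -> low precision, high recall
```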

What is ‘data augmentation’?

It can sometimes be challenging to collect sufficient images to train your model. You can augment your dataset by taking the images you do have and creating additional images by rotating, cropping, brightening, darkening, blurring, etc. them. One Python library you can use to do this is imgaug.
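imgaug provides ready-made augmenters for these transformations; as a library-free illustration of the idea, the sketch below generates flipped, brightened, darkened, and cropped copies of an image using only NumPy (the array shapes are illustrative):

```python
import numpy as np

def augment(image):
    """Return simple augmented copies of an HxWx3 uint8 image."""
    flipped = image[:, ::-1, :]  # horizontal flip
    brightened = np.clip(image.astype(np.int16) + 40, 0, 255).astype(np.uint8)
    darkened = np.clip(image.astype(np.int16) - 40, 0, 255).astype(np.uint8)
    cropped = image[10:-10, 10:-10, :]  # center crop
    return [flipped, brightened, darkened, cropped]

image = np.random.randint(0, 256, size=(64, 64, 3), dtype=np.uint8)
for aug in augment(image):
    print(aug.shape)
```

Keep in mind that geometric augmentations (flips, crops, rotations) move the objects in the image, so the corresponding bounding-box annotations must be transformed as well; imgaug includes support for transforming bounding boxes alongside images.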

How do I know when I should stop training?

Generally, you want to stop training when loss no longer decreases and mAP no longer increases. If you visually test your model, using it in an application, and you notice certain instances of labels are no longer being picked up, you may have overfit your model. If instead some objects are being mis-identified, your model may need more training.

Data Collection

What is the format for my training data?

Input for training is expected to be in Pascal VOC format. Images should be JPEGs and stored in a folder named ‘JPEGImages’. Annotations are in XML format, and should be stored in a folder named ‘Annotations’; you can see an example of this format in our Data Annotation guide. Every image should correspond to an annotation file, e.g. file ‘0.jpg’ corresponds to ‘0.xml’. Every dataset consists of the ‘Annotations’ and ‘JPEGImages’ folders zipped together. Zip the folders by selecting the individual ‘Annotations’ and ‘JPEGImages’ folders, not a parent directory.
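For reference, a minimal Pascal VOC annotation file for a single object looks roughly like the following (the file name, image size, label, and coordinates are illustrative):

```xml
<annotation>
  <folder>JPEGImages</folder>
  <filename>0.jpg</filename>
  <size>
    <width>640</width>
    <height>480</height>
    <depth>3</depth>
  </size>
  <object>
    <name>vehicle</name>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>48</xmin>
      <ymin>240</ymin>
      <xmax>195</xmax>
      <ymax>371</ymax>
    </bndbox>
  </object>
</annotation>
```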

What if I have multiple datasets, do I need to combine them before training?

No, the aai dataset train command will combine any input datasets, provided they are in zip format. You can also merge datasets using aai dataset merge if you would like a consolidated input file; however, aai dataset train does this automatically.

How much input data do I need?

Approximately 300 images per label at a minimum is recommended; however, more data will almost surely produce better results. See the Data Capture Guidelines document for more details.


Do I need to annotate all objects in an image?

No, however you should be careful to not include too many images containing objects of interest without annotations, and you should be mindful of what other objects are in your images. For more details on data collection and annotation, please refer to our blogs on these subjects.

How much of an object needs to be showing before I annotate it?

In general, about 20% of the object should be visible in order to annotate it. Additionally, if more than 20% of the object is cut off or covered, you can mark the annotation as ‘truncated’.


What are the commands that are required for training?

The basic syntax for training is:

$ aai dataset train <dataset1.zip> <dataset2.zip> --labels <label1> <label2> --numSteps <integer> --batchSize <integer> --id <username>/<modelname>

An example usage is

$ aai dataset train resized_dataset_sample_592.zip --labels vehicle license_plate --numSteps 2000 --batchSize 4 --id user1/test_592 

Note: You can train on as many datasets as you’d like, as long as the aggregate file size is < 2GB.

What are optional flags for training?

The optional flags include:

  • --jupyter switches training from CLI to using Jupyter Notebook.

  • --trainValRatio sets the train-validation ratio split. The default is 0.7.

  • --continue-from-version enables you to pick up training from a previous session.

What does the --batchSize flag specify?

The --batchSize flag specifies how many images the model is trained on per step. This value will depend on how much memory you have available. Smaller batch sizes typically take less time per step, but you will need more steps to train your model sufficiently (see the numSteps FAQ for details).

What does the --numSteps flag specify?

The --numSteps flag is the number of steps to run for that iteration of training. If you use the --continue-from-version flag, this number will be the total number of steps you want your model to have been trained on. In other words, the total number of steps will be equal to the previous total plus the number you want to train on in the current iteration, so if you previously trained for 100 steps and you want to train an additional 100, you would use

$ ... --numSteps 200 ...

What does the --trainValRatio flag specify?

The --trainValRatio flag specifies how much of your annotated data is used for training versus validation, using a floating point number. The default is set to 0.7, which means that 70% of the training data will be used for training and 30% will be used for validation.
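As a sketch, this is how a given ratio divides an annotated dataset into training and validation sets (the image counts are illustrative, and the exact rounding behavior of the training tool is an assumption):

```python
def split_counts(num_images, train_val_ratio=0.7):
    """Return (train, validation) image counts for a given split ratio."""
    num_train = round(num_images * train_val_ratio)
    return num_train, num_images - num_train

print(split_counts(1000))       # (700, 300) -- the default 0.7 split
print(split_counts(1000, 0.8))  # (800, 200)
```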

What does the --continue-from-version flag specify?

If you use the --continue-from-version command, a new model version will be created, with a model name that is one version higher than the previous iteration that was run. For example, 0.2 will be created if you continue from 0.1, and 0.6 will be created if 0.5 was the last model version, even if you continue from 0.3. Additionally, see the numSteps FAQ for details on how these flags are used together.

Note: When using the --continue-from-version flag with the --jupyter flag, you must also use the --id flag and specify the model ID.

What are the default settings for model training?

You must manually set the training data, the labels, the number of steps, the batch size, and the model ID. Training defaults to the CLI (you can switch to a Jupyter notebook with the --jupyter flag). Additionally, the default train/validation split is 0.7, which can be altered using the --trainValRatio flag.

What types of models can I train?

Currently, we train object detection models by transfer-learning from a MobileNet-SSD that has been trained on the COCO Dataset.

How can I test my model?

You can use your model in an app to visually assess the model’s ability to detect the desired objects. You can do this by publishing it to your personal model catalog, using

$ aai model publish <username/modelname>

To use the model locally, without publishing to the model catalog, use

$ aai app models add <username/modelname> --local-version <version>

To add a model to an app, use

$ aai app models add <username/modelname>

To update the version of the model if the model is already added to your application, simply run

$ aai app models update

Can I test multiple versions of my model side by side?

Yes! However, you must use different training names for the two models, e.g. ‘my_model1’ and ‘my_model2’. You can train and publish two versions of your model using these different IDs, and test each model’s performance on the same input stream. See this blog for more details.

Do I need to train on all of my labels?

Yes, at the moment you must train on all the labels in your dataset. Additionally, if a label is specified in the training command, it must be present in your dataset!

What if I forget to specify a label to train on?

The training tool will print all of the labels it detects in your dataset to the console. You can then re-run the command with all the labels specified.

Do I have to train all at once, or can I pick up where I left off?

You can continue training your model from a previous version by using the --continue-from-version id flag in your training command and specifying the version you would like to continue from. If you do not use this flag, training will begin from step 0 and a new model version will be made, incrementing from the last version.

Do I have to keep using the same training settings if I continue training?

No! You can change any settings, including using the CLI or Jupyter notebook, between iterations of training. The only requirement when continuing training is that your number of steps is equal to the total number you wish to train on.

What if I want to revert to an older version of my model?

You can specify an older version of a model to continue training from, by specifying the version and using the

$ ... --continue-from-version

flag in the aai dataset train command. You can use an older version of a model by specifying the desired model id and using

$ aai app models add <username/modelname> --local-version <version>

Can I see performance metrics while the model is training?

You can graphically view training loss as well as starting and ending validation loss by training with the --jupyter flag, which opens a Jupyter notebook to perform training. If you instead train using the CLI (the default), you will see training loss for each step printed to the console, as well as mAP and recall every 10 minutes and when training is complete. For more details on interpreting these tables, see our documentation on model training output.

Is there a limit on how much data I can train?

You can train on up to 2GB of data. Training on datasets, either individually or combined, that exceed this size may produce inconsistent results and should not be attempted. Additionally, you may need to check the limitations of your file system (using ‘ulimit -aH’ on Linux and Mac) if you have many small files; this is a restriction of the file system, not of the training tool.

What hardware do I need to train a model?

Currently the model training tool is available in beta for macOS and Linux, with support for Windows expected soon. We offer both CPU and GPU training.

How long does it take to train a model?

This depends largely on the size of your dataset, the number of steps, the batch size, and whether you are training on a CPU or GPU. Generally, training on a GPU will be much faster (approximately 3-5 times faster) than on a CPU. For reference, our license plate detection model was trained on a GPU for 74,000 steps with a batch size of 16 using a dataset of 951 images, and this training took approximately 20 hours. (Note that this model may not have needed to be trained for this many steps; we offer it as an example of how these different components may affect one another.) As another example, running 2,000 steps on a CPU with a batch size of 4 and a dataset of 592 images took about 20 minutes.

Is there a place I can see logs of the training data if I lost my console output?

Yes, the ‘training-temp’ folder in the training directory contains a folder called ‘logs’, which holds the tensorflow-logs.log file.