Running Training

With alwaysAI, you can train a model using transfer learning from a MobileNet-SSD, YOLOv3, or ResNet Faster RCNN model that has been pre-trained on the COCO dataset. We offer three different user interfaces for training, each with different benefits. Select the interface that best fits your workflow, choose the model that suits your project, and start training.

Train with the Desktop Application

Opening the Portal

You can now run training using a no-code desktop application on Mac and Windows (for now, this feature is not available on Linux). Training with the desktop application is the simplest way to get up and running with model training. To use this feature, first open the alwaysAI application; you should see a screen similar to the image below.

image

From here, you can click the Model Training button to open the model training portal. When the model training portal opens, you should see a screen such as the following:

image

Select a Dataset

Step one is to select a dataset. The first time you train on a particular dataset, you need to add it to your locally stored datasets by clicking the Add Dataset button and selecting the dataset from your computer. The dataset needs to be a .zip file containing two folders: one named JPEGImages containing all your images (the images don’t have to be .jpeg; .png, .jpg, etc. are also accepted), and another named Annotations containing your corresponding annotations in Pascal VOC format. You can learn more about creating a dataset on the Data Collection page and the Data Annotation page.
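Before uploading, you can sanity-check that your archive follows this layout. The helper below is a hypothetical sketch (not part of the alwaysAI tooling) that simply verifies the two expected top-level folders exist in the .zip:

```python
import io
import zipfile

def check_dataset_zip(path_or_file):
    """Return True if the archive has JPEGImages/ and Annotations/ top-level folders."""
    with zipfile.ZipFile(path_or_file) as zf:
        top_level = {name.split("/", 1)[0] for name in zf.namelist() if "/" in name}
    return {"JPEGImages", "Annotations"} <= top_level

# Build a tiny in-memory example archive and check it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("JPEGImages/cat_001.jpg", b"")
    zf.writestr("Annotations/cat_001.xml", b"")
print(check_dataset_zip(buf))  # True
```

This only checks folder names, not annotation contents; the training tool itself validates the Pascal VOC annotations.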

If you want to train using a dataset that you have previously trained with, select it from the Select Existing dropdown. You can remove the selected dataset by pressing the X. Selecting a different dataset from the dropdown or adding a new dataset will replace your previous dataset selection.

Labels will be automatically inferred by the training tool.

Configure Training

Once you have selected or uploaded your dataset, enter a name for your model in the specified text box. Note that spaces are not allowed, so use hyphens or underscores in place of whitespace.

Next, select the base model you want to use for this training session. From the dropdown, select mobilenet_v1, resnet_faster_rcnn, or yolov3. Which model is best depends on your use case: for edge-based, real-time inferencing, try MobileNet or YOLO; use Faster RCNN when accuracy is more important than speed.

Select the number of epochs and the desired batch size for your training session. You can read about epochs here, and more about batch sizes here.

Now set the Train/Validation ratio for this session. This number should be between 0 and 1 and is the proportion of images allocated for training, with the remainder reserved for validation. The default is 0.7, meaning 70% of your images are used for training. See the FAQ for more details.
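To make the ratio concrete, the arithmetic below sketches how a 0.7 ratio divides a dataset. This is illustrative only; the training tool performs the split for you:

```python
def split_counts(num_images, train_val_ratio=0.7):
    """Return (training count, validation count) for a given ratio."""
    n_train = round(num_images * train_val_ratio)
    return n_train, num_images - n_train

# With the default ratio, 100 images split into 70 for training, 30 for validation.
print(split_counts(100))  # (70, 30)
```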

Model size is where you configure the dimensions of the images used in this training session. Each base model has pre-determined values for small, medium, and large models. There is also a custom option where you can enter any dimensions you want; however, if you choose this option for a YOLO model, you must enter the same number for width and height, and it must be divisible by 32. We recommend using one of the pre-set dimensions.
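The YOLO custom-size constraint can be expressed as a simple check. This helper is hypothetical, shown only to make the rule explicit:

```python
def valid_yolo_custom_size(width, height):
    """YOLO custom dimensions must be square and divisible by 32."""
    return width == height and width % 32 == 0

print(valid_yolo_custom_size(416, 416))  # True
print(valid_yolo_custom_size(640, 480))  # False: not square
print(valid_yolo_custom_size(300, 300))  # False: 300 is not divisible by 32
```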

You can continue training from a previous version by clicking the continue training from previous version checkbox and choosing a version from the dropdown menu that is subsequently displayed.

Run Training

Press the Start Training button to begin training with the specified configurations. While training is running, you can view both a plot output as well as log output. An example of log output is shown below; skip down to the training output section to view an example of the plot output.

image

Stopping Training

You can stop training at any time by pressing the Stop Training button. Otherwise, you will see output similar to below once training finishes.

Training Output

Training loss is plotted after each step; validation loss, shown in red, will be output after every epoch. You can read more about training and validation loss here. An example of the training output is shown below.

image

Once training finishes, you will also see output below the plot that provides you with a summary of the newly trained model’s details, including the name, labels, final loss, and the version number, as shown in the next image.

image

You will also be provided with text that you can copy and paste into your terminal and app.py files in order to use your model in an application.

image

If you would like to train your model further, you can enter the same model ID, optionally update the number of epochs and batch size, and click Start Training once more. Steps will start from one, but loss should be lower than in the previous training, as you can see from the example below. When continuing from a previous training session, you must use the same type of model (MobileNet, YOLO, Faster RCNN) that you used before. In addition, you won’t be able to change the Train/Validation ratio between sessions.

image

Training with the CLI

Training with the CLI enables you to adjust some configurations, such as the model to retrain on, the model dimensions, and the train/validation split. The core command to run model training and produce an object detection model for a given dataset is:

aai dataset train [</path/to/dataset1.zip> </path/to/dataset2.zip> ...] [<options>]

You can then specify additional flags to customize the training further.

Note: The path and the filetype (.zip) are only required the first time you train using a dataset. On subsequent training sessions with a previously trained dataset, you just need the name. For example, the first training session would use ~/datasets/my_dataset.zip; after that, my_dataset is sufficient.

Number of Epochs Flag

$ ... --numEpochs <num>

The --numEpochs <num> flag is required and specifies the number of times training will iterate through every image in your dataset.

Model Name Flag

$ ... --name <str>

The --name <str> flag specifies the name that you want to call your model (ex: dog_detector). This flag is optional, and if it is not provided, the training tool will generate a random phrase to use for your model name.

Label Flag

$ ... --labels <str0> [...]

The --labels <str0> [...] flag specifies the labels in the dataset that you are training on. Note that this flag is optional, as labels will be automatically detected by the training tool.

Base Training Model Flag

$ ... --model <str>

The --model <str> flag is optional and specifies the base training model. The three options are mobilenet_v1, yolov3, and resnet_faster_rcnn.

Training Hardware Flag

$ ... --hardware <str>

The --hardware <str> flag is optional and can take two possible values: CPU or GPU. Note that GPU is only available on Linux. The default is CPU.

Use Jupyter Labs Flag

$ ... --jupyter

The --jupyter flag alerts the tool to run model training in a Jupyter notebook.

Batch Size Flag

$ ... --batchSize <num>

The --batchSize <num> flag is required and sets the batch size used for training. A batch is the number of images processed before the model weights are updated.
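Since an epoch iterates through every training image and a batch is processed per weight update, the total number of training steps follows from simple arithmetic. The sketch below is illustrative only (the helper is hypothetical, not part of the CLI):

```python
import math

def total_steps(train_images, batch_size, num_epochs):
    """One epoch takes ceil(train_images / batch_size) steps."""
    return math.ceil(train_images / batch_size) * num_epochs

# 70 training images, batch size 4, 10 epochs:
# 18 steps per epoch * 10 epochs = 180 steps.
print(total_steps(70, 4, 10))  # 180
```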

Accept all Defaults Flag

$ ... --yes

The --yes flag is optional and tells the tool to skip interactive prompts, accepting the default response in all cases.

Continue from Previous Training Session Flag

$ ... --continue-from-version <str>

The --continue-from-version <str> flag is optional and specifies that training should start from the weights of a previous training session. Specify the cached version of the model (ex: 0.1) to continue from where you left off.

Training/Validation Ratio Flag

$ ... --trainValRatio <num>

The --trainValRatio <num> flag is optional and specifies the training/validation ratio, i.e., the proportion of the dataset used for training, with the remainder used for validation. The default is 0.70.

Image Size Flag

$ ... --imgSize <str>

The --imgSize <str> flag is optional and specifies the image dimensions used to train your model. The accepted arguments are: small (300x300), medium (640x640), large (1280x720), or a custom width-by-height value (ex: 300x300).
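Putting the flags together, a complete training command might look like the following. The dataset path, model name, and flag values here are illustrative, not prescribed:

```shell
# First session: pass the full path to the zipped dataset.
aai dataset train ~/datasets/my_dataset.zip --numEpochs 10 --batchSize 4 \
    --name dog_detector --model mobilenet_v1 --imgSize small

# Subsequent sessions: the dataset name alone is enough.
aai dataset train my_dataset --numEpochs 10 --batchSize 4 --name dog_detector
```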

Model Training Output

This guide outlines the different components that are returned after running the model training command.

Summary

Once you run the model training command you will get a summary of the current training session. Here is a sample:

info: Model id: testuser/confident_proskuriakova
info: Train/Validation ratio : 0.7

info: Model dimensions : 300 by 300
info: Datasets
  dataset_sample_584.zip
info: --labels flag not set, labels will be generated based on input dataset.
info: Available labels for dataset:
  vehicle
  license_plate
info: Batch Size : 1

Model Name

The model name is automatically generated if not provided, using the current user’s ID for the id portion and a random phrase for the model name.

Train/Validation Ratio

The proportion of the dataset to use for training and validation, respectively; this is set to 0.7 by default.

Image Size

The image size to be used (300x300 as default).

Datasets

A list of the datasets that are included in the training session.

Labels

The list of labels that the model will be trained to recognize. These will be automatically detected if they are not provided in the training command. Note that, at the moment, if you provide any labels, you must provide all of them.

Available Labels

The available labels list is generated by analyzing the label fields of the annotations. Use this field to validate the labels that you have chosen for this training session against the labels available in the dataset.

Batch Size

This displays the batch size chosen for this training session. If you are training using your CPU, try starting with a batch size of 4 or 8 to get a feel for performance. If you are training using a GPU, you can test out batch sizes of 16 and 32. Explore what works best with your hardware.

Training Session

Steps

After each step, the result of that step is printed to the console. The first statistic is the loss; the second is the time taken for that step.

image

Validation

Periodically, the progress of the model is tested by running it against the validation dataset. The result of this analysis is a table containing the mean average precision, the mean average recall, and a variety of specific precision statistics.

image

Training with Jupyter Notebook

Training using the alwaysAI Jupyter Notebook adds a graphical interface to training a model, and also gives you additional control over your training dataset. You can track how the model is improving during training by watching the plot of loss and validation loss per step. This guide details how to run training using Jupyter and contains the following sections:

  1. Jupyter Training Command

  2. Launch The Jupyter Notebook

  3. Select Dataset

  4. Train Your Model

  5. Training Complete

  6. Continue Training with Jupyter

Jupyter Training Command

We are going to configure model training using the Jupyter Notebook interface, so you don’t need many flags in your CLI command. To train a model using Jupyter, simply add the --jupyter flag to the training command:

$ aai dataset train --jupyter

The only time you should add other flags to the Jupyter Notebook training command is if you are continuing training from a previous session, which you can read about here.

Launch The Jupyter Notebook

Running training with the --jupyter flag launches a Notebook on your localhost and attempts to connect using your default browser. While the server is starting, you may get feedback from your browser that the site can’t be reached, but once the server is up and running, you will be able to see the notebook.

Note: If you run into issues or close the window, the URL for training is http://localhost:8888/lab.

Once you are connected and can see the Jupyter Labs interface, double-click the file named Notebook.ipynb. This brings up the alwaysAI model training notebook.

image

From the toolbar, click the start symbol twice, as shown in the image below. This will bring up the configuration interface.

image

Select Dataset

(Optional) S3 Setup

If you would like the option to use datasets available from S3, you can follow these steps.

  1. Install AWS CLI, following the instructions found here, selecting your specific OS.

  2. Configure AWS, specifically the access and secret key configuration as well as the region configuration.

Upload a dataset into your training environment by clicking the Upload New button. Use the components that appear to choose your dataset from your file system, give it a name, and define the train/validation ratio that the dataset will be split into (note that this enables you to re-use the same split in another training session). Resizing the images is optional. If you have linked an S3 bucket you can choose to store this dataset there, otherwise it will be created locally. Click the Create Dataset button when you are finished and alwaysAI will take these configurations and transform your dataset into a format that can be used for model training.

If you have already gone through this process, you can select a dataset from the dropdown menu. You can edit or delete a dataset using the Edit and Delete buttons respectively.

image

Train Your Model

The Training section is where you configure the current training session. First, select a model from the dropdown. We offer three options: MobileNet SSD version 1, YOLO version 3, and Faster RCNN. You can also change the input size, i.e., the dimensions of the images used for training, which affects both training time and accuracy. You can test and iterate to find the best combination for your purposes. See the FAQ for more details.

Give your model a name; your aai username should automatically be entered, along with a default model name if you don’t specify one. Keep the username as is, and update the model name if you’d like.

Then set the number of epochs and the batch size. The configuration options also let you set when validation occurs and how often training loss is plotted. Use the text input box to enter the number of epochs to train before validating the model’s learning. The default value is 1, meaning that the progress of the model will be validated after every epoch. Keep in mind that this validation takes time and resources, so we recommend that you don’t validate too frequently. Early stopping rounds is a way of automatically stopping training once the model is sufficiently trained.

image

Press the Begin Training button to start training the model. This brings up the graph that plots the loss for each step. The scale of the plot corresponds to the total number of steps defined in the training command.

Training Complete

Once training is done, you will see a green check mark below the plot. This tells you that all the steps have been run and the model is complete. To finish the process, click ‘Shut Down Notebook’ and close the browser to officially end training. The model will be exported just as if it were trained through the CLI. See our documentation on stopping training and post model training for further guidance.

image

Your model will be available to continue training from, use locally, or publish, the same as when training using the CLI.

Continue Training with Jupyter

If you would like to train your model further, you can use the --continue-from-version flag. When you add this flag, you also need to add the --name flag, indicating which model you are continuing from, like so:

$ aai dataset train path/to/dataset_zipped.zip --name <modelname> --continue-from-version <version> --jupyter

You can specify the training dataset and parameters as you did before. You will see the plotting continue from the previous training, with output similar to the image below, which shows an example of running the training for an additional five epochs.

image