How to Annotate Data for Machine Learning

Kathleen Siddell


To leverage the benefits of computer vision (CV) – like improving business operations, ensuring safety, and increasing revenue – you need to start by collecting data specific to your environment so you can build a highly accurate CV model. With poor data, your computer vision solution will be ineffective, and you won’t get the insights and analytics you need.
Data Is the Bedrock of Computer Vision
Computer vision allows computers to see and interpret the world at the same level of accuracy as humans. CV achieves this capability the same way people do – trial and error. Through a process called machine learning, computer vision models collect and analyze visual data from images and video to complete the desired task. CV tasks include things like object detection to track human activity or count items on a store shelf for inventory tracking, as well as many other techniques and use cases.
To build a reliable model, we have to provide accurate datasets that are properly structured and labeled. That’s the main purpose of data annotation, and it can be a challenging part of the computer vision model development process. Without image annotation, there can be no computer vision and its associated benefits. For example, in traffic monitoring or autonomous driving, the collected image data can contain cars, pedestrians, and cyclists, so annotations must cover every type of participant (vehicles, humans, etc.) before the images can be used to create a training dataset.
In this article, we will explain what data annotation is, how it’s done properly, some challenges, and how to make it easy with a computer vision platform.
What Is Data Annotation?
Data annotation in computer vision is the process of labeling the images or videos in a dataset so that a machine learning model can be trained to perform a computer vision task. Data annotation is a crucial part of supervised learning, in which input data are associated with corresponding outputs: labeled datasets are used to train the model to make correct classifications and predictions. Image annotation, also known as image labeling, is a subset of data labeling in which people painstakingly tag images with the metadata and attributes that help machines identify objects. In practice, that means someone tagging every instance of the desired object – cars, people, dogs, etc. – in every single image.
Data-related tasks take up to 75% of the time in a computer vision project, and data labeling alone takes around 20-30% of it. The complexity of data annotation is often underestimated, and improper labeling increases error rates in machine learning. Bad data produces bad results, and that’s why many CV applications fail. So the first step is proper data cleaning and auditing – an initial pass that weeds out incorrect or irrelevant data and ensures the training dataset is relevant and high-quality.
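The cleaning pass described above can be sketched in a few lines. This is a minimal, illustrative example – the record fields (`image`, `label`, `box`) are hypothetical and not tied to any particular tool or API:

```python
# Minimal sketch of a data-cleaning pass before annotation and training.
# The record structure here is hypothetical, not a specific platform's format.

def clean_dataset(records):
    """Drop records that are unusable for training: missing image path,
    empty label, or a degenerate (zero-area or inverted) bounding box."""
    cleaned = []
    for r in records:
        box = r.get("box")  # (xmin, ymin, xmax, ymax)
        if not r.get("image") or not r.get("label"):
            continue  # incomplete record
        if box is None or box[2] <= box[0] or box[3] <= box[1]:
            continue  # zero-area or inverted box
        cleaned.append(r)
    return cleaned

raw = [
    {"image": "img_001.jpg", "label": "car", "box": (10, 20, 110, 90)},
    {"image": "img_002.jpg", "label": "",    "box": (5, 5, 50, 50)},    # no label
    {"image": "img_003.jpg", "label": "dog", "box": (30, 40, 30, 80)},  # zero width
]
print(len(clean_dataset(raw)))  # 1 — only the first record survives
```

Real audits would also check for duplicate images, mislabeled classes, and class imbalance, but the principle is the same: reject anything the model cannot learn from.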

There are technical hurdles in data annotation that developers must overcome, such as annotating objects frame by frame so the CV model can identify them. Poorly done annotation causes bad-data issues for multiple reasons: inconsistent annotations, a lack of defined examples, and so on. Why is it valuable for developers to annotate their own data? It gives them direct visibility into what the data looks like, so they can make sure it is accurate. Once you’ve created an initial model, outsourcing can help, because by then you understand the problem and have examples to provide to outsourcers. In video annotation, objects move continuously through the footage, so each item must be labeled with cuboids, lines and splines, bounding boxes, or whatever annotation mode a given application needs.
Types of Data Annotation for Machine Learning
As we mentioned above, there are two broad kinds: image annotation and video annotation.
Image annotation is the process of assigning labels to an image. It enables the machine learning algorithm to recognize the annotated area as a separate object or class in a given image.
Video annotation, on the other hand, is the process of labeling parts or clips in the video to classify, detect or identify desired objects frame by frame. Video annotation applies the same methods as image annotation but on a frame-by-frame basis. It is an essential technique for CV tasks such as localization and object tracking.
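Because objects usually move only slightly between consecutive frames, a common labor-saving pattern in video annotation is to carry the previous frame's boxes forward so annotators only adjust them at keyframes. The sketch below is illustrative and not tied to any specific annotation tool:

```python
# Sketch: propagate boxes from hand-drawn keyframes forward through a video,
# so the annotator adjusts boxes instead of redrawing them on every frame.
# Frame indices and box tuples (xmin, ymin, xmax, ymax) are illustrative.

def propagate_annotations(num_frames, keyframe_boxes):
    """keyframe_boxes: {frame_index: [boxes]} drawn by hand.
    Returns per-frame box lists, carrying the last keyframe forward."""
    annotations, current = [], []
    for i in range(num_frames):
        if i in keyframe_boxes:
            current = keyframe_boxes[i]
        annotations.append(list(current))
    return annotations

frames = propagate_annotations(5, {0: [(10, 10, 50, 50)], 3: [(12, 11, 52, 51)]})
print(frames[2])  # [(10, 10, 50, 50)] — still the frame-0 box
print(frames[4])  # [(12, 11, 52, 51)] — updated at the frame-3 keyframe
```

Production tools typically interpolate or track between keyframes rather than copy boxes verbatim, but the workflow idea is the same.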
The four most important methods for image annotation in computer vision are as follows:
- Image Classification - Analyzes images and identifies which objects and other components exist in an image, without localizing them within the image.
- Object Recognition - The ability of the system to analyze an image and recognize individual objects within the image.
- Image Segmentation - Recognizes and understands what's in the image at the pixel level. At least one class is associated with each pixel on the image, unlike object detection, where the bounding boxes of objects can overlap. It is also known as semantic segmentation.
- Boundary Recognition – Finds the semantic boundaries between the distinct objects in an image. The boundary detection process involves edge detection (discontinuity in brightness) and texture segmentation (partitioning an image into regions with different textures containing similar groups of pixels). It provides a balance between annotation speed and targeting items of interest.
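The difference between whole-image classification and pixel-level segmentation in the list above can be made concrete with a toy label mask. The class ids below are illustrative (0 = background, 1 = car):

```python
# Contrast between image classification and semantic segmentation on a toy
# 4x4 "image": classification yields labels for the whole image, while
# segmentation assigns a class to every pixel. Class ids are illustrative.

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

# Classification-style summary: which classes appear at all.
classes_present = sorted({c for row in mask for c in row})

# Segmentation-style summary: per-class pixel counts, which also tell you
# exactly where each class is located in the image.
pixel_counts = {}
for row in mask:
    for c in row:
        pixel_counts[c] = pixel_counts.get(c, 0) + 1

print(classes_present)  # [0, 1]
print(pixel_counts[1])  # 4 "car" pixels
```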
Data Annotation Methods and Techniques
To annotate or highlight specific parts of an image depending on the required annotation type, the following types of marking techniques are used:
Bounding Box
Bounding boxes are the simplest image annotation technique, and machine learning algorithms for computer vision can be trained effectively with them. Experts or annotators draw a box around an object and attribute details to it. This method is best suited for annotating objects that are symmetrical in shape.
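A standard quality check on box annotations is intersection-over-union (IoU): how much two boxes for the same object overlap, whether between two annotators or between an annotation and a model prediction. A minimal pure-Python sketch, with boxes as `(xmin, ymin, xmax, ymax)` tuples:

```python
# Intersection-over-union (IoU) between two axis-aligned bounding boxes,
# a common measure of annotation consistency and detection accuracy.
# Boxes are (xmin, ymin, xmax, ymax) tuples.

def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 — identical boxes
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143 — weak overlap
```

Teams often flag annotator pairs whose IoU on shared images falls below a threshold as a sign that the labeling guidelines need tightening.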

3D cuboids
Cuboids are a 3D variant of the usually 2D bounding box. A 2D box captures detailed information about an object’s length and width; a cuboid adds depth, describing the object’s full extent for finer detail. If we consider a road image, cars can be easily annotated with bounding boxes, but the cuboid technique also gives you detailed information about each object’s depth.

Polygons
Objects in images are not always symmetrical or regular; in plenty of cases they are irregular or simply arbitrary in shape. In such cases, annotators use the polygon technique to annotate irregular shapes and objects accurately. This method involves placing points according to the object’s dimensions and manually drawing lines around its perimeter.
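Polygon annotations are stored as ordered point lists, and the shoelace formula gives the enclosed area – a useful sanity check that the points were placed in order, and a way to see how much tighter a polygon is than a bounding box. The L-shaped example below is illustrative:

```python
# Area of a polygon annotation via the shoelace formula.
# points: [(x, y), ...] listed in order around the perimeter.

def polygon_area(points):
    s = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap back to the first point
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# An L-shaped (irregular) region that a bounding box would over-cover:
poly = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
print(polygon_area(poly))  # 12.0 — the tight box (0,0)-(4,4) would cover 16
```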

Polylines and Splines
Apart from basic shapes and polygons, simple lines are also used to annotate objects in images. This method allows machines to detect boundaries easily. For example, in autonomous vehicle navigation, lines are drawn along lanes so the vehicle better understands the boundaries within which it needs to maneuver. Polylines are also used to teach machines and systems about different scenarios and circumstances and to help them make better movement decisions.

Landmarking (Key Points)
This technique is used to track fine movements of objects in an image or video, and it can also be used to detect and annotate small objects. Landmarking is especially used in face recognition to annotate facial features, gestures, expressions, postures, and more: individual facial features and their attributes are identified to obtain accurate results.
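A landmark annotation is essentially a set of named points. The sketch below is purely illustrative – the point names and coordinates are made up, not a standard from any specific face-recognition library:

```python
# Sketch of a keypoint (landmark) annotation for a face. The point names
# and coordinates are illustrative only, not a standard landmark scheme.

landmarks = {
    "left_eye":    (30, 40),
    "right_eye":   (70, 40),
    "nose_tip":    (50, 60),
    "mouth_left":  (35, 80),
    "mouth_right": (65, 80),
}

def distance(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

# Inter-ocular distance is commonly used to normalize landmark error metrics,
# making them independent of face size in the image.
iod = distance(landmarks["left_eye"], landmarks["right_eye"])
print(iod)  # 40.0
```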

Manual vs. Automated Annotation
In manual image annotation, a person selects regions in an image and provides a written description of those parts. Annotators are given a set of raw, unlabeled data, such as photos or videos, along with instructions on how to classify it using a set of rules or specialized data annotation methodologies. Two of the simplest and most used annotation techniques are polygons and bounding boxes, which require less time and effort. In contrast, semantic segmentation requires more work and finer annotations.
Automated image annotation is the process by which a computer system automatically assigns labels (metadata) to a digital image. There are several tools for automated image annotation. SuperAnnotate is a comprehensive platform for computer vision engineers and annotation teams to annotate, manage, train, and ultimately automate computer vision pipelines.
If we compare manual vs. automated annotation, both have certain pros and cons:
- Learning process – Manual annotation works better for supervised learning because people can detect new objects and categorize them, which machines struggle to do.
- Time consumption – Automatic annotation is much faster than manual annotation, and the quality-check procedure is also completed faster.
- Flexibility – Manual annotation is considered more flexible because annotators can update labels as they go; if they find different classes of images, they can react accordingly.
- Scalability – Manual annotation can be scaled up to meet different customer needs, while automatic annotation is more limited because the machine can only generate the fixed set of labels it was built for.

Annotating Data for Machine Learning with alwaysAI
Using a computer vision platform like alwaysAI is a great way to leverage all aspects of computer vision, not just model training. We’ve simplified end-to-end development and deployment. A key benefit is the library of pre-trained models; those models, or models you upload to our platform, can go through further training with our low-code model training toolkit.
With alwaysAI’s Model Training Toolkit, you can train an object detection model to recognize any object you choose. All you need to do is generate data, annotate that data, and train that data using one of the base models we offer. Once your model is trained, you can test it by deploying it locally or on an edge device or uploading it to the alwaysAI model catalog.
Currently, alwaysAI’s training tool trains an object detection model, so the method of annotation is to draw bounding boxes around all examples of the target label(s) in your dataset. There are many tools available to help you do this, from simple open-source tools like LabelImg to feature-rich web applications like Supervise.ly. We have also incorporated another open-source tool, CVAT, into the alwaysAI CLI. In the end, how you draw the bounding boxes is not important – what matters is the format of the annotations. The format we support is Pascal VOC; further down in this document, in the Validate Data section, you can find an example of this annotation format.
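To give a feel for the Pascal VOC format mentioned above, here is a minimal sketch that builds a VOC-style annotation with Python’s standard library. The file name, label, and coordinates are illustrative; real VOC files typically include additional fields (`folder`, `segmented`, `pose`, `truncated`, `difficult`):

```python
# Generate a minimal Pascal VOC-style annotation with the standard library.
# Filename, label, and box values are illustrative; real VOC files usually
# carry extra fields (folder, segmented, pose, truncated, difficult).
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, label, box):
    ann = ET.Element("annotation")
    ET.SubElement(ann, "filename").text = filename
    size = ET.SubElement(ann, "size")
    for tag, val in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(val)
    obj = ET.SubElement(ann, "object")
    ET.SubElement(obj, "name").text = label
    bnd = ET.SubElement(obj, "bndbox")
    for tag, val in zip(("xmin", "ymin", "xmax", "ymax"), box):
        ET.SubElement(bnd, tag).text = str(val)
    return ET.tostring(ann, encoding="unicode")

xml_str = voc_annotation("img_001.jpg", 640, 480, "car", (48, 60, 240, 200))
print(xml_str)  # one <object> entry per annotated instance in the image
```

Each annotated object in an image gets its own `<object>` element, so an image with three cars would contain three `<object>` blocks in one annotation file.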
Other operations that you can perform by utilizing alwaysAI’s suite of tools are:
- Upload images, videos, or capture data from a live application and highlight the objects you are looking to detect using alwaysAI annotation tools.
- Dataset augmentation (turning a dataset of 100 images into a dataset of 1800).
- AI-assisted annotation – Using an existing pre-trained model to do your initial annotation.
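The dataset-augmentation bullet above can be sketched with simple geometric transforms. This is a pure-Python illustration of the idea, not the alwaysAI implementation: the 18x factor mentioned above (100 images into 1,800) would come from a richer set of transforms (crops, brightness shifts, noise, etc.), while this sketch produces 8 variants per image from flips and rotations alone:

```python
# Sketch of offline dataset augmentation: each source image yields several
# geometric variants, multiplying the dataset size. Images are represented
# as 2D lists of pixel values for illustration.

def rotate90(img):
    """Rotate a 2D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_h(img):
    """Mirror a 2D pixel grid left-to-right."""
    return [row[::-1] for row in img]

def augment(img):
    variants, cur = [], img
    for _ in range(4):                # four 90-degree rotations...
        variants.append(cur)
        variants.append(flip_h(cur))  # ...plus a mirrored copy of each
        cur = rotate90(cur)
    return variants                   # 8 variants per source image

img = [[1, 2], [3, 4]]
print(len(augment(img)))  # 8
```

Note that geometric augmentation must also transform the annotations: flipping an image horizontally means flipping its bounding-box x-coordinates too.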
Get Started With alwaysAI
Data annotation is a widely used process in computer vision: labeling images or videos so a machine learning model can be trained to perform the required task. It is a crucial part of supervised learning, in which input data are associated with corresponding outputs. Annotation can be manual or automated, and for both approaches there are plenty of open-source tools you can use with proper guidance and advice. However, compared to other methods, alwaysAI is a much easier and more cost-effective solution.
The alwaysAI platform and model training toolkit simplify this aspect for developers even further. You don’t have to deal with a bunch of code: you can train your chosen model from our model catalog with data from your actual environment, quickly annotate that data, and let our cloud training servers do the rest. We’ve eliminated the complexity of computer vision and taken care of many of the common administrative aspects of CV that are difficult and time-consuming, so you can focus on creating value from your application.
Schedule time today to speak to an AI expert to learn more about model training and how to leverage our platform to generate huge ROI for your business.