Object tracking and re-identification are among the most important tasks in computer vision. In object tracking, the algorithm follows the movement of an object and tries to estimate or predict its position in a video. Re-identification is crucial in tracking moving objects because it enables us to identify the same object throughout a video sequence. Re-identification makes it possible to find objects, even if they are missing in several consecutive frames.
There are multiple use cases for object tracking and re-identification – from person re-ID (security) and vehicle re-ID (traffic control) to advanced medical imaging, robotics, autonomous driving, and sports analytics.
The challenges in object tracking are many, ranging from model training and background clutter to video performance, tracking speed, and occlusions. alwaysAI successfully implements both techniques, with the ability to run on edge devices, making computer vision more affordable for customers across industries.
What Is Object Tracking?
Object tracking enables us to track a unique object across a series of frames. It involves the following processes:
- Target initialization – drawing a bounding box around the object of interest in the first frame.
- Appearance modeling – modeling the visual appearance of the object (different lighting conditions, speed, angle, etc.):
- Visual representation: trying to choose robust features that represent the object.
- Statistical modeling: applying statistical learning techniques (models) for object identification.
- Motion estimation – using the model's predictive capability to predict the object’s future position correctly.
- Target positioning – once the location of the object is approximated, use a visual model to determine the exact location of the target.
Tracking a particular object builds on object detection and follows these steps:
- Detect the object.
- Assign a unique ID to it.
- Track the object through the frames:
- Last known object’s position (history of the path).
- Path estimation (where do we think this object is now, based on new information).
- Matching phase (a confidence metric for how sure we are that a detection is the same object).
- Tracking data is then stored to predict where the object is going, despite occlusions or other digital distractions.
An example of object tracking is the centroid tracker. In this method, we store the last known bounding boxes, compute the new set of bounding boxes, and match objects by minimizing the distance between them. There is no estimation step in the centroid tracker; it relies purely on the Euclidean distance (i.e., how far the boxes are from each other). It is the easiest method to understand conceptually because the matching is intuitive: given four previous bounding boxes and four new ones, draw lines between the pairs that are closest to each other.
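The greedy centroid matching described above can be sketched in a few lines of Python. Note that everything here (the function names, the `max_distance` threshold, the `(x1, y1, x2, y2)` box format) is an illustrative assumption, not alwaysAI's actual API:

```python
import math

def centroid(box):
    """Center point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def match_boxes(prev_boxes, new_boxes, max_distance=50.0):
    """Greedily pair tracked boxes (keyed by ID) with new detections,
    closest centroid pairs first, rejecting pairs beyond max_distance.
    Returns a {previous_id: new_detection_index} mapping."""
    pairs = sorted(
        (math.dist(centroid(pbox), centroid(nbox)), pid, nidx)
        for pid, pbox in prev_boxes.items()
        for nidx, nbox in enumerate(new_boxes)
    )
    matches, used_prev, used_new = {}, set(), set()
    for dist, pid, nidx in pairs:
        if dist > max_distance:
            break  # remaining pairs are even farther apart
        if pid not in used_prev and nidx not in used_new:
            matches[pid] = nidx
            used_prev.add(pid)
            used_new.add(nidx)
    return matches
```

Unmatched new detections would then receive fresh IDs, and unmatched previous IDs would be flagged as missing for possible re-identification later.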
There is another estimation method, the Kalman filter, which uses the history of what has been tracked and estimates, from the last known position, where the object might be going. In this case, the Kalman filter performs the additional estimation step. Matching still uses the same distance metric between bounding boxes, but the distance is measured against the Kalman filter's estimated position instead of the last known position.
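The prediction idea can be illustrated with a stripped-down constant-velocity estimator. A real Kalman filter also maintains uncertainty estimates and blends prediction with measurement; this hypothetical helper shows only the core intuition of extrapolating from recent positions:

```python
def predict_position(history):
    """Extrapolate the next centroid from the last two observed
    positions, assuming constant velocity. This is only the
    prediction intuition behind a Kalman filter, not the filter itself."""
    if len(history) < 2:
        return history[-1]      # no velocity estimate yet
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0   # displacement per frame
    return (x1 + vx, y1 + vy)
```

Matching would then measure distances to these predicted positions rather than the last known ones.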
How is Object Tracking Different from Object Detection?
Object detection involves identifying an object in an image or a single video frame. It works only if the object is present in the image. On the other hand, object tracking follows a particular moving object in real-time, even if the object is missing in one or several frames. Using object detection for tracking can be computationally expensive since it applies classification techniques and it can track only known objects.
In most cases, object detection is the input to the object tracker. In classical object tracking techniques, a bounding box is drawn around the object, and the tracker is given the task “to follow the object.”
The output of the object detection is typically a bounding box and a label. Aside from this label, you can add classification to provide more robust data, depending on what you need to know for your tracking application. For example, if you detected a person, you could run a classifier on that person for demographics and attach it as an additional attribute to that object. The tracker will be flexible (by holding the required information at the output), allowing more use cases and applications.
Object Tracking Models/Algorithms
The most common deep-learning algorithms used in object tracking are Convolutional Neural Networks (CNNs). Multi-layer CNNs have greater discriminative power: the early layers act as feature extractors (edges, gradients, corners), while the last layers can be trained in real time.
Other types of deep-learning methods – Recurrent Neural Networks (RNNs), Autoencoders (AEs), Generative Adversarial Networks (GANs), Siamese Neural Networks (SNNs), and custom neural networks – have also been applied to object tracking. They apply selective segmentation to find the part of the image where the object is most probably located. The algorithm evaluates which regions of the frame share similar colors, textures, and lighting.
Types of Object Tracking
There are several types of object tracking:
Image tracking – performs automatic recognition and tracking of images. It is applied in augmented reality (AR), such as retail apps that create a virtual experience that places items in personal surroundings.
Video tracking – tracks a moving object throughout a video. It analyzes the video frames sequentially and links the object's previous location to its current one by predicting the new position and placing a bounding box around it.
Real-time vs. pre-recorded – live video is streaming everywhere around us; thus, real-time tracking and localization of objects are crucial (e.g., traffic control and security applications).
Single object tracking (SOT) – tracks a single object, regardless of the other objects in the frame. A bounding box is drawn around the object in the first frame, and the tracker is tasked with following the object in subsequent frames.
Multiple object tracking (MOT) – follows the trajectory of several objects in a video sequence. MOT assigns a unique ID to each bounding box (object detected) to distinguish among intra-class objects. MOT is widely applied in video surveillance, autonomous cars, etc.
Challenges in Object Tracking
Real-time video performance - Different tracking algorithms have different performance levels. In general, the centroid tracker and the Kalman-filter tracker are rather lightweight and fast, without video-performance issues. Some trackers have computational limits with multiple objects. For example, the correlation tracker becomes unusable or too slow once you start tracking two or more objects. You have to consider how much the algorithm will impact your system's speed.
Tracking speed – There are a lot of features you should tune for the tracker. For example, tuning a tracker parameter like max distance is useful in determining the radius where you accept and reject a match. Also, your video frames per second (FPS) make a big impact in tracking objects. If your frame rate is 5 FPS, and a car is driving on a highway, you will realize that the car is moving a lot between frames. If you track that same car at 30 or 60 FPS, it will move less between frames. You should choose parameters that provide you with the best performance.
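The frame-rate effect described above suggests that a matching radius tuned at one frame rate should be rescaled when the stream runs at another. The helper below is an illustrative sketch under the simplifying assumption that between-frame motion scales inversely with FPS; the function name and scaling rule are not an alwaysAI API:

```python
def scale_max_distance(base_distance, tuned_fps, actual_fps):
    """Rescale a matching radius tuned at tuned_fps for a stream
    running at actual_fps: at a lower frame rate, objects move
    farther between frames, so the radius must grow proportionally."""
    return base_distance * (tuned_fps / actual_fps)
```

For example, a 40-pixel radius tuned at 30 FPS would become a 240-pixel radius on a 5 FPS stream.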
Training – Object tracking is a complex task. It performs object detection, classification, and localization, so its implementation is mathematically complex. The model should be carefully trained to provide satisfactory accuracy. The Fast R-CNN is an algorithm that can handle object tracking efficiently.
Occlusions – there can often be interference when the object blends with other objects or the background, so the algorithm loses track of it. Occlusion causes an initially tracked object to be re-identified as a new object. It can be mitigated through sensitivity analysis, which enables the user to identify which object feature is distracting the algorithm. Once identified, similar images can be used to correct the biases and help the algorithm extract features that differentiate the objects.
Background clutter, noise - densely populated backgrounds usually introduce noise or redundant information. Backgrounds that are crowded, share the object's color, or are too cluttered make it harder to track specific or lightly colored objects. When the background is single-colored or blurred, it is easier for the CV algorithm to detect and track objects.
3D spatial issues, multiple scales – object trackers operate in complex, real-world environments. If you have a camera facing a sidewalk, and you haven't trained your model on the people far away, there's less data in the small boxes for the model to detect. If you're trying to match up two people that are far away and moving, or there's a person really close to them, the box will move more. A fixed radius cannot be applied in such cases.
alwaysAI adds more flexibility by providing more dynamic distance metrics. Consider the size of the bounding box when tracking a faraway person: a fixed radius does not make sense, because if the box is only 5 pixels wide, it should not match with something 20 pixels away. The same is true with the Kalman filter, where the object's velocity is crucial, especially when people are walking toward the camera. Training and feature extraction are critical steps in resolving these issues, and anchor boxes can be used to mitigate such problems.
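A dynamic distance metric like the one described above could, for example, tie the matching radius to the bounding-box size. This is an illustrative sketch, not alwaysAI's implementation; the scale factor is an arbitrary assumption that would be tuned per application:

```python
def dynamic_radius(box, scale=4.0):
    """Matching radius proportional to bounding-box width, so a
    distant (small) person gets a tight radius and a nearby
    (large) person a generous one. The box is assumed to be
    (x1, y1, x2, y2); the scale factor is a tunable assumption."""
    x1, _, x2, _ = box
    return (x2 - x1) * scale
```

A 5-pixel-wide distant box would then accept matches only within 20 pixels, while a 100-pixel-wide nearby box could match up to 400 pixels away.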
What is Re-identification?
Re-identification is crucial in tracking moving objects - it enables the identification of the same object throughout the entire video sequence. Re-identification makes it possible to find objects even if they are missing in several consecutive frames or enter new zones, e.g. when a car is identified in a frame of one street camera, it may be missing in the subsequent frames but should be re-identified when it returns in another camera field.
- As humans, we observe and realize: “This is the same car as before.”
- Object detection model says: “Hey, there's a car.”
- The current object tracking algorithm says: “The car that was here before, we think it moved and is now in this zone.”
Re-identification requires more recognition capability, since it involves building a gallery from the video and searching it for the particular item seen before. Standard object tracking algorithms are not capable of performing re-ID, as their notion of identity degrades over time. There are more advanced algorithms for re-ID, such as the Siamese neural network, which performs image embedding: it assigns a vector to each image so that similar images have "close" embedding vectors according to a defined distance (Euclidean distance, cosine similarity, etc.).
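Once embeddings exist, re-identification against a gallery reduces to a nearest-neighbor search under the chosen distance. The sketch below uses cosine similarity and assumes the embedding vectors have already been produced by some model; the function names and threshold are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def reidentify(query, gallery, threshold=0.8):
    """Return the gallery ID whose embedding is most similar to the
    query, or None if no similarity clears the threshold."""
    best_id, best_sim = None, threshold
    for obj_id, embedding in gallery.items():
        sim = cosine_similarity(query, embedding)
        if sim > best_sim:
            best_id, best_sim = obj_id, sim
    return best_id
```

Returning None when nothing clears the threshold is what lets the application treat the query as a genuinely new object rather than forcing a bad match.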
Re-identification is very important in particular application scenarios such as person re-ID in video surveillance (for security/anti-crime purposes) and vehicle re-ID (in traffic control). Based on the use case, there are some specific algorithms, like facial recognition, which is a very well-developed area. It finds distinctive points and measurements on a face and matches them against a set of known data. Some researchers use a Siamese network and train the model with a contrastive loss: to perform person re-ID, the distance between positive sample pairs is gradually reduced, while the distance between negative sample pairs is gradually increased.
Important Considerations in Building Tracking and Re-identification Applications
Specifics about Training and Annotation
The base object tracking algorithms are very useful for simpler use cases and can be set up and run quickly. For more complex use cases, you can create your own components of the tracking process: utilize alwaysAI's tracking framework and plug your own pieces into it for greater flexibility and extensibility, write your own code for the estimation component, or define your own distance metric for the matching component. The alwaysAI platform makes it easy to jump into development with minimal effort and then customize the remaining steps required for advanced uses.
Edge/Cloud Deployment considerations
There are many unique challenges in developing and deploying object tracking on edge devices. One challenge we've identified is that developers often prototype and build on a regular laptop (PC) that performs very differently from edge devices: some things might be faster, some slower, and the cameras might differ. Parameters tuned on the development machine won't make sense when transferred to the production system, so you have to account for the target platform throughout your process. If you run your tests at 30 FPS and then deploy at 5 FPS, your parameters will no longer hold. Set up your test environment to match the deployment environment, and pick flexible parameters to avoid real-time problems. If your camera claims 5 FPS but actually delivers 2 FPS, configure your app accordingly.
For cloud deployments, you should consider the following factors:
- Losing the network connection in critical applications.
- The reaction speed of the app (e.g., face verification decision - OK/ not OK).
- Privacy and data security – some businesses don’t want to store their computer vision assets on a cloud server.
Object Tracking and Re-identification Use Cases
There are many useful applications of object tracking in various industries. With successful implementation, object tracking provides valuable real-time data that allows business operators to make decisions faster.
Video surveillance systems can become instantly more sophisticated with the use of object tracking. Security systems can do more than simply detect the presence of people and alert an operator. By incorporating computer vision and object tracking into existing camera systems, multiple video streams can be monitored to follow the movements of people in secure or high-risk zones and alert personnel only if they enter restricted areas. This type of functionality helps automate processes and allows operators to be more efficient and handle a higher volume of work.
A deep understanding of shopper behavior is the pinnacle of customer analytics. With object tracking, computer vision systems provide much greater insight into how customers act once they enter a store. Shoppers can be tracked through multiple zones to understand which areas are most popular, what products they look at, and how to optimize store layout.
Manufacturers and warehouses are looking for ways to boost productivity and find significant cost savings. Computer vision can track workers, while maintaining their privacy, to ensure they are following the correct processes and procedures. Tracking data can also help improve facility layout to minimize unnecessary travel and optimize the placement of products and supplies. Computer vision systems play a huge role in autonomous robots and machinery that improve productivity and free humans to do more critical and creative tasks.
Computer vision systems have a great impact on ensuring safe workplaces. Industrial sites and construction companies use object tracking systems to monitor workers for adherence to safety protocols and the use of protective equipment. Hazardous zones can also be established to alert key personnel when someone enters a dangerous or restricted area.
Object Tracking with alwaysAI
Object tracking and re-identification are quite complex tasks, requiring advanced deep learning models to be implemented. But they are essential in many business applications. alwaysAI, with its proven computer vision expertise, can help you build and deploy object tracking more easily and much more cost-effectively than traditional methods. We provide developers and enterprises with a comprehensive platform for building, deploying and managing computer vision applications on edge devices.
The alwaysAI platform offers a catalog of pre-trained models, a low-code model training toolkit, and a powerful set of APIs to help developers at all levels build and customize computer vision applications in many industries and have an immediate impact on driving huge ROI. alwaysAI has an easy deployment process and a state-of-the-art run-time engine to accelerate computer vision applications into production quickly, securely, and affordably.
Book a 15-minute meeting with one of our AI Experts and learn how to leverage computer vision now in your business.