Recent industry analyst publications predict that enterprise adoption of Artificial Intelligence (AI) will skyrocket over the next few years, with worldwide spending on AI systems forecast to reach nearly $98 billion in 2023, according to IDC.
A large share of AI initiatives in the retail, logistics, automotive, and industrial sectors will involve Computer Vision (CV) applications: accurately detecting, classifying, and recognizing diverse objects in streaming video feeds.
In this three-part series on deep learning-based object detectors, we take a closer look at the current state of the art in computer vision: deep neural network-based object detectors, their working principles, community support, and the popular variants available.
In this part, we look at what these deep learning-based solutions are and how they differ from traditional CV-based object detection methods.
Traditional Object Detection Approaches
Machine vision, a functional counterpart to human vision, was envisioned as the ability to detect multiple stationary and moving objects in a perceptive field (commonly known as the field of view), categorize them into classes, and localize them to regions within that field.
Traditional methods relied on hand-crafted feature descriptors built from low-level visual cues in image pixels, such as edges, corners, and colour, to identify possible regions containing object instances.
The major steps were searching the entire image for areas where an object might be present, detecting object features in those candidate areas using the feature descriptors described above, and finally labelling the identified regions with object classes.
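The three steps above can be sketched as a toy sliding-window detector. This is an illustration under stated assumptions, not any specific historical system: the descriptor `edge_density` (mean absolute pixel gradient inside a window) and the fixed threshold "classifier" are hypothetical stand-ins. Real detectors used richer descriptors (for example, histograms of oriented gradients) and trained classifiers, but they shared the exhaustive scan and the fixed, hand-designed feature.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image, row-major, values in [0, 1]


def edge_density(img: Image, top: int, left: int, size: int) -> float:
    """Hypothetical hand-crafted descriptor: mean |dx| + |dy| inside one window."""
    total, count = 0.0, 0
    # Stop one pixel short of the window edge so every gradient stays inside it.
    for r in range(top, top + size - 1):
        for c in range(left, left + size - 1):
            gx = img[r][c + 1] - img[r][c]  # horizontal intensity change
            gy = img[r + 1][c] - img[r][c]  # vertical intensity change
            total += abs(gx) + abs(gy)
            count += 1
    return total / count if count else 0.0


def sliding_window_detect(img: Image, size: int, stride: int,
                          threshold: float) -> List[Tuple[int, int, int, str]]:
    detections = []
    rows, cols = len(img), len(img[0])
    # Step 1: exhaustively scan every window position in the image.
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            # Step 2: compute the hand-crafted feature for this candidate region.
            score = edge_density(img, top, left, size)
            # Step 3: label the region using a fixed, manually chosen rule.
            if score >= threshold:
                detections.append((top, left, size, "object"))
    return detections


# Usage: an 8x8 dark image with a bright 2x2 square near the bottom-right corner.
image = [[0.0] * 8 for _ in range(8)]
for r in (5, 6):
    for c in (5, 6):
        image[r][c] = 1.0
print(sliding_window_detect(image, size=4, stride=4, threshold=0.5))
# → [(4, 4, 4, 'object')]
```

Every design decision here (window size, stride, feature, threshold) is fixed by hand, which is the rigidity behind several of the challenges discussed in the next section.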
Deformable Part-based Models (DPMs) were the most notable of these and remained in extensive use through the late 2000s and early 2010s.
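For reference, the DPM formulation of Felzenszwalb et al. scores a candidate placement by combining a coarse root filter, higher-resolution part filters, and a penalty on how far each part drifts from its anchor position. In the notation below, $F_0$ is the root filter, $F_1, \dots, F_n$ are the part filters, $\phi(H, p_i)$ is the HOG feature vector at placement $p_i$ in feature pyramid $H$, $d_i$ are learned deformation weights, $(dx_i, dy_i)$ is the displacement of part $i$ from its anchor, and $b$ is a bias term.

```latex
\mathrm{score}(p_0, \dots, p_n)
  = \sum_{i=0}^{n} F_i \cdot \phi(H, p_i)
  \;-\; \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i)
  \;+\; b,
\qquad
\phi_d(dx, dy) = \left(dx,\; dy,\; dx^2,\; dy^2\right)
```

The features themselves (HOG) remain hand-crafted; only the filter and deformation weights are learned.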
Challenges in traditional object detectors
These methods performed reasonably well in simple contexts. In complex scenarios with multiple objects, classes, scales, and orientations, however, they faced significant challenges:
- Manually defined feature descriptors and detection windows result in rigid and therefore brittle solutions that fail unpredictably in many complex contexts
- The capacity to learn feature representations (i.e., the physical attributes of objects) is fixed, regardless of how much training data is available
- Low detection accuracy and precision: too many false positives and high error rates when objects are occluded, crowded, or blurred
- Performance can be optimized only at the level of individual components; there is no way to optimize across the whole detection pipeline
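The precision problem in the list above can be made concrete. A detection is commonly counted as correct when its box overlaps a ground-truth box with an Intersection over Union (IoU) at or above a threshold; 0.5 is the PASCAL VOC convention. The sketch below assumes corner-format boxes and uses a simplified greedy one-to-one matcher. Real benchmarks also rank predictions by confidence and average precision over recall levels, which this sketch omits.

```python
from typing import List, Tuple

# Boxes in corner format (x1, y1, x2, y2), assuming x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds: List[Box], truths: List[Box],
                     thr: float = 0.5) -> Tuple[float, float]:
    """Greedy matching: each prediction claims its best unmatched truth with IoU >= thr."""
    matched = set()
    tp = 0
    for p in preds:
        best_idx, best_iou = None, 0.0
        for i, t in enumerate(truths):
            if i in matched:
                continue  # each ground-truth object can be detected only once
            v = iou(p, t)
            if v >= thr and v > best_iou:
                best_idx, best_iou = i, v
        if best_idx is not None:
            matched.add(best_idx)
            tp += 1
    fp = len(preds) - tp   # unmatched predictions (including duplicates) are false positives
    fn = len(truths) - tp  # objects no prediction matched are false negatives
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# Usage: one good detection, one spurious box, and one duplicate of the first object.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60), (0, 0, 10, 10)]
print(precision_recall(preds, truths))
# → (0.3333333333333333, 0.5)
```

Two of the three predictions are false positives, so precision drops to one third even though half of the objects were found.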
This led to research aimed at improving object detection using deep learning backbone architectures.
Improvements with deep learning
In the early 2010s, deep convolutional neural networks (DCNNs) emerged as a promising method for image classification at scale. However, technology and data constraints stood in the way of significant improvements over traditional approaches to image classification and object detection:
- Large annotated training data sets were not available
- Prevailing computational resources and paradigms were limited
With the general availability of large-scale annotated image databases (ImageNet, 2009) and GPU-based parallel computing, DCNNs became mainstream after AlexNet (2012) achieved state-of-the-art performance.
Applying deep learning to object detection pipelines improves performance in the following ways:
- DCNNs learn hierarchical feature representations automatically by training on large data sets with GPU-grade resources
- They capture a hierarchy from raw pixels up to high-level semantic information (i.e., object class labels), rather than only the low-level object cues used by traditional methods
- Their learning capacity lets them improve feature representations (and, in turn, detection accuracy and precision) as more training data becomes available
- With large training sets and the right configurations, they can be trained to be more accurate and precise, unlike rigid, fixed-capacity traditional methods
- DCNN performance can be optimized end to end, rather than only piecewise as with conventional methods
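The feature hierarchy described in the list above can be illustrated with receptive-field arithmetic: each stacked convolution or pooling layer lets a unit "see" a larger patch of the input, which is why early layers tend to respond to local cues such as edges while deeper layers can encode object-level patterns. The sketch uses the standard recurrence r_l = r_(l-1) + (k_l - 1) * j_(l-1) and j_l = j_(l-1) * s_l, where k is the kernel size, s the stride, and j the spacing between adjacent units measured in input pixels. The layer list `ALEXNET_LIKE` is an assumption that mirrors AlexNet's kernel sizes and strides; it is not a full network definition.

```python
from typing import List, Tuple

Layer = Tuple[str, int, int]  # (name, kernel_size, stride)

# AlexNet-style conv/pool stack. Padding shifts a receptive field but does not change its size,
# so it is omitted here.
ALEXNET_LIKE: List[Layer] = [
    ("conv1", 11, 4), ("pool1", 3, 2),
    ("conv2", 5, 1), ("pool2", 3, 2),
    ("conv3", 3, 1), ("conv4", 3, 1), ("conv5", 3, 1), ("pool5", 3, 2),
]


def receptive_fields(layers: List[Layer]) -> List[Tuple[str, int]]:
    """Side length (in input pixels) of the region each unit in every layer can see."""
    rf, jump = 1, 1  # an input pixel sees only itself; adjacent pixels are 1 apart
    out = []
    for name, k, s in layers:
        rf += (k - 1) * jump  # each extra kernel tap widens the view by the current spacing
        jump *= s             # striding spreads neighbouring units further apart in the input
        out.append((name, rf))
    return out


for name, rf in receptive_fields(ALEXNET_LIKE):
    print(f"{name}: each unit sees a {rf}x{rf} pixel region")
```

The first line printed is `conv1: each unit sees a 11x11 pixel region` and the last is `pool5: each unit sees a 195x195 pixel region`, so units at the top of the stack integrate evidence from a large part of a 224x224 input.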
Given these benefits, research-led innovation has produced significant advances in object detector models, spanning detection pipeline composition (two-stage vs. one-stage detectors), backbone architectures, learning mechanisms, and parameter configurations.
These advances aim to improve performance (accuracy, precision), responsiveness (detection at high fps), handling of input complexity (occlusion, multiple scales and aspect ratios, blurring, etc.), and resource consumption (inference at the edge with low compute and parameter requirements).
Another success factor for deep learning-based approaches is their extensive adoption by the developer community. The growing availability of these innovations as model libraries and training data sets, particularly open-source projects with significant community contributions, enables enterprises to accelerate their adoption of deep learning to solve complex object detection problems.
Deep learning computer vision applications are revolutionizing numerous industries. Some examples we have seen include:
- Retail – Stores can use camera feeds to detect shopper movements, behaviour, and cart activity (items added or removed) to improve the in-store shopping experience through differentiated placements, personalized promotions, and easier check-out.
- Automotive – Advanced driver-assistance systems (ADAS) detect cars, pedestrians, and on-route obstacles in real time, improving vehicle responsiveness and reducing the probability of accidents.
- Logistics – Real-time video monitoring of driver behaviour, handling equipment, and cargo condition (on board and in warehouses) helps increase supply chain throughput and reduce the frequency and duration of equipment downtime.
- Industrial manufacturing – Vision-based monitoring of an extensive network of large equipment assemblies in a manufacturing facility can assess failure probabilities and enable pre-emptive maintenance or component replacement.
Now that we have introduced deep learning-based object detectors, in part 2 we will look at the evolution and anatomy of the various types of deep learning-based object detection models.