Purdue University Graduate School

Efficient and Consistent Convolutional Neural Networks for Computer Vision

posted on 2023-07-27, 00:53 authored by Caleb Tung

Convolutional Neural Networks (CNNs) are machine learning models that are commonly used for computer vision tasks like image classification and object detection. State-of-the-art CNNs achieve high accuracy by using many convolutional filters to extract features from the input images. This accuracy comes at the cost of high computational intensity. Large, accurate CNNs typically require powerful Graphics Processing Units (GPUs) to train and deploy, while attempts at creating smaller, less computationally intense CNNs lose accuracy. In fact, maintaining consistent accuracy is a challenge even for state-of-the-art CNNs. This presents a problem: the vast energy expenditure demanded by CNN training raises concerns about environmental impact and sustainability, while the computational intensity of CNN inference makes it challenging for low-power devices (e.g. embedded, mobile, Internet-of-Things) to deploy CNNs on their limited hardware. Further, when reliable network connectivity is limited or when extremely low latency is required, the cloud cannot be used to offload computing from the low-power device. This motivates research into methods for deploying CNNs on the device itself that improve energy efficiency and mitigate the consistency and accuracy losses of CNNs.

This dissertation investigates the causes of CNN accuracy inconsistency and energy consumption. We further propose methods to improve both, enabling CNN deployment on low-power devices. Our methods require no training, which avoids the high energy costs associated with training.

To address accuracy inconsistency, we first design a new metric to capture such behavior. A study of modern object detectors shows that they all behave inconsistently: when two images are similar, an object detector can sometimes produce completely different predictions. Malicious actors exploit this to cause CNNs to mispredict, and image distortions caused by camera equipment and natural phenomena can also cause mispredictions. Regardless of the cause of the misprediction, we find that modern accuracy metrics do not capture this behavior, so our new consistency metric measures it directly. Finally, we demonstrate the use of image processing techniques to improve CNN consistency on modern object detection datasets.
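
The idea of measuring consistency across similar images can be illustrated with a minimal sketch. This is not the metric defined in the dissertation: the function name `consistency`, the input layout, and the rule "two predictions agree when they contain the same class labels" are all assumptions made here for illustration.

```python
def consistency(pairs):
    """Fraction of similar-image pairs whose predicted labels agree.

    HYPOTHETICAL illustration, not the dissertation's metric.
    `pairs` is a list of (preds_a, preds_b); each element is a list of
    class labels predicted for one of two near-identical images.
    """
    if not pairs:
        raise ValueError("need at least one image pair")
    # Sorting compares the predictions as multisets, so both the labels
    # and the number of detections per label must match.
    agree = sum(1 for a, b in pairs if sorted(a) == sorted(b))
    return agree / len(pairs)
```

A detector that returns the same labels for every pair of similar images scores 1.0 under this sketch, regardless of whether those labels are correct, which is why such a measure complements accuracy rather than replacing it.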

To improve CNN energy efficiency and reduce inference latency, we design the focused convolution operation. We observe that in a given image, many pixels are often irrelevant to the computer vision task: if those pixels are deleted, the CNN can still give the correct prediction. We design a method that uses a depth mapping neural network to identify which pixels are irrelevant in modern computer vision datasets. Next, we design the focused convolution to automatically ignore any pixels marked irrelevant, i.e. those outside the Area of Interest (AoI). By replacing the standard convolutional operations in CNNs with our focused convolutions, we find that ignoring those irrelevant pixels can reduce energy consumption and inference latency by up to 45%.
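
To make the idea of skipping irrelevant pixels concrete, here is a minimal pure-Python sketch under stated assumptions. It is one plausible reading of the description above, not the dissertation's implementation: the name `focused_conv2d`, the boolean-mask representation of the AoI, the zero padding, and the choice to still read neighboring pixels that lie outside the AoI are all assumptions.

```python
def focused_conv2d(image, kernel, aoi):
    """'Same'-size 2D cross-correlation, evaluated only inside the AoI.

    HYPOTHETICAL sketch. `image` and `aoi` are lists of rows with the
    same shape; `kernel` is an odd-sized square list of rows.
    Output positions outside the AoI stay 0.0 and cost no multiplies.
    Returns (output, number_of_positions_computed).
    """
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    computed = 0
    for y in range(h):
        for x in range(w):
            if not aoi[y][x]:
                continue  # irrelevant pixel: skip the whole kernel window
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding at borders
                        acc += image[yy][xx] * kernel[dy + r][dx + r]
            out[y][x] = acc
            computed += 1
    return out, computed
```

In this sketch the work scales with the number of AoI positions rather than with the full image size, which is the intuition behind the savings described above. A production version would operate on batched tensors and parallel hardware rather than Python loops.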

Finally, we improve the focused convolutions, allowing for (1) energy-efficient, automated AoI generation within the CNN itself and (2) improved memory alignment and better utilization of parallel processing hardware. The original focused convolution required AoI generation in advance, using a computationally intense depth mapping method. Our AoI generation technique automatically filters the features from the early layers of a CNN using a threshold. The threshold is determined using an accuracy vs. latency curve search method. The remaining layers then apply focused convolutions to the AoI to reduce energy use. This allows focused convolutions to be deployed within any pretrained CNN for various observed use cases. No training is required.
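
The two steps described above, thresholding early-layer features into an AoI and choosing the threshold from an accuracy vs. latency curve, might be sketched as follows. Both function names, the single-channel feature map, and the selection rule (the fastest point that still meets an accuracy floor) are assumptions for illustration; the dissertation's actual search method may differ.

```python
def aoi_from_features(feature_map, threshold):
    """Mark a position as inside the AoI when its activation exceeds
    the threshold. HYPOTHETICAL sketch on a single-channel feature map
    given as a list of rows."""
    return [[value > threshold for value in row] for row in feature_map]


def pick_threshold(curve, min_accuracy):
    """Pick the lowest-latency point that still meets an accuracy floor.

    HYPOTHETICAL selection rule. `curve` is a list of
    (threshold, accuracy, latency) tuples measured beforehand on a
    validation set. Returns the chosen threshold, or None if no point
    is accurate enough.
    """
    acceptable = [point for point in curve if point[1] >= min_accuracy]
    if not acceptable:
        return None
    return min(acceptable, key=lambda point: point[2])[0]
```

Because the curve is built from measurements of an already trained network, a search of this kind involves only inference, which is consistent with the statement that no training is required.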


Funding

CDSE: Collaborative: Cyber Infrastructure to Enable Computer Vision Applications at the Edge Using Automated Contextual Analysis

Directorate for Computer & Information Science & Engineering

Collaborative Research: OAC Core: Advancing Low-Power Computer Vision at the Edge

Directorate for Computer & Information Science & Engineering

SI2-SSE: Analyze Visual Data from Worldwide Network Cameras

Directorate for Computer & Information Science & Engineering

Summit of Software Infrastructure for Managing and Processing Big Multimedia Data at the Internet Scale

Directorate for Computer & Information Science & Engineering


Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Dr. Yung-Hsiang Lu

Additional Committee Member 2

Dr. Mahsa Ghasemi

Additional Committee Member 3

Dr. Qiang Qiu

Additional Committee Member 4

Dr. George K. Thiruvathukal
