TREE-BASED UNIDIRECTIONAL NEURAL NETWORKS FOR LOW-POWER COMPUTER VISION ON EMBEDDED DEVICES
Deep Neural Networks (DNNs) are a class of machine learning algorithms that are widely successful in various computer vision tasks. DNNs filter input images and videos with many convolution operations in each layer to extract high-quality features and achieve high accuracy. Although highly accurate, state-of-the-art DNNs usually require server-grade GPUs and are too energy-, computation-, and memory-intensive to be deployed on most devices. This is a significant problem because billions of mobile and embedded devices that do not contain GPUs are now equipped with high-definition cameras. Running DNNs locally on these devices enables applications such as emergency response and safety monitoring, because data cannot always be offloaded to the cloud due to latency, privacy, or network bandwidth constraints.
Prior research has shown that a considerable share of a DNN's memory accesses and computations is redundant when performing computer vision tasks. Eliminating these redundancies will enable faster and more efficient DNN inference on low-power embedded devices. To reduce these redundancies and thereby reduce the energy consumption of DNNs, this thesis proposes a novel Tree-based Unidirectional Neural Network (TRUNK) architecture. Instead of a single large DNN, multiple small DNNs arranged in a tree work together to perform computer vision tasks. The TRUNK architecture first finds the similarity between different object categories. Similar object categories are grouped into clusters. Similar clusters are then grouped into a hierarchy, creating a tree. The small DNN at each node of TRUNK classifies among that node's clusters. During inference, once a DNN selects a cluster for an input image, another DNN further classifies among the children of that cluster (sub-clusters). The DNNs associated with other clusters are not used during the inference of that image. As a result, only a small subset of the DNNs is used during inference, which reduces redundant operations, memory accesses, and energy consumption. Since each intermediate classification reduces the search space of possible object categories in the image, the small, efficient DNNs still achieve high accuracy.
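The root-to-leaf inference described above can be illustrated with a minimal sketch. It is not the thesis implementation: the names TrunkNode, leaf, and infer, the toy categories, and the boolean "image" features are all hypothetical, and simple rule functions stand in for the small DNNs at each node. The sketch only shows how selecting one cluster per level leaves the classifiers of the other clusters unused.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for real pixel data: a dict of boolean features.
Image = Dict[str, bool]


@dataclass
class TrunkNode:
    """One tree node. Internal nodes hold a classifier; leaves hold a label."""
    name: str
    # Stand-in for a small DNN: maps an image to the key of one child.
    classify: Optional[Callable[[Image], str]] = None
    children: Dict[str, "TrunkNode"] = field(default_factory=dict)
    label: Optional[str] = None  # set only on leaves (final object category)


def leaf(label: str) -> TrunkNode:
    """Create a leaf node for a single object category."""
    return TrunkNode(name=label, label=label)


def infer(root: TrunkNode, image: Image) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf, running one classifier per level.

    Returns the predicted category and the names of the nodes whose
    classifiers were actually executed.
    """
    node, visited = root, []
    while node.label is None:
        visited.append(node.name)
        # Only the selected child is followed; sibling subtrees are skipped.
        node = node.children[node.classify(image)]
    return node.label, visited


# Toy hierarchy (assumed for illustration): two clusters of two categories.
animals = TrunkNode(
    name="animals",
    classify=lambda img: "cat" if img["retractable_claws"] else "dog",
    children={"cat": leaf("cat"), "dog": leaf("dog")},
)
vehicles = TrunkNode(
    name="vehicles",
    classify=lambda img: "truck" if img["cargo_bed"] else "car",
    children={"truck": leaf("truck"), "car": leaf("car")},
)
root = TrunkNode(
    name="root",
    classify=lambda img: "vehicles" if img["has_wheels"] else "animals",
    children={"animals": animals, "vehicles": vehicles},
)

label, path = infer(root, {"has_wheels": True, "cargo_bed": True})
print(label, path)
```

In this toy tree, three classifiers exist but only two run for the example input: the "animals" classifier is never evaluated, which is the source of the savings the architecture targets.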
In this thesis, we identify the computer vision applications and scenarios that are well suited for the TRUNK architecture. We develop methods that use TRUNK to improve the efficiency of image classification, object counting, and object re-identification. We also present methods to adapt the TRUNK structure to different embedded/edge application contexts with different system architectures, accuracy requirements, and hardware constraints.
Experiments with TRUNK on several image datasets show that the proposed solution reduces memory requirements by ∼50%, inference time by ∼65%, energy consumption by ∼65%, and the number of operations by ∼45% when compared with existing DNN architectures. These experiments are conducted on consumer-grade embedded systems: NVIDIA Jetson Nano, Raspberry Pi 3, and Raspberry Pi Zero. The TRUNK architecture has only marginal losses in accuracy when compared with state-of-the-art DNNs.