HARDWARE-AWARE EFFICIENT AND ROBUST DEEP LEARNING
Deep Neural Networks (DNNs) have greatly advanced several domains of machine learning, including image, speech, and natural language processing, leading to their use in many real-world products and services. This success has been enabled by improvements in hardware platforms such as Graphics Processing Units (GPUs) and specialized accelerators. However, recent trends in state-of-the-art DNNs point to enormous increases in compute requirements during training and inference that far surpass the rate of advancement in deep learning hardware. For example, image-recognition DNNs require tens to hundreds of millions of parameters to reach competitive accuracies on complex datasets, resulting in billions of operations when processing a single input. Furthermore, this growth in model complexity is accompanied by an increase in training dataset size to achieve improved classification performance, with complex datasets often containing millions of training samples or more. Another challenge hindering the adoption of DNNs is their susceptibility to adversarial attacks. Recent research has demonstrated that DNNs are vulnerable to imperceptible, carefully crafted input perturbations that can lead to severe consequences in safety-critical applications such as autonomous navigation and healthcare.
This thesis proposes techniques to improve the execution efficiency of DNNs during both inference and training. In the context of DNN training, we first consider the widely used stochastic gradient descent (SGD) algorithm. We propose a method that uses localized learning, which is computationally cheaper and incurs a lower memory footprint, to accelerate an SGD-based training framework with minimal impact on accuracy. This is achieved by employing localized learning in a spatio-temporally selective manner, i.e., in selected network layers and epochs. Next, we address training dataset complexity by leveraging input mixing operators that combine multiple training inputs into a single composite input. To ensure that training on the mixed inputs is effective, we propose techniques to reduce the interference between the constituent samples in a mixed input. We also design metrics to identify training inputs that are amenable to mixing, and apply mixing only to these inputs. Moving on to inference, we explore DNN ensembles, where the outputs of multiple DNN models are combined to form the prediction for a particular input. While ensembles achieve improved classification performance compared to single (i.e., non-ensemble) models, their compute and storage costs scale with the number of models in the ensemble. To address this, we propose a novel ensemble strategy wherein the ensemble members share the same weights for the convolutional and fully-connected layers, but differ in the additive biases applied after every layer. This allows ensemble inference to be treated like batch inference, with the associated computational efficiency benefits. We also propose techniques to train these ensembles with limited overheads. Finally, we consider spiking neural networks (SNNs), a class of biologically inspired neural networks that represent and process information as discrete spikes. Motivated by the observation that the dominant fraction of energy consumption in SNN hardware is within the memory and interconnect network, we propose a novel spike-bundling strategy that reduces energy consumption by communicating temporally proximal spikes as a single event.
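The spike-bundling idea can be illustrated with a minimal sketch. The abstract does not specify the bundling rule, so the greedy fixed-window grouping below, the function name `bundle_spikes`, and the `window` parameter are all assumptions made for illustration rather than the method developed in the thesis.

```python
def bundle_spikes(spike_times, window):
    """Greedily group spike times into bundles (hypothetical rule).

    A spike joins the current bundle if it occurs fewer than `window`
    time steps after the bundle's first spike; otherwise it starts a
    new bundle. Each bundle is a (start_time, spike_count) pair and
    stands for one communicated event.
    """
    bundles = []
    for t in sorted(spike_times):
        if bundles and t - bundles[-1][0] < window:
            start, count = bundles[-1]
            bundles[-1] = (start, count + 1)
        else:
            bundles.append((t, 1))
    return bundles


# Illustrative spike train: six spikes from one neuron.
spikes = [1, 2, 3, 10, 11, 25]
bundles = bundle_spikes(spikes, window=4)
print(bundles)                       # [(1, 3), (10, 2), (25, 1)]
print(len(spikes), "spikes ->", len(bundles), "events")
```

In this toy case six spikes are communicated as three events. How much energy that saves in practice depends on the memory and interconnect costs of the target hardware, which this sketch does not model.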
As a second direction, the thesis identifies a new challenge in the field of adversarial machine learning. In contrast to prior attacks, which degrade accuracy, we propose attacks that degrade the execution efficiency (energy and time) of a DNN on a given hardware platform. As one specific embodiment of such attacks, we propose sparsity attacks, which perturb the inputs to a DNN so as to reduce sparsity within the network, causing its latency and energy to increase on sparsity-optimized platforms. We also extend these attacks to SNNs, which are known to rely on sparsity of spikes for efficiency, and demonstrate that it is possible to greatly degrade the latency and energy of these networks through adversarial input perturbations.
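To make the quantity targeted by a sparsity attack concrete, the sketch below measures activation sparsity as the fraction of zero-valued outputs after a ReLU. The helper names and the uniform shift applied to the pre-activations are hypothetical. The shift is only a stand-in for the effect of an input perturbation, not the attack developed in the thesis.

```python
def relu(values):
    """Element-wise ReLU over a list of pre-activations."""
    return [max(0.0, v) for v in values]


def activation_sparsity(activations):
    """Fraction of activations that are exactly zero."""
    if not activations:
        return 0.0
    zeros = sum(1 for a in activations if a == 0.0)
    return zeros / len(activations)


# Illustrative pre-activations of one layer for a clean input.
clean = [-0.2, 0.5, -0.1, -0.3]

# Hypothetical stand-in for a perturbation's effect: a small positive
# shift pushes some pre-activations across the ReLU threshold.
perturbed = [v + 0.25 for v in clean]

print(activation_sparsity(relu(clean)))      # 0.75
print(activation_sparsity(relu(perturbed)))  # 0.25
```

On hardware that skips zero-valued activations, the drop from 75% to 25% sparsity in this toy example would mean more non-zero operands to process, which is the mechanism by which latency and energy increase.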
In summary, this dissertation demonstrates approaches for efficient deep learning for inference and training, while also opening up new classes of attacks that must be addressed.