Exploring Methods for Efficient Learning in Neural Networks
thesisposted on 2021-07-26, 01:15 authored by Deboleena RoyDeboleena Roy
In the past fifty years, Deep Neural Networks (DNNs) have evolved greatly from a single perceptron to complex multi-layered networks with non-linear activation functions. Today, they form the backbone of Artificial Intelligence, with a diverse application landscape, such as smart assistants, wearables, targeted marketing, autonomous vehicles, etc. The design of DNNs continues to change, as we push its abilities to perform more human-like tasks at an industrial scale.
Multi-task learning and knowledge sharing are essential to human-like learning. Humans progressively acquire knowledge throughout their life, and they do so by remembering, and modifying prior skills for new tasks. In our first work, we investigate the representations learned by Spiking Neural Networks (SNNs), and how to share this knowledge across tasks. Our prior task was MNIST image generation using a spiking autoencoder. We combined the generative half of the autoencoder with a spiking audio-decoder for our new task, i.e audio-to-image conversion of utterances of digits to their corresponding images. We show that objects of different modalities carrying the same meaning can be mapped into a shared latent space comprised of spatio-temporal spike maps, and one can transfer prior skills, in this case, image generation, from one task to another, in a purely Spiking domain. Next, we propose Tree-CNN, an adaptive hierarchical network structure composed of Deep Convolutional Neural Networks(DCNNs) that can grow and learn as new data becomes available. The network organizes the incrementally available data into feature-driven super-classes and improves upon existing hierarchical CNN models by adding the capability of self-growth.
While the above works focused solely on algorithmic design, the underlying hardware determines the efficiency of model implementation. Currently, neural networks are implemented in CMOS based digital hardware such as GPUs and CPUs. However, the saturating scaling trend of CMOS has garnered great interest in Non-Volatile Memory (NVM) technologies such as Spintronics and RRAM. However, most emerging technologies have inherent reliability issues, such as stochasticity and non-linear device characteristics. Inspired by the recent works in spin-based stochastic neurons, we studied the algorithmic impact of designing a neural network using stochastic activations. We trained VGG-like networks on CIFAR-10/100 with 4 different binary activations and analyzed the trade-off between deterministic and stochastic activations.
NVM-based crossbars further promise fast and energy-efficient in-situ matrix-vector multiplications (MVM). However, the analog nature of computing in these NVM crossbars introduces approximations in the MVM operations, resulting in deviations from ideal output values. We first studied the impact of these non-idealities on the performance of vanilla DNNs under adversarial circumstances, and we observed that the non-ideal behavior interferes with the computation of the exact gradient of the model, which is required for adversarial image generation. In a non-adaptive attack, where the attacker is unaware of the analog hardware, analog computing offered varying degree of intrinsic robustness under all attack scenarios - Transfer, Black Box, and White Box attacks. We also demonstrated ``Hardware-in-Loop" adaptive attacks that circumvent this robustness by utilizing the knowledge of the NVM model.
Next, we explored the design of robust DNNs through the amalgamation of adversarial training and the intrinsic robustness offered by NVM crossbar based analog hardware. We studied the noise stability of such networks on unperturbed inputs and observed that internal activations of adversarially trained networks have lower Signal-to-Noise Ratio (SNR), and are sensitive to noise than vanilla networks. As a result, they suffer significantly higher performance degradation due to the non-ideal computations, on an average 2x accuracy drop. On the other hand, for adversarial images, the same networks displayed a 5-10% gain in robust accuracy due to the underlying NVM crossbar when the attack epsilon (the degree of input perturbations) was greater than the epsilon of the adversarial training. Our results indicate that implementing adversarially trained networks on analog hardware requires careful calibration between hardware non-idealities and training epsilon to achieve optimum robustness and performance.
- Doctor of Philosophy
- Electrical and Computer Engineering
- West Lafayette