Accelerating Emerging Neural Workloads

Stevens, Jacob R

doi:10.25394/PGS.17139038.v1

Accelerating_Emerging_Neural_Workloads.pdf (5.33 MB)

Accelerating Emerging Neural Workloads

thesis

posted on 2021-12-20, 14:00 authored by Jacob R StevensJacob R Stevens

Due to a combination of algorithmic advances, wide-spread availability of rich data sets, and tremendous growth in compute availability, Deep Neural Networks (DNNs) have seen considerable success in a wide variety of fields, achieving state-of-the art accuracy in a number of perceptual domains, such as text, video and audio processing. Recently, there have been many efforts to extend this success in the perceptual, Euclidean-based domain to non-perceptual tasks, such as task planning or reasoning, as well as to non-Euclidean domains, such as graphs. While several DNN accelerators have been proposed in the past decade, they largely focus on traditional DNN workloads, such as Multi-layer Perceptions (MLPs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs). These accelerators are ill-suited to the unique computational needs of the emerging neural networks. In this dissertation, we aim to fix this gap by proposing novel hardware architectures that are specifically tailored to emerging neural workloads.

First, we consider memory-augmented neural networks (MANNs), a new class of neural networks that exhibits capabilities such as one-shot learning and task planning that are well beyond those of traditional DNNs. MANNs augment a traditional DNN with an external differentiable memory that is used to store dynamic state. This dissertation proposes a novel accelerator that targets the main bottleneck of MANNs: the soft reads and writes to this external memory, each of which requires access to all the memory locations.

We then focus on Transformer networks, which have become very popular for Natural Language Processing (NLP). A key to the success of these networks is a technique called self-attention, which employs a softmax operation. Softmax is poorly supported in modern, matrix multiply-focused accelerators since it accounts for a very small fraction of traditional DNN workloads. We propose a hardware/software co-design approach to realize softmax efficiently by utilize a suite of approximate computing techniques.

Next, we address graph neural networks (GNNs). GNNs are achieving state-of-the-art results in a variety of fields such as physics modeling, chemical synthesis, and electronic design automation. These GNNs are a hybrid between graph processing workloads and DNN workloads; they utilize DNN-based feature extractors to form hidden representations for each node in a graph and then combine these representations through some form of a graph traversal. As a result, existing hardware specialized for either graph processing workloads or DNN workloads is insufficient. Instead, we design a novel architecture that balances the needs of these two heterogeneous compute patterns. We also propose a novel feature dimension-blocking dataflow to further increase performance by mitigating the memory bottleneck.

Finally, we address the growing difficulty in tightly coupling new DNNs and a hardware platform. Given the extremely large DNN-HW design space consisting of DNN selection, hardware operating condition, and DNN-to-HW mapping, it is infeasible to exhaustively search this space by running each sample on a physical hardware device. This has led to the need for highly accurate, machine learning-based performance models which can \emph{predict} the latency/power/energy even faster than direct execution. We first present a taxonomy to characterize the possible approaches to these performance estimators. Based on the insights from this taxonomy, we present a new performance estimator that combines coarse-grained and fine-grained to achieve superior accuracy with a limited number of training samples. Finally, we propose a flexible framework for creating these DNN-HW performance estimators.

In summary, this dissertation identifies the growing gap between current hardware and new emerging neural networks. We first propose three novel hardware architectures that address this gap for MANNs, Transformers, and GNNs. We then propose a novel hardware-aware DNN estimator and framework to ease addressing this gap for new networks in the future.

History

Degree Type

Doctor of Philosophy

Department

Electrical and Computer Engineering

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Anand Raghunathan

Additional Committee Member 2

Vijay Raghunathan

Additional Committee Member 3

Timothy Rogers

Additional Committee Member 4

Kaushik Roy

Usage metrics

Keywords

Computer architecture Deep Learning hardware software codesign Computer Engineering

Licence

CC BY 4.0

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Accelerating Emerging Neural Workloads

History

Degree Type

Department

Campus location

Advisor/Supervisor/Committee Chair

Additional Committee Member 2

Additional Committee Member 3

Additional Committee Member 4

Usage metrics

Categories

Keywords

Licence

Exports