Optimizing Accuracy-Efficiency Tradeoffs in Emerging Neural Workloads

Nagarajan, Amrit

doi:10.25394/PGS.24760062.v1

Amrit_Final_Thesis.pdf (26.87 MB)

Optimizing Accuracy-Efficiency Tradeoffs in Emerging Neural Workloads

thesis

posted on 2023-12-11, 15:45 authored by Amrit NagarajanAmrit Nagarajan

Deep Neural Networks (DNNs) are constantly evolving, enabling the power of deep learning to be applied to an ever-growing range of applications, such as Natural Language Processing (NLP), recommendation systems, graph processing, etc. However, these emerging neural workloads present large computational demands for both training and inference. In this dissertation, we propose optimizations that take advantage of the unique characteristics of different emerging workloads to simultaneously improve accuracy and computational efficiency.

First, we consider Language Models (LMs) used in NLP. We observe that the design process of LMs (pre-train a foundation model, and subsequently fine-tune it for different downstream tasks) leads to models that are highly over-parameterized for the downstream tasks. We propose AxFormer, a systematic framework that applies accuracy-driven approximations to create accurate and efficient LMs for a given downstream task. AxFormer eliminates task-irrelevant knowledge, and helps the model focus only on the relevant parts of the input.

Second, we find that during fine-tuning of LMs, the presence of variable-length input sequences necessitates the use of padding tokens when batching sequences, leading to ineffectual computations. It is also well known that LMs over-fit to the small task-specific training datasets used during fine-tuning, despite the use of known regularization techniques. Based on these insights, we present TokenDrop + BucketSampler, a framework that synergistically combines a new regularizer that drops a random subset of insignificant words in each sequence in every epoch, and a length-aware batching method to simultaneously reduce padding and address the overfitting issue.

Next, we address the computational challenges of Transformers used for processing inputs of several important modalities, such as text, images, audio and videos. We present Input Compression with Positional Consistency (ICPC), a new data augmentation method that applies varying levels of compression to each training sample in every epoch, thereby simultaneously reducing over-fitting and improving training efficiency. ICPC also enables efficient variable-effort inference, where easy samples can be inferred at high compression levels, and vice-versa.

Finally, we focus on optimizing Graph Neural Networks (GNNs), which are commonly used for learning on non-Euclidean data. Few-shot learning with GNNs is an important challenge, since real-world graphical data is often sparsely labeled. Self-training, wherein the GNN is trained in stages by augmenting the training data with a subset of the unlabeled data and their pseudolabels, has emerged as a promising approach. However, self-training significantly increases the computational demands of training. We propose FASTRAIN-GNN, a framework for efficient and accurate self-training of GNNs with few labeled nodes. FASTRAIN-GNN optimizes the GNN architecture, training data, training parameters, and the graph topology during self-training.

At inference time, we find that ensemble GNNs are significantly more accurate and robust than single-model GNNs, but suffer from high latency and storage requirements. To address this challenge, we propose GNN Ensembles through Error Node Isolation (GEENI). The key concept in GEENI is to identify nodes that are likely to be incorrectly classified (error nodes) and suppress their outgoing messages, leading to simultaneous accuracy and efficiency improvements.

History

Degree Type

Doctor of Philosophy

Department

Electrical and Computer Engineering

Campus location

West Lafayette

Advisor/Supervisor/Committee Chair

Anand Raghunathan

Additional Committee Member 2

Kaushik Roy

Additional Committee Member 3

Vijay Raghunathan

Additional Committee Member 4

Sumeet Kumar Gupta

Usage metrics

Keywords

Deep Learning Accurately Deep learning efficiency Efficient training Efficient inference methods

Licence

CC BY 4.0

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Optimizing Accuracy-Efficiency Tradeoffs in Emerging Neural Workloads

History

Degree Type

Department

Campus location

Advisor/Supervisor/Committee Chair

Additional Committee Member 2

Additional Committee Member 3

Additional Committee Member 4

Usage metrics

Categories

Keywords

Licence

Exports