Purdue University Graduate School

ACCELERATING SPARSE MACHINE LEARNING INFERENCE

thesis
posted on 2024-05-17, 13:01 authored by Ashish Gondimalla

Convolutional neural networks (CNNs) have become important workloads due to their
impressive accuracy in tasks like image classification and recognition. Convolution operations
are compute-intensive, and this cost increases substantially with newer and better CNN models.
However, convolutions exhibit characteristics such as sparsity that can be exploited. In
this dissertation, we propose three works that exploit sparsity for higher performance
and reduced energy.


The first work is an accelerator design called SparTen for accelerating convolutions with
fine-grained, two-sided sparsity (i.e., sparsity in both filters and feature maps). SparTen
identifies efficient inner join as the key primitive for hardware acceleration of sparse
convolution. In addition, SparTen proposes load-balancing schemes for higher compute unit
utilization. SparTen performs 4.7x, 1.8x, and 3x better than a dense architecture, a one-sided
architecture, and SCNN, the previous state-of-the-art accelerator, respectively. The second
work, BARISTA, scales up SparTen (and SparTen-like proposals) to large-scale implementations
with as many compute units as recent dense accelerators (e.g., Google's Tensor Processing
Unit) to achieve the full speedups afforded by sparsity. However, at such large scales,
buffering, on-chip bandwidth, and compute utilization are highly intertwined: optimizing for
one factor strains another and may invalidate some optimizations proposed in small-scale
implementations. BARISTA proposes novel techniques to balance the three factors in
large-scale accelerators. BARISTA performs 5.4x, 2.2x, 1.7x, and 2.5x better than dense,
one-sided, naively scaled two-sided, and iso-area two-sided architectures, respectively. The
last work, EUREKA, builds an efficient tensor core to execute dense, structured sparse, and
unstructured sparse workloads without losing efficiency. EUREKA achieves this by proposing
novel techniques to improve compute utilization by slightly tweaking operand stationarity.
EUREKA achieves speedups of 5x and 2.5x, along with energy reductions of 3.2x and 1.7x, over
dense and structured sparse execution, respectively. EUREKA incurs area and power overheads
of only 6% and 11.5%, respectively, over Ampere.
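
To illustrate the inner-join primitive named in the abstract, the following is a minimal software sketch of a two-sided sparse dot product. It is not the SparTen hardware design: the sorted index-list representation, the function name sparse_inner_join_dot, and the sample data are assumptions chosen only for illustration. The idea it shows is that a multiply is performed only at positions where both the filter and the feature map hold a nonzero value.

```python
def sparse_inner_join_dot(a_idx, a_val, b_idx, b_val):
    """Dot product of two sparse vectors via an inner join on their indices.

    Each vector is given as a sorted list of nonzero positions (a_idx, b_idx)
    and a parallel list of the values at those positions (a_val, b_val).
    This layout is an illustrative assumption, not taken from the dissertation.
    Returns (dot_product, number_of_multiplies_performed).
    """
    i = j = 0
    acc = 0.0
    matches = 0
    while i < len(a_idx) and j < len(b_idx):
        if a_idx[i] == b_idx[j]:
            # Both operands are nonzero at this position: do the multiply.
            acc += a_val[i] * b_val[j]
            matches += 1
            i += 1
            j += 1
        elif a_idx[i] < b_idx[j]:
            # Only the first vector is nonzero here: skip, the product is zero.
            i += 1
        else:
            # Only the second vector is nonzero here: skip, the product is zero.
            j += 1
    return acc, matches


# Hypothetical example: a sparse filter and a sparse feature-map slice.
filter_idx, filter_val = [0, 3, 5], [2.0, -1.0, 4.0]
fmap_idx, fmap_val = [3, 4, 5, 7], [0.5, 1.0, 3.0, 2.0]

result, multiplies = sparse_inner_join_dot(filter_idx, filter_val, fmap_idx, fmap_val)
print(result, multiplies)
```

In this example only positions 3 and 5 are nonzero in both operands, so two multiplies are performed instead of the eight a dense dot product over positions 0 to 7 would need.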

Funding

1618921-CNS

1405939-CNS

Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Dr. T. N. Vijaykumar

Advisor/Supervisor/Committee co-chair

Dr. Mithuna S. Thottethodi

Additional Committee Member 2

Dr. Timothy G. Rogers

Additional Committee Member 3

Dr. Milind Kulkarni

Additional Committee Member 4

Dr. David I. Inouye