Purdue University Graduate School
PROCESSING IN MEMORY DESIGN AND OPTIMIZATIONS FOR MACHINE LEARNING INFERENCE

thesis
posted on 2024-10-22, 12:17, authored by Mingxuan He

Advances in machine learning (ML) have ignited hardware innovations for efficient execution of ML models, many of which are memory-bound (e.g., long short-term memories, multi-layer perceptrons, and recurrent neural networks). Specifically, inference using these ML models with small batches, as would be the case at the Cloud edge, has little reuse of the large filters and is deeply memory-bound. Simultaneously, processing-in-memory or processing-near-memory (PIM or PNM) promises an unprecedented high-bandwidth connection between compute and memory. Fortunately, the memory-bound ML models are a good fit for PIM. We focus on digital PIM, which provides higher bandwidth than PNM and does not incur the reliability issues of analog PIM. Previous PIM and PNM approaches advocate full processor cores, which do not conform to PIM's severe area and power constraints. This thesis is composed of three major projects: Newton, activation folding (AcF), and ESPIM. Newton is SK Hynix's first accelerator-in-memory (AiMX) product for machine learning; AcF improves the performance of Newton by achieving more compute-row access overlap; and ESPIM incorporates sparse neural network models into PIM.
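The "memory-bound at small batch" claim in the abstract can be illustrated with a back-of-the-envelope arithmetic-intensity calculation. The sketch below (illustrative numbers and helper names are my own, not from the thesis) shows why a batch-1 fully connected layer performs roughly one FLOP per byte of weight traffic, far below what a compute-bound accelerator needs, and why larger batches amortize weight fetches:

```python
# Hedged sketch: arithmetic intensity (FLOPs per byte of memory traffic)
# of a fully connected layer, assuming fp16 weights (2 bytes each) and
# that weight traffic dominates at small batch sizes.
def arithmetic_intensity(m, n, batch=1, bytes_per_weight=2):
    flops = 2 * m * n * batch               # one multiply + one add per weight, per input
    weight_bytes = m * n * bytes_per_weight  # each weight is fetched once, reused `batch` times
    return flops / weight_bytes

# A 4096x4096 fp16 layer at batch 1 yields ~1 FLOP/byte; accelerators
# typically need tens of FLOPs/byte to stay compute-bound, so execution
# is dominated by memory bandwidth -- the regime PIM targets.
batch1 = arithmetic_intensity(4096, 4096, batch=1)

# At batch 32 the same weights are reused 32 times, raising intensity
# proportionally -- which is why large-batch inference is less memory-bound.
batch32 = arithmetic_intensity(4096, 4096, batch=32)
```

This is only a first-order model; it ignores activation traffic and caching, but it captures why small-batch inference maps well onto the high internal bandwidth of digital PIM.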

History

Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

T.N. Vijaykumar

Advisor/Supervisor/Committee co-chair

Mithuna Thottethodi

Additional Committee Member 2

Tim Rogers

Additional Committee Member 3

Y. Charlie Hu
