PROCESSING IN MEMORY DESIGN AND OPTIMIZATIONS FOR MACHINE LEARNING INFERENCE
Advances in machine learning (ML) have ignited hardware innovations for efficient execution of ML models, many of which are memory-bound (e.g., long short-term memories, multi-layer perceptrons, and recurrent neural networks). Specifically, inference using these ML models with small batches, as would be the case at the Cloud edge, has little reuse of the large filters and is deeply memory-bound. Simultaneously, processing-in or -near memory (PIM or PNM) promises an unprecedented high-bandwidth connection between compute and memory. Fortunately, the memory-bound ML models are a good fit for PIM. We focus on digital PIM, which provides higher bandwidth than PNM and does not incur the reliability issues of analog PIM. Previous PIM and PNM approaches advocate full processor cores, which do not conform to PIM's severe area and power constraints. This thesis is composed of three major projects: Newton, activation folding (AcF), and ESPIM. Newton is SK hynix's first accelerator-in-memory (AiMX) product for machine learning, AcF improves the performance of Newton by achieving more compute-row-access overlap, and ESPIM incorporates sparse neural network models into PIM.
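As a back-of-the-envelope illustration (not taken from the thesis itself), the memory-bound nature of small-batch inference follows from the arithmetic intensity of a fully connected layer: each weight fetched from memory is reused only once per batch item, so at batch size 1 the layer performs roughly one floating-point operation per byte of weight traffic, far below the compute-to-bandwidth balance of modern accelerators. A minimal Python sketch, with illustrative layer dimensions:

# Arithmetic intensity of an M x N fully connected layer at batch size B,
# assuming fp16 weights (2 bytes) and weight traffic dominating at small B.
def arithmetic_intensity(M, N, B, bytes_per_weight=2):
    flops = 2 * M * N * B                    # one multiply-accumulate per weight per batch item
    bytes_moved = M * N * bytes_per_weight   # each weight is fetched from memory once
    return flops / bytes_moved

print(arithmetic_intensity(4096, 4096, 1))   # 1.0 FLOP/byte: deeply memory-bound
print(arithmetic_intensity(4096, 4096, 64))  # 64.0: weight reuse grows with batch size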
Degree Type
- Doctor of Philosophy
Department
- Electrical and Computer Engineering
Campus location
- West Lafayette