Purdue University Graduate School

Compute-in-Memory Based Machine Learning Inference Accelerators

thesis
posted on 2025-12-01, 18:41 authored by Tanvi Sharma
<p dir="ltr">The slowdown of Moore's law and the end of Dennard voltage scaling, coupled with the growing computational and memory demands of machine learning (ML) models, necessitate new hardware paradigms beyond traditional architectures. Compute-in-Memory (CIM) addresses this challenge by performing multiply-accumulate (MAC) operations directly within memory arrays, reducing data movement and improving energy efficiency. However, CIM deployment faces two major challenges. (1) The CIM design space is large and fragmented, spanning digital and analog implementations, SRAM and emerging non-volatile memory (eNVM) technologies, and diverse micro-architectures and circuit techniques; the resulting heterogeneity in accuracy, latency, and energy complicates unified design methodologies and performance generalization. (2) ML workloads evolve rapidly, encompassing feed-forward, convolutional, transformer, and recommender architectures with widely varying compute and memory requirements. These models comprise both general matrix multiplication (GEMM) and non-GEMM operations, so CIM-based accelerators must support more than GEMM alone.</p><p dir="ltr">To that end, we propose a set of co-optimization techniques across the device, circuit, and architectural layers for fast-evolving ML workloads. To tackle challenge (1), we first present ELSA for systematic optimization of the accuracy-energy trade-off in spin-orbit torque magnetic tunnel junction (SOT-MTJ) based CIM arrays. It showcases the interplay of device-circuit parameters such as Ron, the Roff/Ron ratio, and array size, along with inherent activation, weight, and network sparsity, in efficient CIM design. Second, WWW explores the design space of SRAM-based analog and digital CIM macros integrated into the on-chip memory hierarchy. In particular, it devises a greedy mapping approach that maximizes data reuse and applies weight interleaving to maintain throughput under array under-utilization. Building on this mapping, it answers the questions of what, when, and where to CIM compared to a digital array of compute units under iso-area constraints.</p><p dir="ltr">To address challenge (2), we extend CIM beyond GEMM operations and explore the CIM hardware design space with different buffer technologies. The third chapter introduces HASTILY, a hardware-software co-design approach for transformer inference. By embedding exponent lookup tables in CIM arrays and parallelizing reductions across cores, HASTILY accelerates softmax computation by up to 17 times over a standard CIM architecture. Building on this concept, MemRaptor leverages MRAM-based CIM to evaluate transcendental functions such as sigmoid and tanh, achieving up to 30% throughput improvement for LSTM networks. Finally, TADA provides a technology-aware design space exploration framework for deep neural network accelerators. It uses the gradient boosting framework LightGBM to predict system-level energy and latency for digital CIM-based accelerators with different buffer technologies. To make it generalizable, we train it on a synthetically generated dataset and incorporate technology parameters (read energy, write energy, and capacity) in the model input. TADA predicts performance for a given transformer, buffer technology, and hardware configuration with average correlation scores of 0.999 and 0.956 on the synthetic dataset and an unseen real dataset, respectively.</p><p dir="ltr">Collectively, ELSA, WWW, HASTILY, MemRaptor, and TADA provide a set of methods for efficiently designing CIM-based accelerators for ML inference through accuracy-energy optimization, hardware-software co-design, and buffer technology-aware exploration.</p>
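To illustrate the lookup-table softmax idea behind HASTILY, the sketch below emulates in software what the CIM arrays realize in hardware: exponent values are precomputed into a table, and softmax is evaluated by table lookups rather than per-element exp() calls. The function names, table size, and input range here are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def build_exp_lut(num_entries=256, x_min=-8.0, x_max=0.0):
    # Precompute exp() over a fixed input range, analogous to a CIM
    # array storing exponent values for in-memory lookup.
    xs = np.linspace(x_min, x_max, num_entries)
    return xs, np.exp(xs)

def lut_softmax(logits, lut_x, lut_y):
    # Subtract the max so inputs fall in [x_min, 0], the LUT's range
    # (this is also the standard numerical-stability trick).
    shifted = logits - np.max(logits)
    # Index into the LUT instead of evaluating exp() per element.
    idx = np.clip(np.searchsorted(lut_x, shifted), 0, len(lut_x) - 1)
    exps = lut_y[idx]
    return exps / exps.sum()

lut_x, lut_y = build_exp_lut()
probs = lut_softmax(np.array([1.0, 2.0, 3.0]), lut_x, lut_y)
```

With 256 table entries, the lookup result tracks the exact softmax to within a couple of percent; the max-subtraction guarantees every input lands inside the table's range, with very negative values clipping to a near-zero exponent.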

Funding

Qualcomm Innovation Fellowship

Center for Brain-Inspired Computing (CBRIC)

Center for the Co-Design of Cognitive Systems (CoCoSys)

History

Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Kaushik Roy

Additional Committee Member 2

Anand Raghunathan

Additional Committee Member 3

Sumeet Gupta

Additional Committee Member 4

Timothy Rogers
