Purdue University Graduate School

Mechanistic Insights into Deep Neural Networks: From Feature Learning to Interpretable Circuits

thesis
posted on 2025-07-17, 18:11, authored by Guan Zhe Hong
Modern deep neural networks excel at vision and language tasks, yet we still lack a principled account of why they exhibit generalizing cognitive abilities and how they implement those abilities. This thesis advances our understanding from a mechanistic perspective. We restrict attention to a modest set of concrete, well-defined problem settings that are analytically tractable yet scientifically meaningful.

Generalization (Chapters 2 & 3). We mathematically analyze how properties of the data distribution and the training pipeline shape the features a neural network learns, and what those features imply for model generalization.

(i) We develop theory that explains a common practice in computer vision: pre-training models with fine-grained labels. By proposing a hierarchical multi-view data model and analyzing the learning dynamics induced by gradient descent, we show that finer labels tend to yield richer representations and, in turn, lower downstream error. (ii) In parallel, we analyze feature-based student-teacher learning, proving that early stopping prevents the student's features from collapsing onto the teacher's, and that teachers with lower-complexity features transfer knowledge better; both strategies yield student networks that generalize better.

Interpretability (Chapters 4 & 5). To uncover how large language models (LLMs) reason, we construct two synthetic testbeds: propositional-logic proof generation and multi-hop in-context question answering. Each task is designed to prevent superficial pattern matching while remaining amenable to precise causal interventions. (i) On the logic problems, for multiple LLMs (up to 27B parameters), we localize sparse sets of attention heads, specializing in rule retrieval, fact routing, and decision making, that are jointly necessary and sufficient for correct proofs. The similarity of these circuits across models provides early evidence of their universality. (ii) On the multi-hop in-context learning tasks, we find a favorable scaling trend: a well-performing 27B-parameter Gemma model tends to disentangle and compose latent concepts step by step, whereas its 2-billion-parameter counterpart, which achieves much lower accuracy, relies on noisy, entangled pathways.

Collectively, these results indicate that strong performance in modern DNNs can stem from identifiable, interpretable mechanisms rather than opaque heuristics. They pave the way toward practical guidelines for choosing training regimes that generalize, and for locating the specific model components that implement abstract cognitive abilities.
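To make the student-teacher setting above concrete, here is a minimal PyTorch sketch of feature-based distillation with early stopping on held-out data. The architectures, the squared-error feature-matching loss, the synthetic data, and the patience schedule are illustrative stand-ins, not the thesis's exact formulation.

```python
import torch
import torch.nn as nn

# Toy teacher and student; in the thesis's setting the student learns the
# teacher's intermediate features, which we caricature here by matching outputs.
teacher = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
for p in teacher.parameters():
    p.requires_grad_(False)  # the teacher is frozen

opt = torch.optim.SGD(student.parameters(), lr=1e-2)
best_val, patience, bad_epochs = float("inf"), 5, 0

for epoch in range(200):
    x = torch.randn(128, 32)                         # synthetic training batch
    loss = ((student(x) - teacher(x)) ** 2).mean()   # feature-matching loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    with torch.no_grad():                            # held-out validation batch
        xv = torch.randn(128, 32)
        val = ((student(xv) - teacher(xv)) ** 2).mean().item()
    if val < best_val:
        best_val, bad_epochs = val, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop before the student's features collapse onto the teacher's
```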
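The circuit analyses rely on causal interventions at the level of individual attention heads. Below is a minimal sketch of one such intervention, head ablation, on a toy attention layer written out explicitly so that each head's contribution can be zeroed before the output projection. The dimensions and the "important" head indices are hypothetical, and a real study would intervene inside a trained LLM rather than a random layer.

```python
import torch

torch.manual_seed(0)
n_heads, d_model, seq = 4, 32, 6
d_head = d_model // n_heads
Wq, Wk, Wv, Wo = (torch.randn(d_model, d_model) / d_model**0.5 for _ in range(4))

def attention(x, ablate=()):
    # Explicit multi-head attention: (seq, d_model) -> (heads, seq, d_head).
    q = (x @ Wq).view(seq, n_heads, d_head).transpose(0, 1)
    k = (x @ Wk).view(seq, n_heads, d_head).transpose(0, 1)
    v = (x @ Wv).view(seq, n_heads, d_head).transpose(0, 1)
    att = torch.softmax(q @ k.transpose(-1, -2) / d_head**0.5, dim=-1)
    z = att @ v                   # per-head outputs, shape (heads, seq, d_head)
    for h in ablate:
        z[h] = 0.0                # knock out the selected heads' contributions
    return z.transpose(0, 1).reshape(seq, d_model) @ Wo

x = torch.randn(seq, d_model)
clean = attention(x)
patched = attention(x, ablate=(1, 3))   # hypothetical circuit heads
print((clean - patched).abs().mean())   # effect size of the ablation
```

Comparing the model's behavior with and without the ablated heads is the "necessity" half of the jointly-necessary-and-sufficient test described above; the "sufficiency" half keeps only the candidate heads and ablates the rest.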
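Finally, a toy illustration of what a two-hop in-context QA item can look like: answering requires composing a "friend of" fact with a "lives in" fact, so no single sentence contains the answer and superficial pattern matching fails. The entities and relations are invented placeholders; the thesis's actual task construction may differ.

```python
import random

people = ["Ana", "Bo", "Cam", "Dee"]
cities = ["Lyon", "Oslo", "Kyoto", "Quito"]
random.shuffle(people)
random.shuffle(cities)

# Ground truth: person people[i] lives in cities[i].
facts = [f"{p} lives in {c}." for p, c in zip(people, cities)]
facts.append(f"{people[0]}'s friend is {people[1]}.")  # the linking "hop"
random.shuffle(facts)  # shuffle so sentence order gives no superficial cue

context = " ".join(facts)
question = f"In which city does {people[0]}'s friend live?"
answer = cities[people.index(people[1])]  # hop 1: friend; hop 2: city
print(context, "\nQ:", question, "\nA:", answer)
```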

History

Degree Type

  • Doctor of Philosophy

Department

  • Electrical and Computer Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Stanley Chan

Additional Committee Member 2

Gregery T. Buzzard

Additional Committee Member 3

David I. Inouye

Additional Committee Member 4

Jing Gao
