From Black Boxes to Verified Mechanisms: Building Trustworthy Machine Learning through Interpretability, Robustness, and Generalization
Benefiting from rapidly developing techniques and surging computational power, deep learning (DL) methods have achieved great success in various areas, reaching human-level or even superhuman performance. DL methods are therefore widely deployed in practical scenarios, including high-stakes applications such as self-driving vehicles, medical diagnosis, and financial decision-making. However, the exponential growth in the complexity of deep neural networks (DNNs) inevitably renders DL models black boxes: humans cannot readily comprehend or predict their behavior. The intrinsic high nonlinearity of deep models further undermines the trust humans place in them when handling unseen data.
This thesis addresses trustworthiness concerns in deep learning by investigating three critical dimensions of trustworthy machine learning (ML): (1) Interpretability aims to reveal a model's internal mechanisms and helps diagnose its vulnerabilities. (2) Robustness determines a model's security against perturbations such as malicious attacks and distribution shifts. (3) Generalization aims at achieving consistent performance across diverse, unseen distributions. Robustness and generalization represent the desiderata of reliable ML models and thereby constitute essential qualities in the study of trustworthiness.
This thesis tackles the lack of systematic and theoretical foundations in these rapidly evolving dimensions. For interpretability, we propose self-interpretable methods that encode interpretability into models intrinsically. We also conduct rigorous and comprehensive studies on the evaluation and application of numerous explanation methods, and we propose the first principled theoretical framework for deletion metrics. For robustness, we derive a theoretical framework for subpopulation group shift; the proposed methods improve the fundamental understanding of the distribution shift problem for data-driven models and provide theoretical guidance for remedying subpopulation shift. Regarding generalization, we tackle the surprising power of model scaling, proposing verifiable convergence hypotheses for model behavior, and provide the first comprehensive theory of model ensembles to explain their seemingly mysterious capabilities. These studies bridge the gap between empirical observations and the theoretical foundations of model scaling and, for the first time, connect the mechanisms of single-model scaling and deep ensembles.
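To make the deletion-metric evaluation mentioned above concrete, the sketch below illustrates the standard empirical deletion metric for feature attributions; it is a generic illustration, not the thesis's own framework, and the names `model_fn`, `attribution`, and `deletion_score` are assumed for the example: features are masked in order of decreasing attribution, and the faster the prediction for the original class collapses, the more faithful the explanation.

```python
import numpy as np

def deletion_score(model_fn, x, attribution, num_steps=50, baseline=0.0):
    """Hedged sketch of the standard deletion metric: mask the most-attributed
    features first and track how quickly the model's confidence in its original
    prediction drops. A lower normalized area under this curve indicates a more
    faithful attribution. `model_fn` is assumed to map a flat feature vector to
    a vector of class probabilities."""
    x = np.asarray(x, dtype=float).ravel().copy()
    attribution = np.asarray(attribution, dtype=float).ravel()
    order = np.argsort(-np.abs(attribution))      # most important features first
    orig_class = int(np.argmax(model_fn(x)))      # class predicted on the intact input
    scores = [float(model_fn(x)[orig_class])]
    step = max(1, len(order) // num_steps)
    for start in range(0, len(order), step):
        x[order[start:start + step]] = baseline   # "delete" the next chunk of features
        scores.append(float(model_fn(x)[orig_class]))
    return np.trapz(scores) / (len(scores) - 1)   # normalized area under the curve

```

In practice, two attribution methods can be compared by computing this score for each over a held-out set; the one with the consistently lower score removes the truly influential features earlier.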
Degree Type
- Doctor of Philosophy
Department
- Electrical and Computer Engineering
Campus location
- West Lafayette