On the Neural Representation for Adversarial Attack and Defense
Neural representations are the high-dimensional embeddings produced during the feed-forward pass of a neural network. They compress raw input information and extract abstract features that benefit downstream tasks. Utilizing these representations effectively is nevertheless challenging because of their inherent complexity, which arises from the non-linear relationship between inputs and representations as well as the diversity of the learning process.
In this thesis, we propose effective methods to utilize neural representations for adversarial attack and defense. Our approach generally involves decomposing complex neural representations into smaller, more analyzable parts. We also look for general patterns that emerge during learning to better understand the semantics encoded in neural representations.
We demonstrate that formalizing neural representations can reveal a model's weaknesses and aid in defending against poison attacks. Specifically, we define a new type of adversarial attack based on neural style, a special component of the neural representation; this attack uncovers novel aspects of model vulnerability.
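The attack itself is developed in the body of the thesis. Purely as an illustration of what a style-based adversarial objective might look like, the sketch below perturbs an input so that the Gram matrices of its intermediate VGG-16 feature maps (a common proxy for "neural style") drift away from those of the clean input. The backbone, layer indices, step count, and loss are illustrative assumptions, not the thesis's actual attack.

```python
# Minimal sketch of a style-based adversarial objective (assumptions: VGG-16
# backbone, layers {3, 8, 15}, Adam with 50 steps). Not the thesis's attack.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

def gram_matrix(feat):
    # feat: (B, C, H, W) -> (B, C, C) channel-correlation ("style") matrix
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

backbone = vgg16(weights=VGG16_Weights.DEFAULT).features.eval()
for p in backbone.parameters():
    p.requires_grad_(False)
style_layers = {3, 8, 15}  # assumed early/mid layers whose style we track

def style_reprs(x):
    feats = []
    for i, layer in enumerate(backbone):
        x = layer(x)
        if i in style_layers:
            feats.append(gram_matrix(x))
    return feats

x = torch.rand(1, 3, 224, 224)                  # stand-in for a real image
target_style = [g.detach() for g in style_reprs(x)]

# Small random init so the optimizer does not start at a zero-gradient point
delta = (1e-3 * torch.randn_like(x)).requires_grad_()
opt = torch.optim.Adam([delta], lr=1e-2)
for _ in range(50):
    adv = (x + delta).clamp(0, 1)
    # Maximize the style distance from the clean input (style term only; a
    # full attack would add a misclassification loss and a visibility bound)
    loss = -sum(F.mse_loss(g, t) for g, t in zip(style_reprs(adv), target_style))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In a complete attack, this style term would typically be combined with a classification objective and an explicit imperceptibility constraint.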
Furthermore, we develop an interpretation of neural representations by approximating their marginal distribution and treating intermediate neurons as feature indicators. By properly harnessing these rich feature indicators, we address the scalability and imperceptibility issues associated with pixel-wise perturbation bounds.
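As one way to picture neurons acting as feature indicators, the sketch below fits per-neuron Gaussian statistics to the marginal activations of one ResNet-18 layer over a reference dataset and scores new inputs by their z-scores. The model, layer, pooling, and Gaussian assumption are illustrative choices, not the approximation developed in the thesis.

```python
# Minimal sketch: per-neuron marginal statistics as feature indicators
# (assumptions: ResNet-18, layer4 output, global average pooling, Gaussian fit).
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()

feats = {}
def hook(_, __, output):
    # Global-average-pool the feature map so each channel acts as one "neuron"
    feats["layer4"] = output.mean(dim=(2, 3))

model.layer4.register_forward_hook(hook)

@torch.no_grad()
def collect_marginals(loader):
    acts = []
    for x, _ in loader:
        model(x)
        acts.append(feats["layer4"])
    acts = torch.cat(acts)                       # (N, C) activations
    return acts.mean(0), acts.std(0) + 1e-6      # per-neuron mean / std

@torch.no_grad()
def feature_indicators(x, mu, sigma):
    model(x)
    # z-scores: how strongly each neuron fires relative to its marginal
    return (feats["layer4"] - mu) / sigma

# Tiny synthetic "dataset" just to make the sketch executable end to end
fake_loader = [(torch.rand(4, 3, 224, 224), torch.zeros(4)) for _ in range(2)]
mu, sigma = collect_marginals(fake_loader)
scores = feature_indicators(torch.rand(1, 3, 224, 224), mu, sigma)
```

Bounding a perturbation by how far such indicator scores move, rather than by a pixel-wise norm ball, is one way feature indicators can relax pixel-level imperceptibility constraints.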
Finally, we find that neural representations carry crucial information about how neural networks make decisions. Leveraging general patterns in these representations, we design algorithms that remove unwanted and harmful functionality from neural networks, thereby mitigating poison attacks.
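The thesis's defense algorithms are not reproduced here. As a generic illustration of removing unwanted functionality, the sketch below follows a common pruning-style recipe: zero out convolutional channels that stay dormant on clean data, where poisoned behaviour often hides. The model, layer, and pruning fraction are assumptions, and this mirrors generic pruning defenses rather than the thesis's specific method.

```python
# Minimal sketch of pruning-style removal of suspicious functionality
# (assumptions: ResNet-18, layer4[1].conv2, prune the 10% least-active channels).
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
target_conv = model.layer4[1].conv2          # assumed layer to sanitize

acts = {}
def hook(_, __, output):
    acts["out"] = output

target_conv.register_forward_hook(hook)

@torch.no_grad()
def prune_dormant_channels(clean_loader, frac=0.1):
    # Average per-channel activation magnitude over clean data
    totals, n = None, 0
    for x, _ in clean_loader:
        model(x)
        per_channel = acts["out"].abs().mean(dim=(0, 2, 3))
        totals = per_channel if totals is None else totals + per_channel
        n += 1
    mean_act = totals / n
    k = int(frac * mean_act.numel())
    dormant = torch.topk(mean_act, k, largest=False).indices
    # Zero the filters producing the dormant channels (a following BatchNorm's
    # affine parameters could be zeroed as well for a complete cut)
    target_conv.weight[dormant] = 0
    if target_conv.bias is not None:
        target_conv.bias[dormant] = 0
    return dormant

# Tiny synthetic clean set just to make the sketch executable end to end
fake_clean = [(torch.rand(4, 3, 224, 224), torch.zeros(4)) for _ in range(2)]
pruned = prune_dormant_channels(fake_clean, frac=0.1)
```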
Degree Type
- Doctor of Philosophy
Department
- Computer Science
Campus location
- West Lafayette