Control systems and machine learning models need to be designed to be robust against various types of uncertainty. For instance, control of the power grid must be robust against both uncertain environmental inputs and adversarial actions (when part of the system is compromised). Similarly, machine learning models must be robust against noise in the training data and incorrectly chosen features. While general analytical frameworks exist for such questions, in practice they can still be challenging to study, especially when the space of the adversary's actions is very large, or when the dependency between the uncertainty and the ultimate performance of the model is difficult to quantify. In this thesis, we study three such problems.
In the first problem, we study the robustness of the power grid against both uncertain environmental inputs and adversarial actions. Specifically, we consider distributed voltage control, where the reactive power injection of distributed energy resources (DERs) can be used to regulate the voltage across the power distribution network under uncertain renewable generation. However, enabling such reactive-power-injection capability of DERs also opens the door to potential adversarial attacks. Somewhat surprisingly, and contrary to the intuition that the reactive power injection at legitimate buses should help mitigate the voltage disruption inflicted by the adversary, we demonstrate that an intelligent attacker can actually exploit the response of the legitimate buses to amplify the damage by a factor of two. This higher level of damage can be attained even when the adversary has no information about the network topology. We then formulate an optimization problem to limit the potential damage of such adversarial attacks, so that the voltage-control target can be maintained under both uncertain renewable generation and adversarial actions as long as the fraction of compromised buses is below a given limit.
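To fix ideas, the following sketch uses the widely adopted linearized branch-flow (LinDistFlow) approximation of a distribution network; the matrices and the local feedback law below are illustrative assumptions and may differ from the exact model analyzed in the thesis. Under this approximation,
$$ v = X q + \tilde{v}, $$
where $v$ collects the bus voltage deviations, $q$ the reactive power injections of the DERs, $X$ is a positive-definite matrix determined by the line reactances, and $\tilde{v}$ captures the uncontrolled effect of renewable generation and loads. Partitioning the buses into a legitimate set $L$ and a compromised set $A$ gives
$$ v_L = X_{LL}\, q_L + X_{LA}\, q_A + \tilde{v}_L, $$
so if the legitimate buses run a local feedback law such as $q_L \leftarrow q_L - K\,(v_L - v^{\ast})$, where $v^{\ast}$ is the voltage target, an adversary choosing $q_A$ effectively acts on the closed-loop response of the legitimate buses rather than on the open-loop coupling $X_{LA}$ alone. It is this closed-loop coupling that an intelligent attacker can exploit.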
In the second and third problems, we study the robustness of machine learning models against noise in the training data and incorrectly chosen features under overparameterization. In classical statistical learning, it is well known that a model with too many features risks overfitting the noise in the training data and producing a large test error. Somewhat mysteriously, it has been observed that modern deep neural networks (DNNs), despite having so many layers and neurons that they can essentially fit arbitrary functions within a large class, still generalize well. As a first step towards understanding this phenomenon, a recent line of studies has focused on overparameterized linear models and studied their generalization power when they overfit the training data.

Specifically, in the second problem, we study the min-L1-norm solution that overfits the training data in a linear model with Gaussian features. We show that its generalization error can approach the noise level for a wide range of the number of features p, whereas the generalization error of the min-L2-norm solution quickly approaches the "null risk" (i.e., the risk of a model that always predicts zero) as p increases. Nonetheless, the generalization errors of both the min-L1-norm and min-L2-norm solutions eventually approach the null risk as p approaches infinity.

In the third problem, we study overfitted neural tangent kernel (NTK) models with a finite number of neurons, which can be viewed as a useful intermediate step towards modeling nonlinear neural networks and understanding their generalization performance. We show that, depending on the ground-truth function, the test error of overfitted NTK models exhibits characteristics different from the "double descent" of other overparameterized linear models with simple Fourier or Gaussian features. Specifically, for a class of learnable functions, our new upper bound shows that the generalization error of overfitted NTK models approaches a small limiting value even when the number of neurons p approaches infinity, and this limiting value further decreases as the number of training samples n increases.
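To make the comparison in the second problem concrete, the following minimal sketch computes both the min-L2-norm and the min-L1-norm interpolating solutions of a noisy overparameterized linear model with i.i.d. Gaussian features and reports their test errors; the sparse ground truth, the dimensions n and p, and the noise level are illustrative choices rather than the exact setting analyzed in the thesis.

```python
# Minimal sketch: min-L2-norm vs. min-L1-norm interpolators of noisy data
# in an overparameterized linear model with i.i.d. Gaussian features.
# All problem sizes and the sparse ground truth are illustrative assumptions.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, p, noise_std = 50, 500, 0.5           # n samples, p >> n features (assumed)
beta_true = np.zeros(p)
beta_true[:5] = 1.0                      # sparse ground truth (assumed)

X = rng.standard_normal((n, p))
y = X @ beta_true + noise_std * rng.standard_normal(n)

# Min-L2-norm solution that interpolates the training data (X beta = y).
beta_l2 = np.linalg.pinv(X) @ y

# Min-L1-norm interpolating solution (basis pursuit) as a linear program:
#   minimize sum(u)  subject to  -u <= beta <= u,  X beta = y.
c = np.concatenate([np.zeros(p), np.ones(p)])
A_ub = np.block([[np.eye(p), -np.eye(p)], [-np.eye(p), -np.eye(p)]])
b_ub = np.zeros(2 * p)
A_eq = np.hstack([X, np.zeros((n, p))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y, bounds=(None, None))
beta_l1 = res.x[:p]

# Test error on fresh Gaussian features.  The null risk in this setting is
# E[y^2] = ||beta_true||^2 + noise_std^2.
X_test = rng.standard_normal((2000, p))
y_test = X_test @ beta_true + noise_std * rng.standard_normal(2000)
for name, b in [("min-L2", beta_l2), ("min-L1", beta_l1)]:
    print(name, "test MSE:", np.mean((X_test @ b - y_test) ** 2))
```

Here the min-L2-norm interpolator is obtained from the pseudoinverse, and the min-L1-norm interpolator from the standard linear-programming reformulation of basis pursuit; sweeping p while keeping n fixed is one way to observe the contrast with the null risk described above.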
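Along the same lines, for the third problem the following minimal sketch builds one common finite-width NTK-style linearization of a two-layer ReLU network (the gradient features with respect to its randomly initialized first-layer weights) and overfits it with the min-L2-norm coefficients; the feature map, target function, and problem sizes are illustrative assumptions and not necessarily the exact construction analyzed in the thesis.

```python
# Minimal sketch of an overfitted finite-width NTK-style model (illustrative
# assumptions only): features are the gradients of a two-layer ReLU network's
# output with respect to its random first-layer weights, and the model
# interpolates noisy training data with the min-L2-norm coefficients.
import numpy as np

rng = np.random.default_rng(1)
d, p, n, noise_std = 5, 500, 100, 0.1    # input dim, neurons, samples (assumed)
W = rng.standard_normal((p, d))          # fixed random first-layer weights

def ntk_features(X):
    # Block j of the feature vector is 1{w_j . x > 0} * x, scaled by 1/sqrt(p),
    # i.e., the gradient of sum_j relu(w_j . x) / sqrt(p) with respect to w_j.
    act = (X @ W.T > 0).astype(float)                       # (num_samples, p)
    return (act[:, :, None] * X[:, None, :]).reshape(len(X), -1) / np.sqrt(p)

f_true = lambda X: np.cos(np.linalg.norm(X, axis=1))        # assumed target
X_train = rng.standard_normal((n, d))
y_train = f_true(X_train) + noise_std * rng.standard_normal(n)

Phi = ntk_features(X_train)              # n x (p*d), heavily overparameterized
coef = np.linalg.pinv(Phi) @ y_train     # min-L2-norm interpolating solution

X_test = rng.standard_normal((2000, d))
test_mse = np.mean((ntk_features(X_test) @ coef - f_true(X_test)) ** 2)
print("overfitted NTK-style model, test MSE:", test_mse)
```

Varying p and n in this sketch is one empirical way to probe whether the test error flattens out as the number of neurons grows, rather than exhibiting the double-descent behavior seen for simple Fourier or Gaussian features.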