# Algorithmic and Graph-Theoretic Approaches for Optimal Sensor Selection in Large-Scale Systems

Using sensor measurements to estimate the states and parameters of a system is a fundamental task in understanding the behavior of the system. Moreover, as modern systems grow rapidly in scale and complexity, it is not always possible to deploy sensors to measure all of the states and parameters of the system, due to cost and physical constraints. Therefore, selecting an optimal subset of all the candidate sensors to deploy and gather measurements of the system is an important and challenging problem. In addition, the systems may be targeted by external attackers who attempt to remove or destroy the deployed sensors. This further motivates the formulation of resilient sensor selection strategies. In this thesis, we address the sensor selection problem under different settings as follows.

First, we consider the optimal sensor selection problem for linear dynamical systems with stochastic inputs, where the Kalman filter is applied based on the sensor measurements to give an estimate of the system states. The goal is to select a subset of sensors under certain budget constraints such that the trace of the steady-state error covariance of the Kalman filter with the selected sensors is minimized. We characterize the complexity of this problem by showing that the Kalman filtering sensor selection problem is NP-hard and cannot be approximated within any constant factor in polynomial time for general systems. We then consider the optimal sensor attack problem for Kalman filtering. The Kalman filtering sensor attack problem is to attack a subset of selected sensors under certain budget constraints in order to maximize the trace of the steady-state error covariance of the Kalman filter with sensors after the attack. We show that the same results as the Kalman filtering sensor selection problem also hold for the Kalman filtering sensor attack problem. Having shown that the general sensor selection and sensor attack problems for Kalman filtering are hard to solve, our next step is to consider special classes of the general problems. Specifically, we consider the underlying directed network corresponding to a linear dynamical system and investigate the case when there is a single node of the network that is affected by a stochastic input. In this setting, we show that the corresponding sensor selection and sensor attack problems for Kalman filtering can be solved in polynomial time. We further study the resilient sensor selection problem for Kalman filtering, where the problem is to find a sensor selection strategy under sensor selection budget constraints such that the trace of the steady-state error covariance of the Kalman filter is minimized after an adversary removes some of the deployed sensors. We show that the resilient sensor selection problem for Kalman filtering is NP-hard, and provide a pseudo-polynomial-time algorithm to solve it optimally.

Next, we consider the sensor selection problem for binary hypothesis testing. The problem is to select a subset of sensors under certain budget constraints such that a certain metric of the Neyman-Pearson (resp., Bayesian) detector corresponding to the selected sensors is optimized. We show that this problem is NP-hard if the objective is to minimize the miss probability (resp., error probability) of the Neyman-Pearson (resp., Bayesian) detector. We then consider three optimization objectives based on the Kullback-Leibler distance, J-Divergence and Bhattacharyya distance, respectively, in the hypothesis testing sensor selection problem, and provide performance bounds on greedy algorithms when applied to the sensor selection problem associated with these optimization objectives.

Moving beyond the binary hypothesis setting, we also consider the setting where the true state of the world comes from a set that can have cardinality greater than two. A Bayesian approach is then used to learn the true state of the world based on the data streams provided by the data sources. We formulate the Bayesian learning data source selection problem under this setting, where the goal is to minimize the cost spent on the data sources such that the learning error is within a certain range. We show that the Bayesian learning data source selection is also NP-hard, and provide greedy algorithms with performance guarantees.

Finally, in light of the COVID-19 pandemic, we study the parameter estimation measurement selection problem for epidemics spreading in networks. Here, the measurements (with certain costs) are collected by conducting virus and antibody tests on the individuals in the epidemic spread network. The goal of the problem is then to optimally estimate the parameters (i.e., the infection rate and the recovery rate of the virus) in the epidemic spread network, while satisfying the budget constraint on collecting the measurements. Again, we show that the measurement selection problem is NP-hard, and provide approximation algorithms with performance guarantees.