Purdue University Graduate School
Browse

File(s) under embargo

11

month(s)

2

day(s)

until file(s) become available

HIGH-DIMENSIONAL INFERENCE OVER NETWORKS: STATISTICAL AND COMPUTATIONAL GUARANTEES

thesis
posted on 2024-09-19, 12:55 authored by Yao JiYao Ji

Distributed optimization problems defined over mesh networks are ubiquitous in signal processing, machine learning, and control. In contrast to centralized approaches where all information and computation resources are available at a centralized server, agents on a distributed system can only use locally available information. As a result, efforts have been put into the design of efficient distributed algorithms that take into account the communication constraints and make coordinated decisions in a fully distributed manner from a pure optimization perspective. Given the massive sample size and high-dimensionality generated by distributed systems such as social media, sensor networks, and cloud-based databases, it is essential to understand the statistical and computational guarantees of distributed algorithms to solve such high-dimensional problems over a mesh network.

A goal of this thesis is a first attempt at studying the behavior of distributed methods in the high-dimensional regime. It consists of two parts: (I) distributed LASSO and (II) distributed stochastic sparse recovery.

In Part (I), we start by studying linear regression from data distributed over a network of agents (with no master node) by means of LASSO estimation, in high-dimension, which allows the ambient dimension to grow faster than the sample size. While there is a vast literature of distributed algorithms applicable to the problem, statistical and computational guarantees of most of them remain unclear in high dimensions. This thesis provides a first statistical study of the Distributed Gradient Descent (DGD) in the Adapt-Then-Combine (ATC) form. Our theory shows that, under standard notions of restricted strong convexity and smoothness of the loss functions--which hold with high probability for standard data generation models--suitable conditions on the network connectivity and algorithm tuning, DGD-ATC converges globally at a linear rate to an estimate that is within the centralized statistical precision of the model. In the worst-case scenario, the total number of communications to statistical optimality grows logarithmically with the ambient dimension, which improves on the communication complexity of DGD in the Combine-Then-Adapt (CTA) form, scaling linearly with the dimension. This reveals that mixing gradient information among agents, as DGD-ATC does, is critical in high-dimensions to obtain favorable rate scalings.

In Part (II), we focus on addressing the problem of distributed stochastic sparse recovery through stochastic optimization. We develop and analyze stochastic optimization algorithms for problems over a network, modeled as an undirected graph (with no centralized node), where the expected loss is strongly convex with respect to the Euclidean norm, and the optimum is sparse. Assuming agents only have access to unbiased estimates of the gradients of the underlying expected objective, and stochastic gradients are sub-Gaussian, we use distributed stochastic dual averaging (DSDA) as a building block to develop a fully decentralized restarting procedure for recovery of sparse solutions over a network. We show that with high probability, the iterates generated by all agents linearly converge to an approximate solution, eliminating fast the initial error; and then converge sublinearly to the exact sparse solution in the steady-state stages owing to observation noise. The algorithm asymptotically achieves the optimal convergence rate and favorable dimension dependence enjoyed by a non-Euclidean centralized scheme. Further, we precisely identify its non-asymptotic convergence rate as a function of characteristics of the objective functions and the network, and we characterize the transient time needed for the algorithm to approach the optimal rate of convergence. We illustrate the performance of the algorithm in application to classical problems of sparse linear regression, sparse logistic regression and low rank matrix recovery. Numerical experiments demonstrate the tightness of the theoretical results.

History

Degree Type

  • Doctor of Philosophy

Department

  • Industrial Engineering

Campus location

  • West Lafayette

Advisor/Supervisor/Committee Chair

Gesualdo Scutari

Advisor/Supervisor/Committee co-chair

Harsha Honnappa

Additional Committee Member 2

Raghu Pasupathy

Additional Committee Member 3

Alex L. Wang