DISTRIBUTED OPTIMIZATION FOR MACHINE LEARNING: GUARANTEES AND TRADEOFFS
Thesis posted on 01.09.2021, 00:33 by Ye Tian
In the era of big data, the sheer volume and widespread spatial distribution of information have prompted extensive research on distributed optimization over networks. Each computing unit has access to only a relatively small portion of the entire data set and can communicate with only a relatively small number of neighbors. The goal of the system is for all computing units to reach consensus on the parametric model that is optimal with respect to the entire data set. Existing work has provided various decentralized optimization algorithms for this purpose. However, some important questions remain open: (I) What is the intrinsic connection among the existing algorithms? (II) What is the min-max lower complexity bound for decentralized algorithms, and can one design an optimal decentralized algorithm, in the sense that it achieves this lower bound? (III) In the presence of asynchrony and imperfect communications, can one design linearly convergent decentralized algorithms?
This thesis addresses the above questions. (I) Abstracting from ad-hoc, specific solution methods, we propose a unified distributed algorithmic framework and analysis for a general class of optimization problems over networks. Our method encapsulates several existing first-order distributed algorithms. Distinguishing features of our scheme are: (a) When each agent’s function is strongly convex, the algorithm converges at a linear rate whose dependence on the agents’ functions and on the network topology is decoupled; (b) When the objective function is convex but not strongly convex, a decoupling similar to that in (a) is established for the coefficient of the proved sublinear rate, which also reveals the role of function heterogeneity in the convergence rate; (c) The algorithm can adjust the ratio between the number of communications and the number of computations to achieve a rate (in terms of computations) independent of the network connectivity; and (d) As a by-product of our analysis, we obtain a tuning recommendation for several existing (non-accelerated) distributed algorithms, yielding a provably faster (worst-case) convergence rate for the class of problems under consideration. (II) With regard to lower complexity bounds, the proposed family of algorithms, when equipped with acceleration, is proved to be optimal; that is, it attains the lower bounds on the convergence rate. (III) Finally, to make the proposed algorithms practical, we break the synchronism in the agents’ updates: agents wake up and update without any coordination, using information only from immediate neighbors with unknown, arbitrary but bounded delays. Quite remarkably, even in the presence of asynchrony, the proposed algorithmic framework is proved to converge at a linear rate (resp. sublinear rate) when applied to strongly convex (resp. non-strongly convex) optimization problems.
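As a rough illustration of the setting described above, the following sketch runs gradient tracking, a standard first-order decentralized method, on a small ring network. It is not the algorithm proposed in the thesis. The function name `gradient_tracking`, the scalar quadratic local costs f_i(x) = 0.5 * a_i * (x - c_i)^2, the equal-weight ring mixing rule, the step size, and the iteration count are all assumptions chosen for the example.

```python
def gradient_tracking(a, c, alpha=0.05, iters=600):
    """Synchronous gradient tracking on a ring of n >= 3 agents.

    Agent i holds the strongly convex cost f_i(x) = 0.5 * a[i] * (x - c[i])**2
    and exchanges values only with its two ring neighbors.
    All names and parameter values here are illustrative assumptions.
    """
    n = len(a)

    def grad(i, x):
        # Gradient of agent i's local cost at x.
        return a[i] * (x - c[i])

    def mix(v):
        # One communication round: average own value with both ring neighbors.
        # The implied weight matrix is symmetric and doubly stochastic.
        return [(v[(i - 1) % n] + v[i] + v[(i + 1) % n]) / 3.0 for i in range(n)]

    # Minimizer of the sum of all local costs, used only to measure the error.
    x_star = sum(ai * ci for ai, ci in zip(a, c)) / sum(a)

    x = [0.0] * n                          # local model estimates
    y = [grad(i, x[i]) for i in range(n)]  # trackers of the average gradient
    errors = []
    for _ in range(iters):
        # Consensus step on x followed by a move along the tracked direction.
        x_new = [m - alpha * yi for m, yi in zip(mix(x), y)]
        # Tracker update: mix, then add the change in the local gradient.
        y = [m + grad(i, x_new[i]) - grad(i, x[i]) for i, m in enumerate(mix(y))]
        x = x_new
        errors.append(max(abs(xi - x_star) for xi in x))
    return x, x_star, errors


# Example run with five agents holding heterogeneous data.
x, x_star, errors = gradient_tracking(
    a=[1.0, 1.25, 1.5, 1.75, 2.0],
    c=[3.0, -1.0, 0.5, 2.0, 4.0],
)
```

Plotting `errors` on a logarithmic scale should show a roughly straight decay, which is what a linear convergence rate looks like for strongly convex local costs. The sketch assumes synchronous updates and perfect communication, so it does not reflect the asynchronous setting with delays treated in part (III).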