A Sharp Convergence Rate Analysis for Distributed Accelerated Gradient Methods

Huan Li, Cong Fang, Wotao Yin, Zhouchen Lin

Submitted

Overview

The underlying problem is defined on a network with m nodes. We seek an algorithm by which these m nodes cooperatively solve the convex program:

 \mathrm{minimize}_{x}\quad f_1(x)+f_2(x)+\cdots+f_m(x).
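
For context, this problem is equivalent to the standard consensus reformulation used throughout decentralized optimization (a textbook identity, not notation taken from our report), in which each node i holds its own f_i and keeps a local copy x_i of the decision variable, and all copies are constrained to agree:

 \begin{aligned}
 \mathrm{minimize}_{x_1,\dots,x_m}\quad & f_1(x_1)+f_2(x_2)+\cdots+f_m(x_m)\\
 \mathrm{subject\ to}\quad & x_1=x_2=\cdots=x_m.
 \end{aligned}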

When minimizing a convex function, the performance of gradient descent depends on the smoothness constant L and the strong-convexity constant mu of the function, as well as the target accuracy epsilon. When the problem is solved over a network, the connectivity of the network also directly affects performance, because each node can exchange local information only with its adjacent nodes (network neighbors). This information exchange is described by a “gossip” matrix W; its second largest singular value sigma_2(W) is a good indicator of the network connectivity (sigma_1(W) always equals 1).
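
To illustrate how W enters the picture, here is a minimal sketch of plain decentralized gradient descent (not the accelerated methods of this paper) on a toy least-squares problem; the ring topology, the weights in W, the data A_i, b_i, and the step size alpha are all hypothetical choices made only for this example:

import numpy as np

# Toy setting: m nodes on a ring, node i holds f_i(x) = 0.5 * ||A_i x - b_i||^2.
m, d = 5, 3
rng = np.random.default_rng(0)
A = [rng.standard_normal((10, d)) for _ in range(m)]
b = [rng.standard_normal(10) for _ in range(m)]

# A simple symmetric, doubly stochastic gossip matrix for a ring graph:
# each node averages its iterate with its two neighbors.
W = np.zeros((m, m))
for i in range(m):
    W[i, i] = 0.5
    W[i, (i - 1) % m] = 0.25
    W[i, (i + 1) % m] = 0.25

sigma = np.linalg.svd(W, compute_uv=False)
print("sigma_1(W) =", sigma[0], " sigma_2(W) =", sigma[1])  # sigma_1(W) = 1

# Plain decentralized gradient descent: mix with neighbors via W,
# then take a local gradient step at every node.
X = np.zeros((m, d))      # row i is node i's local copy of x
alpha = 0.01
for _ in range(200):
    grads = np.stack([A[i].T @ (A[i] @ X[i] - b[i]) for i in range(m)])
    X = W @ X - alpha * grads

print("disagreement across nodes:", np.linalg.norm(X - X.mean(axis=0)))

The key point is that the update X = W @ X - alpha * grads only requires each node to average with its direct neighbors (the nonzero entries in its row of W); how fast this averaging mixes information across the network is governed by sigma_2(W).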

As in centralized convex minimization, there is a minimal number of gradient evaluations that any gradient algorithm needs in order to solve every instance in a problem class (described by mu and L) to accuracy epsilon. This number is called the computation lower bound. For decentralized algorithms, there is also a lower bound on the number of communication rounds.
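
Concretely, for L-smooth and mu-strongly convex objectives, the lower bounds established by Scaman et al. [2] take roughly the following form (our paraphrase, with 1 - sigma_2(W) playing the role of the spectral gap of W):

 \Omega\!\left(\sqrt{\frac{L}{\mu}}\,\log\frac{1}{\epsilon}\right)\ \text{gradient computations},\qquad
 \Omega\!\left(\sqrt{\frac{L}{\mu\,(1-\sigma_2(W))}}\,\log\frac{1}{\epsilon}\right)\ \text{communication rounds}.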

In short, this paper presents two new algorithms that achieve both lower bounds (computation and communication) up to an extra logarithmic factor.

 

The references in these tables can be found in our report: [2] is Scaman et al. 2017, [3] is Uribe et al. 2018, [24] is Qu and Li 2017, and [30] is Nesterov 2013.

We implemented our methods and tested them on common decentralized problems. The acceleration predicted by the theory was clearly reflected in the numerical results, so theory and practice match well here.

Citation

H. Li, C. Fang, W. Yin, and Z. Lin, "A sharp convergence rate analysis for distributed accelerated gradient methods," arXiv:1810.01053, 2018.

