LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning

Tianyi Chen, Georgios B. Giannakis, Tao Sun, and Wotao Yin

Submitted to NIPS’18.

Overview

This paper presents a new class of gradient methods for distributed machine learning that adaptively skip gradient calculations to learn with reduced communication and computation. Simple rules are designed to detect slowly-varying gradients and thus trigger the reuse of outdated gradients. The resulting gradient-based methods are termed Lazily Aggregated Gradient (LAG); a simplified sketch of the skipping rule is given below.
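To make the idea concrete, here is a minimal sketch of a LAG-style loop, assuming full gradients and a worker-side trigger: each worker uploads a fresh gradient only when it differs enough from the one it last communicated, where "enough" is measured against recent parameter changes. The function name lag_descent and the parameters xi and D are illustrative placeholders; the paper's exact trigger rules and constants differ.

```python
import numpy as np

def lag_descent(grad_fns, theta0, lr=0.1, num_iters=200, D=10, xi=0.5):
    """Simplified LAG-style gradient descent across M workers.

    grad_fns: list of callables, grad_fns[m](theta) returns worker m's gradient.
    Returns the final iterate and the total number of gradient uploads.
    """
    M = len(grad_fns)
    theta = np.asarray(theta0, dtype=float).copy()
    last_sent = [g(theta) for g in grad_fns]   # gradients last communicated by each worker
    param_diffs = []                           # recent squared parameter changes
    uploads = M                                # the first round communicates everything

    for _ in range(num_iters):
        # Threshold built from recent progress: large recent steps tolerate staler gradients.
        thresh = 0.0
        if param_diffs:
            thresh = xi / (lr ** 2 * M ** 2 * len(param_diffs)) * sum(param_diffs)

        for m in range(M):
            fresh = grad_fns[m](theta)
            # Upload a fresh gradient only when the change exceeds the threshold;
            # otherwise the server reuses the outdated gradient from this worker.
            if np.sum((fresh - last_sent[m]) ** 2) > thresh:
                last_sent[m] = fresh
                uploads += 1

        agg = sum(last_sent)                   # lazily aggregated gradient
        theta_new = theta - lr * agg
        param_diffs = (param_diffs + [float(np.sum((theta_new - theta) ** 2))])[-D:]
        theta = theta_new

    return theta, uploads


if __name__ == "__main__":
    # Toy quadratic objective split across three workers (for illustration only).
    A = [np.diag([1.0, 2.0]), np.diag([0.5, 1.5]), np.eye(2)]
    grads = [lambda th, Am=Am: Am @ th for Am in A]
    theta, uploads = lag_descent(grads, np.array([5.0, -3.0]))
    print(theta, uploads)
```

The key design point this sketch illustrates is that communication is skipped per worker and per round, so workers whose local gradients vary slowly upload far less often while the server still aggregates a gradient from every worker at every iteration.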

Theoretically, the merits of this contribution are:

  1. LAG achieves the same convergence rates as batch gradient descent in the strongly convex, convex, and nonconvex smooth cases;

  2. when the distributed datasets are heterogeneous (quantified by certain measurable constants), the number of communication rounds needed to reach a target accuracy is reduced, thanks to the adaptive reuse of lagged gradients.

Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.

Citation

T. Chen, G. B. Giannakis, T. Sun, and W. Yin, "LAG: Lazily aggregated gradient for communication-efficient distributed learning," arXiv:1805.09965, 2018.
