LASG: Lazily Aggregated Stochastic Gradients for Communication-Efficient Distributed Learning

Tianyi Chen, Yuejiao Sun, and Wotao Yin


Overview

This paper solves distributed machine learning problems such as federated learning in a communication-efficient fashion.

We develop the stochastic generalization of the recently developed lazily aggregated gradient (LAG) method, which justifies the name LASG. LAG adaptively predicts the contribution of each communication round and performs only the significant ones, saving communication while maintaining the rate of convergence. However, LAG only works with deterministic gradients, and applying it directly to stochastic gradients yields poor performance.

The novel components of LASG are a set of new rules tailored to stochastic gradients that can be implemented to save downloads, uploads, or both. The new algorithms adaptively choose between fresh and stale stochastic gradients and achieve convergence rates comparable to the original SGD. A small sketch of this idea appears below.
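
The following is a minimal sketch, not the authors' implementation, of an upload-saving lazy rule of this flavor on a synthetic least-squares problem. All names (lasg-style loop, the constant c, the memory length D) and the specific threshold form are illustrative assumptions: a worker reuses its last uploaded stochastic gradient when the fresh one has changed little relative to how much the model has recently moved.

import numpy as np

rng = np.random.default_rng(0)
M, dim, eta, c, D = 5, 10, 0.05, 0.5, 10   # workers, dimension, step size, threshold constant, memory length

# Synthetic per-worker data: f_m(theta) = 0.5 * ||A_m theta - b_m||^2
A = [rng.normal(size=(20, dim)) for _ in range(M)]
b = [rng.normal(size=20) for _ in range(M)]

def stoch_grad(m, theta, batch=5):
    """Minibatch stochastic gradient of worker m's loss."""
    idx = rng.choice(len(b[m]), size=batch, replace=False)
    return A[m][idx].T @ (A[m][idx] @ theta - b[m][idx])

theta = np.zeros(dim)
stale_grad = [stoch_grad(m, theta) for m in range(M)]  # last uploaded gradient of each worker
recent_moves = []                                      # history of ||theta^{k+1} - theta^k||^2
uploads = 0

for k in range(200):
    agg = np.zeros(dim)
    for m in range(M):
        g = stoch_grad(m, theta)
        # Illustrative lazy rule: skip the upload when the fresh stochastic gradient
        # is close to the last uploaded one, relative to recent model movement.
        thresh = c / (eta**2 * M**2) * sum(recent_moves[-D:]) if recent_moves else 0.0
        if np.sum((g - stale_grad[m])**2) > thresh:
            stale_grad[m] = g          # significant change: upload the fresh gradient
            uploads += 1
        agg += stale_grad[m]           # server aggregates fresh and stale gradients
    new_theta = theta - eta * agg / M
    recent_moves.append(np.sum((new_theta - theta)**2))
    theta = new_theta

print(f"uploads: {uploads} out of {200 * M} possible")

Under this (assumed) rule, workers whose stochastic gradients change slowly communicate rarely, while the server still uses every worker's gradient, fresh or stale, in each update.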

LASG achieves impressive empirical performance: it typically reduces total communication by an order of magnitude.

 

Citation

T. Chen, Y. Sun, and W. Yin, LASG: Lazily aggregated stochastic gradients for communication-efficient distributed learning, arXiv:2002.11360, 2020.
