Block Stochastic Gradient Iteration for Convex and Nonconvex Optimization

Yangyang Xu and Wotao Yin

Published in SIAM Journal on Optimization

Overview

The stochastic gradient (SG) method can quickly solve a problem with a large number of components in the objective, or a stochastic optimization problem, to moderate accuracy. The block coordinate descent (BCD) method, on the other hand, can quickly solve problems with multiple (blocks of) variables. This paper introduces a method that combines the strengths of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables.

This paper proposes a block stochastic gradient (BSG) method for both convex and nonconvex programs. BSG generalizes SG by updating all the blocks of variables in a Gauss-Seidel fashion (the update of the current block depends on the previously updated blocks), in either a fixed or a randomly shuffled order. Although BSG does slightly more work at each iteration, it typically outperforms SG because it uses Gauss-Seidel updates and larger stepsizes, the latter being a result of the smaller per-block Lipschitz constants.
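To make the Gauss-Seidel block update concrete, here is a minimal sketch of a BSG-style iteration on an unregularized least-squares objective. It is an illustration under stated assumptions, not the authors' implementation: the function name bsg_least_squares, the parameters blocks, batch_size, and step0, and the step0 / sqrt(k) stepsize schedule are all hypothetical choices made for this example. Each iteration draws one mini-batch, then updates the blocks one after another, recomputing the partial stochastic gradient so that each block sees the blocks already updated in the same iteration.

```python
import numpy as np


def bsg_least_squares(A, b, blocks, n_iters=2000, batch_size=8,
                      step0=0.1, shuffle=True, seed=0):
    """Illustrative BSG-style solver for min_x (1/2n) * ||A x - b||^2.

    `blocks` is a list of index arrays that partition the coordinates of x.
    All names and defaults are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = A.shape
    x = np.zeros(n_features)

    for k in range(1, n_iters + 1):
        # One mini-batch is drawn per iteration and shared by all blocks.
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        A_batch, b_batch = A[idx], b[idx]

        # Diminishing stepsize (an assumed schedule for this sketch).
        step = step0 / np.sqrt(k)

        # Blocks are visited in a fixed or a randomly shuffled order.
        if shuffle:
            order = rng.permutation(len(blocks))
        else:
            order = range(len(blocks))

        for i in order:
            cols = blocks[i]
            # Gauss-Seidel: the residual uses blocks already updated
            # earlier in this same iteration.
            residual = A_batch @ x - b_batch
            grad_block = A_batch[:, cols].T @ residual / batch_size
            x[cols] -= step * grad_block

    return x
```

Calling the function with blocks = [np.arange(0, 3), np.arange(3, 6)] on a six-column matrix A, for instance, alternates between the first and last three coordinates within every iteration. Setting blocks to a single index array covering all coordinates reduces the sketch to plain mini-batch SG.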

The convergence of BSG is established for both convex and nonconvex cases. In the convex case, BSG has the same order of convergence rate as SG. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. In both cases our analysis is nontrivial since the typical unbiasedness assumption no longer holds.

BSG is numerically evaluated on the following problems:

  • (convex) stochastic least squares,

  • (convex) logistic regression,

  • (nonconvex) low-rank tensor recovery,

  • (nonconvex) bilinear logistic regression.

On the convex problems, BSG performed significantly better than SG. On the nonconvex problems, BSG significantly outperformed the deterministic BCD method because the latter tends to stagnate early near local minimizers. Overall, BSG inherits the benefits of both stochastic gradient approximation and block-coordinate updates and is especially useful for solving large-scale nonconvex problems.

Citation

Y. Xu and W. Yin, Block stochastic gradient iteration for convex and nonconvex optimization, SIAM Journal on Optimization, 25(3), 1686-1716, 2015. DOI: 10.1137/140983938
