Math 285J: A First Course on Large-Scale Optimization Methods


  • Instructors: Ernest Ryu and Wotao Yin

  • Every Monday, Wednesday, and Friday 3-3:50pm in MS 5118

  • Lecture notes on

Course description

In this course, we use monotone operator theory to derive and analyze a wide range of classical and modern convex optimization algorithms, including stochastic (randomized), parallel, distributed, and decentralized methods that are well-suited for large-scale and big data problems.

The covered algorithms include: gradient descent, the proximal-gradient method, the method of multipliers, ADMM, linearized ADMM, PDHG (Chambolle-Pock), Condat-Vu, SGD, Finito, SAGA, SVRG, distributed ADMM, distributed gradient descent, and so on.

This course strives for intellectual economy. We analyze a large number of methods in a unified and streamlined manner using a handful of mathematical concepts from monotone operator theory.
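To illustrate the unifying viewpoint informally, here is a minimal sketch (not course material; the least-squares objective, data, and step size are illustrative assumptions) of gradient descent written as a fixed-point iteration x^{k+1} = T(x^k), where T(x) = x - alpha * grad f(x) is an averaged operator when 0 < alpha < 2/L:

```python
import numpy as np

# Illustrative problem (an assumption, not from the course notes):
# least squares f(x) = 0.5 * ||A x - b||^2, with gradient A^T (A x - b).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

L = np.linalg.norm(A.T @ A, 2)   # Lipschitz constant of the gradient
alpha = 1.0 / L                  # step in (0, 2/L), so T below is averaged

def T(x):
    """Gradient-descent operator T(x) = x - alpha * grad f(x)."""
    return x - alpha * (A.T @ (A @ x - b))

# Fixed-point iteration x^{k+1} = T(x^k); its fixed points are the
# minimizers of f, and averagedness gives convergence.
x = np.zeros(5)
for _ in range(2000):
    x = T(x)

x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # reference solution
print(np.linalg.norm(x - x_star))
```

The same template, with T replaced by a resolvent or a composition of operators, covers the other methods listed above; that reuse is the "intellectual economy" in question.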

Intended audience and prerequisites

This course should benefit students who already find convex optimization useful and wish to learn more about algorithms.

This course assumes familiarity with calculus and linear algebra at the level of Math 32a and Math 115a. Prior knowledge of optimization is helpful but not necessary. Basic convex optimization will be reviewed in the first week.

This course also assumes proficiency at basic programming. Assignments will involve coding in Python.

What this course is not

This course is not a course on monotone operator theory, although that subject is an interesting branch of mathematics in its own right. We use monotone operator theory only as a tool to analyze convex optimization algorithms.

This course is not a comprehensive study of the strongest theoretical results. It presents a unified and streamlined approach to developing and analyzing algorithms, and we discuss only the results obtainable through this approach. These results are often not the strongest available, but they are the simplest.

List of topics

(Note: this is not a weekly plan.)

  1. Preliminaries

  2. Monotone, nonexpansive, and averaged operators

  3. Fixed-point iteration, resolvent, proximal-point method

  4. Operator splitting schemes

  5. Variable metric methods

  6. Duality and dualization techniques

  7. Primal-dual splitting methods

  8. Stochastic coordinate fixed-point iteration

  9. Asynchronous coordinate fixed-point iteration

  10. Stochastic finite-sum zero finding

  11. Distributed and decentralized optimization
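
As a concrete instance of topic 3 above, here is a hedged sketch (the choice f(x) = |x| and the parameter lam are illustrative assumptions, not course material) of the proximal-point method, which iterates the resolvent J = (I + lam * df)^{-1}; for f(x) = |x| the resolvent is elementwise soft-thresholding:

```python
import numpy as np

# Illustrative choice (an assumption): f(x) = |x|, applied coordinatewise.
# Its resolvent J_{lam df} = (I + lam df)^{-1} is the proximal operator
# of lam*f, i.e., soft-thresholding.
def resolvent(x, lam):
    """prox_{lam f}(x) = argmin_z lam*|z| + 0.5*(z - x)^2, elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# Proximal-point method: x^{k+1} = J_{lam df}(x^k).
x = np.array([3.0, -2.0, 0.5])
lam = 0.4
for _ in range(10):
    x = resolvent(x, lam)

print(x)  # iterates shrink toward 0, the minimizer of f
```

Each step shrinks every coordinate toward zero by lam, so the iterates reach the minimizer of f in finitely many steps here; the course treats the general case, where the resolvent of a maximal monotone operator yields convergence of the fixed-point iteration.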

Possible additional topics:

  1. Maximal and cyclic monotone operators

  2. Scaled relative graphs

  3. Quasi-properties

  4. Subgradient and stochastic subgradient methods


  • About eight homework sets; one every week except for two short weeks

  • No midterm, no final exam, no final project
