Research summary

My research focuses on optimization methods and algorithms for large-scale problems.

Async-parallel computing

Beginning around ten years ago, single-threaded CPU performance stopped improving significantly due to physical limitations; it is the number of cores per machine that continues to grow. Today we can buy 8-core phones, 64-core workstations, and 2.5k-core GPUs at affordable prices. On the other hand, most of our algorithms are still single-threaded, and because of this, their running time is about the same as it was ten years ago and will remain so for the next ten or twenty years. To develop faster algorithms, especially for the large-scale problems that arise in signal processing, medical imaging, and machine learning, we must turn to parallel computing.

In parallel computing, multiple agents (e.g., CPU cores) collaboratively solve a problem by concurrently solving simpler subproblems. For most problems, the subproblems depend on one another, so the agents must regularly exchange information. In asynchronous computing, each agent computes with whatever information it has, even if the latest information from other agents has not yet arrived.

Asynchronism is extremely important to the efficiency and resilience of parallel computing. Without it, all cores would have to wait for the latest information to arrive, so the speed of the computation would be dictated by the slowest core, the most difficult subproblem, and the longest communication delay. Without it, the entire computation must stop and await a fix whenever an agent or a network link fails, and such failures happen more often as the system grows larger. Today, most algorithms are still single-threaded, and most of the already-parallelized algorithms are still synchronous. Despite both mathematical and coding challenges, it is imperative to develop the theory and algorithms of asynchronous parallel computing, so that we can use all the cores available on future computing platforms, and use them as fully as possible.
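As a concrete illustration of the contrast, the following is a minimal sketch (in Python, with made-up problem dimensions; the shared-memory setup and all parameter choices are assumptions for illustration) of asynchronous coordinate updates: each worker repeatedly reads the shared iterate, which may be stale, updates one coordinate of a least-squares objective, and writes it back without any locks or barriers.

    import random
    import threading
    import numpy as np

    # Minimize f(x) = 0.5*||Ax - b||^2 by coordinate gradient steps.
    # Workers never wait for one another: each reads a possibly stale x
    # and writes one coordinate back without synchronization.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 50))
    b = rng.standard_normal(200)
    x = np.zeros(50)                   # shared iterate, updated in place
    L = (A * A).sum(axis=0)            # per-coordinate Lipschitz constants

    def worker(seed, num_updates):
        rnd = random.Random(seed)
        for _ in range(num_updates):
            i = rnd.randrange(x.size)
            g = A[:, i] @ (A @ x - b)  # partial gradient at a (stale) x
            x[i] -= g / L[i]           # lock-free coordinate write

    threads = [threading.Thread(target=worker, args=(s, 2000)) for s in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)

A synchronous version would place a barrier after every round of updates; here, staleness is tolerated in exchange for never idling a core.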

First-order methods and operator splitting for optimization

First-order methods are built and analyzed using gradients or subgradients, while second-order methods use second derivatives or their approximations.
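To fix notation: for a smooth objective f, the prototypical updates are the gradient step and the Newton step,

    x_{k+1} = x_k - \alpha_k \nabla f(x_k)
    \qquad \text{versus} \qquad
    x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k),

where the former costs one gradient evaluation per iteration, while the latter must form and solve a linear system with the Hessian, which becomes prohibitive at large scale.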

From the 1970s to the 1990s, the majority of the optimization community focused on second-order methods, since they were more efficient for problems of the sizes of interest at the time. Beginning around fifteen years ago, however, the demand to solve ever-larger problems started growing rapidly. Many large problems are further complicated by non-differentiable functions and constraints. Because simple first-order methods and classic second-order methods are ineffective or infeasible for these problems, operator splitting methods have regained their popularity.

Operators are used to develop and analyze algorithms for a wide spectrum of problems, including optimization problems, variational inequalities, and differential equations. Operator splitting is a set of ideas that generate algorithms by decomposing a problem that is too difficult to solve as a whole into two or more smaller, simpler subproblems. During the decomposition, complicated structures such as non-differentiable functions, constraint sets, and distributed problem structures end up in different subproblems, where they can be handled elegantly. We believe ideas from operator splitting provide the most effective means of handling such structures at the problem sizes of modern interest.
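As one concrete instance (a minimal sketch, not any specific method from my own work): forward-backward splitting applied to the lasso problem min_x 0.5*||Ax - b||^2 + lam*||x||_1 handles the smooth least-squares term with a gradient step and the non-differentiable l1 term with its proximal map, the soft-thresholding operator. The function and parameter names below are mine.

    import numpy as np

    def soft_threshold(v, t):
        """Proximal map of t*||.||_1 (soft-thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def forward_backward_lasso(A, b, lam, num_iters=500):
        """Minimize 0.5*||Ax - b||^2 + lam*||x||_1 by splitting:
        a gradient step on the smooth part, then a prox step on the
        non-differentiable part."""
        x = np.zeros(A.shape[1])
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L with L = ||A||_2^2
        for _ in range(num_iters):
            grad = A.T @ (A @ x - b)             # forward (gradient) step
            x = soft_threshold(x - step * grad, step * lam)  # backward (prox) step
        return x

Note how the decomposition plays out: the gradient step never sees the l1 term, and the prox step never sees the matrix A; each subproblem is simple on its own.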

Sparse optimization

What is common to l1 and l1-like minimization problems is that the data are dense; it is the solutions that are sparse. Traditional approaches reduce these problems to standard optimization problems, which are then solved by algorithms that exploit only data structures such as matrix sparsity. The challenge, therefore, is to develop algorithms that take advantage of the structure of the solutions rather than that of the data.

To address this challenge, I have recently been working on first-order algorithms specifically designed for solving large-scale optimization problems involving various l1 and l1-like functions. I also worked on new total variation algorithms that recover images at screen resolutions in seconds. The underlying idea also led to fast algorithms for color image processing and MR image reconstruction.

In addition, for iterative-thresholding-based algorithms, we discovered useful properties such as the quick identification of the nonzero entries of sparse solutions and global q-linear convergence. These results are potentially useful for reducing the original problem, before it is fully solved, to one that is much smaller. We also discovered that the basis pursuit problem, a constrained optimization problem for signal recovery in compressed sensing, can be solved by the Bregman method in a small and provably finite number of iterations. A slightly different version, the linearized Bregman method, is even more interesting because its intermediate solutions are mostly sparse, which makes it especially suitable for certain matrix problems. In this research, I expect to develop a set of strategies for exploiting solution structures that can be applied widely across many algorithms.
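To illustrate why the intermediate solutions of the linearized Bregman method are mostly sparse, here is a minimal sketch of one common form of the iteration for basis pursuit, min ||x||_1 subject to Ax = b (the problem data, parameter values, and function names below are assumptions for illustration). Every iterate is a soft-thresholding of an accumulated residual variable, so most of its entries are exactly zero.

    import numpy as np

    def linearized_bregman(A, b, alpha, step, num_iters=2000):
        """Sketch of the linearized Bregman iteration for basis pursuit.
        v accumulates residual information; x = shrink(v, alpha) is sparse
        at every iteration because shrink zeroes entries smaller than alpha."""
        v = np.zeros(A.shape[1])
        x = np.zeros(A.shape[1])
        for _ in range(num_iters):
            x = np.sign(v) * np.maximum(np.abs(v) - alpha, 0.0)  # shrink step
            v += step * (A.T @ (b - A @ x))   # gradient step on the residual
        return x

    # Example: recover a sparse vector from underdetermined measurements.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 200))
    x_true = np.zeros(200)
    x_true[rng.choice(200, size=8, replace=False)] = rng.standard_normal(8)
    b = A @ x_true
    x = linearized_bregman(A, b, alpha=5.0, step=1.0 / np.linalg.norm(A, 2) ** 2)

With a step size below 2/||A||_2^2, the iteration can be read as gradient ascent on a dual problem, which is one way to analyze its convergence.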