ALISTA: Analytic Weights Are as Good as Learned Weights in LISTA

Jialin Liu, Xiaohan Chen, Zhangyang Wang, Wotao Yin

International Conference on Learning Representations (ICLR)’19


The goal of this work is to significantly speed up sparse recovery. The basis of this line of work is ISTA (the iterative soft-thresholding algorithm), also known as the proximal-gradient method or FPC (fixed-point continuation), a classic iterative method for recovering a sparse vector x from its linear measurements Dx. The measurements are usually contaminated by additive noise. Like most iterative methods, ISTA repeats the same operations (matrix-vector multiplications by D and D' and a soft-thresholding) at each iteration, so it can be written as a simple for-loop. However, depending on the problem's conditioning, it can take hundreds or even tens of thousands of iterations.
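The for-loop structure of ISTA can be sketched in a few lines. Below is a minimal NumPy sketch for the lasso problem min_x 0.5||Dx - b||^2 + lam||x||_1; the function names and default parameters are illustrative choices, not the authors' code.

```python
import numpy as np

def soft_threshold(x, tau):
    # Elementwise soft-thresholding: shrink each entry toward zero by tau.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista(D, b, lam=0.1, step=None, max_iter=500):
    # Minimal ISTA sketch for min_x 0.5*||D x - b||^2 + lam*||x||_1.
    # (Illustrative implementation, not the paper's code.)
    M, N = D.shape
    if step is None:
        # Step size 1/L, where L = ||D||_2^2 is the Lipschitz constant
        # of the smooth term's gradient; this guarantees convergence.
        step = 1.0 / np.linalg.norm(D, 2) ** 2
    x = np.zeros(N)
    for _ in range(max_iter):
        grad = D.T @ (D @ x - b)                      # multiply by D, then D'
        x = soft_threshold(x - step * grad, step * lam)  # soft-threshold
    return x
```

Each iteration costs two matrix-vector products plus an elementwise shrinkage; the iteration count, not the per-iteration cost, is what dominates the runtime.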

Gregor and LeCun (2010), instead of using the original matrices D and D' and the soft-thresholding scalars of ISTA, select a series of new matrices and scalars by training on a set of synthetic sparse signals and their linear measurements. The resulting method, called LISTA (Learned ISTA), uses a small fixed number of iterations, roughly 20, and is not only much faster but also recovers more accurate sparse vectors than ISTA, even when ISTA runs orders of magnitude more iterations. On the other hand, training LISTA takes a long time, typically ten hours or more, much like training a neural network with many parameters. Also, one must train new matrices and scalars for each encoding matrix D. These shortcomings are addressed by a line of work that follows LISTA.

This paper introduces ALISTA, which significantly simplifies LISTA by using only one free matrix (besides the encoding matrix D) for all iterations and pre-computing that matrix by analytic optimization rather than data-driven training. Consequently, when it comes to training ALISTA, only a series of scalars for the thresholds and step sizes remains to be learned from synthetic data. Despite this huge simplification, the "inference" performance of ALISTA matches that of LISTA and its successors, as supported by our theoretical results and numerical verification.

Table: Number of parameters to learn. K: layers/iterations; MxN: size of D

LISTA (Gregor-LeCun '10): O(KM^2 + K + MN)
Coupled LISTA (Chen et al. '18): O(KNM + K)
Tied LISTA (Chen et al. '18): O(NM + K)
This paper: O(K)
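The O(K) parameter count means a forward pass needs only K step sizes and K thresholds once the analytic matrix W is fixed. The sketch below shows this ALISTA-style iteration; it is an illustrative assumption-laden demo, not the authors' implementation, and in particular the paper computes W by an analytic coherence-minimization problem, whereas here W is simply passed in (taking W = D reduces the update to plain ISTA).

```python
import numpy as np

def soft_threshold(x, tau):
    # Elementwise soft-thresholding: shrink each entry toward zero by tau.
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def alista_forward(D, W, b, gammas, thetas):
    # One forward pass through K = len(gammas) ALISTA-style iterations:
    #   x <- soft_threshold(x - gamma_k * W' (D x - b), theta_k)
    # gammas, thetas: the only learned parameters (O(K) scalars).
    # W: the pre-computed matrix (analytic in the paper; supplied here).
    x = np.zeros(D.shape[1])
    for gamma, theta in zip(gammas, thetas):
        x = soft_threshold(x - gamma * W.T @ (D @ x - b), theta)
    return x
```

In the paper the scalars gamma_k and theta_k are trained on synthetic sparse signals; the demo simply supplies hand-picked constants.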

Theoretically, we show that ALISTA retains the optimal linear convergence rate proved in Chen et al. '18.

We extend ALISTA to convolutional linear operators, which are again determined in a data-free manner. They are very useful in imaging applications.

We also want a trained algorithm to work well not just for one D but for all matrices that are small perturbations of D, so that re-training is not needed. This is achieved by a feed-forward framework that combines the data-free optimization and ALISTA end to end.


J. Liu, X. Chen, Z. Wang, and W. Yin, "ALISTA: Analytic weights are as good as learned weights in LISTA," International Conference on Learning Representations (ICLR), 2019.
