Linear stationary iterative methods

$$ %% Sets and functions %% \newcommand{\set}[1]{\{ #1 \}} \newcommand{\Set}[1]{\left \{ #1 \right\}} \renewcommand{\emptyset}{\varnothing} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\Rn}{\mathbb{R}^n} \newcommand{\Rm}{\mathbb{R}^m} \newcommand{\C}{\mathbb{C}} \newcommand{\F}{\mathbb{F}} %% Linear algebra %% \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\Abs}[1]{\left\lvert #1 \right\rvert} \newcommand{\inner}[2]{\langle #1, #2 \rangle} \newcommand{\Inner}[2]{\left\langle #1, #2 \right\rangle} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\trans}{{\top}} \newcommand{\span}{\mathop{\mathrm{span}}} \newcommand{\im}{\mathop{\mathrm{im}}} \newcommand{\ker}{\mathop{\mathrm{ker}}} \newcommand{\rank}{\mathop{\mathrm{rank}}} %% Colours %% \definecolor{cblue}{RGB}{31, 119, 180} \definecolor{corange}{RGB}{255, 127, 14} \definecolor{cgreen}{RGB}{44, 160, 44} \definecolor{cred}{RGB}{214, 39, 40} \definecolor{cpurple}{RGB}{148, 103, 189} \definecolor{cbrown}{RGB}{140, 86, 75} \definecolor{cpink}{RGB}{227, 119, 194} \definecolor{cgrey}{RGB}{127, 127, 127} \definecolor{cyellow}{RGB}{188, 189, 34} \definecolor{cteal}{RGB}{23, 190, 207} $$

Let $A \in \R^{n \times n}$ be invertible and $b \in \R^n$. An iterative method for solving $Ax = b$ generates a sequence of iterates $(x^{(k)})_{k=1}^\infty$ approximating the exact solution $x^*$, given an initial guess $x^{(0)}$. We can express this as $x^{(k+1)} := \phi_{k+1}(x^{(0)}, \dots, x^{(k)}; A, b)$ for some functions $\phi_k$; if these functions are eventually independent of $k$, the method is said to be stationary.

We will consider stationary iterative methods of the form $x^{(k+1)} := G x^{(k)} + f$, called (first-degree) linear stationary iterative methods. Such methods are typically derived from a splitting $A = M - N$, where $M$ is an invertible matrix that “approximates” $A$ but is easier to solve linear systems with. Specifically, since $Mx^* = Nx^* + b$, we take $G := M^{-1} N = I - M^{-1} A$ and $f = M^{-1} b$ so that the exact solution is a fixed point of the iteration. Equivalently, we can view $x^{(k+1)} = x^{(k)} + M^{-1} (b - Ax^{(k)})$ as a correction of $x^{(k)}$ based on the residual $r^{(k)} := b - Ax^{(k)}$. We also note that the error $e^{(k)} := x^* - x^{(k)}$ satisfies $e^{(k+1)} = Ge^{(k)}$, so the convergence of a splitting method depends on the properties of its iteration matrix $G$.
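In code, the residual-correction form of the iteration takes only a few lines; the following is a minimal NumPy sketch (the function name and interface are illustrative, not from any particular library).

```python
import numpy as np

def splitting_iteration(A, b, M, x0, num_iters=50):
    """Linear stationary iteration x^(k+1) = x^(k) + M^{-1}(b - A x^(k)) for a splitting A = M - N."""
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        r = b - A @ x                    # residual r^(k) = b - A x^(k)
        x = x + np.linalg.solve(M, r)    # correction obtained by solving a system with M
    return x
```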

Splitting methods

Let $L$, $D$, and $U$ denote the strictly lower, diagonal, and strictly upper triangular parts of $A$. The splittings for some basic linear stationary iterative methods are as follows.

| Method | $M$ |
| --- | --- |
| Jacobi | $D$ |
| $\omega$-Jacobi ($\omega \neq 0$) | $\frac{1}{\omega} D$ |
| Gauss–Seidel | $L + D$ |
| $\omega$-Gauss–Seidel/successive overrelaxation (SOR) ($\omega \neq 0$) | $L + \frac{1}{\omega} D$ |
| Symmetric successive overrelaxation (SSOR) ($\omega \neq 0, 2$) | $\frac{\omega}{2-\omega}(L + \frac{1}{\omega} D) D^{-1} (U + \frac{1}{\omega} D)$ |
| Richardson ($\alpha \neq 0$) | $\frac{1}{\alpha} I$ |

The parameter $\omega$ is known as the relaxation/damping parameter and arises from taking $x^{(k+1)} = (1-\omega) x^{(k)} + \omega \hat{x}^{(k+1)}$, where $\hat{x}^{(k+1)}$ denotes the result of applying the corresponding nonparametrized ( $\omega = 1$) method to $x^{(k)}$. If $\omega < 1$, the method is said to be underrelaxed/underdamped; if $\omega > 1$, it is said to be overrelaxed/overdamped.

The SSOR method arises from performing a “forward” $\omega$-Gauss–Seidel step with $M = L + \frac{1}{\omega} D$ followed by a “backward” $\omega$-Gauss–Seidel step with $M = U + \frac{1}{\omega} D$.
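For concreteness, here is one way the matrices $M$ from the table might be assembled with NumPy. This is a sketch for small dense matrices; the names are illustrative.

```python
import numpy as np

def splitting_matrix(A, method, omega=1.0, alpha=1.0):
    """Return the matrix M of the splitting A = M - N for the methods in the table."""
    D = np.diag(np.diag(A))       # diagonal part
    L = np.tril(A, k=-1)          # strictly lower triangular part
    U = np.triu(A, k=1)           # strictly upper triangular part
    if method == "jacobi":        # omega = 1 recovers the plain Jacobi method
        return D / omega
    if method == "gauss-seidel":  # omega = 1 recovers the plain Gauss-Seidel method
        return L + D / omega
    if method == "ssor":
        return (omega / (2 - omega)) * (L + D / omega) @ np.linalg.inv(D) @ (U + D / omega)
    if method == "richardson":
        return np.eye(A.shape[0]) / alpha
    raise ValueError(f"unknown method: {method}")
```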

Convergence theorems

Clearly, since $e^{(k)} = G^k e^{(0)}$, if $\norm{G} < 1$ for some operator norm, then the method converges in the sense that $x^{(k)} \to x^*$ for all $x^{(0)}$.1 More generally, we see that the method is convergent if and only if $G^k \to 0$, which in turn depends on the spectral radius $\rho(G)$ of $G$​, the maximum of the absolute values of its eigenvalues when regarded as a complex matrix.2

Namely, if $(\lambda, v)$ is an eigenpair of $G$ with $\norm{v} = 1$, then $\abs{\lambda}^k = \abs{\lambda^k} = \norm{G^k v} \leq \norm{G^k}$ for the induced operator norm, so $\rho(G)^k = \rho(G^k) \leq \norm{G^k}$. Thus, if $G^k \to 0$, then $\rho(G) < 1$. In fact, the converse is also true.

Let $G \in \C^{n \times n}$. Then $G^k \to 0$ if and only if $\rho(G) < 1$ (such a matrix is called convergent).

Proof. It remains to show that $G^k \to 0$ if $\rho(G) < 1$. Let $UTU^*$ be a Schur factorization of $G$ and let $D$ and $N$ denote the diagonal and strictly upper triangular parts of $T$, so that $\norm{G^k}_2 = \norm{T^k}_2$ by unitary invariance and $\norm{D}_2 = \rho(G)$. Expanding $T^k = (D+N)^k$ into words in $D$ and $N$, every word containing at least $n$ factors of $N$ vanishes, since the product of a diagonal matrix and a strictly upper triangular matrix is strictly upper triangular, and the product of $n$ (or more) strictly upper triangular $n \times n$ matrices is zero. Hence, for all $k \geq n$, we have $$ \norm{G^k}_2 = \norm{(D+N)^k}_2 \leq \sum_{j=0}^{n-1} \binom{k}{j} \norm{D}_2^{k-j} \norm{N}_2^j = \sum_{j=0}^{n-1} \binom{k}{j} \rho(G)^{k-j} \norm{N}_2^j \to 0. \quad \blacksquare $$

Using this fact, we can also prove a well-known formula for the spectral radius.3

Gelfand’s formula

Let $G \in \C^{n \times n}$ and $\norm{{}\cdot{}}$ be an operator norm. Then $\rho(G) = \lim_{k \to \infty} \norm{G^k}^{1/k}$.

Proof. We previously saw that $\rho(G) \leq \norm{G^k}^{1/k}$ for all $k$, so $\rho(G) \leq \liminf_{k \to \infty} \norm{G^k}^{1/k}$. On the other hand, if $\varepsilon > 0$ is arbitrary and $\hat{G} := \frac{G}{\rho(G) + \varepsilon}$, then $\rho(\hat{G}) < 1$, so by the preceding result, $\norm{\hat{G}^k} \leq 1$ for all sufficiently large $k$, which is to say that $\norm{G^k} \leq (\rho(G) + \varepsilon)^k$ for such $k$. Hence $\limsup_{k \to \infty} \norm{G^k}^{1/k} \leq \rho(G) + \varepsilon$, and since $\varepsilon$ was arbitrary, the limit exists and equals $\rho(G)$. ∎
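As a quick numerical illustration (a sketch; the matrix below is arbitrary, chosen so that $\norm{G}_2 > 1$ even though $\rho(G) = 0.5$):

```python
import numpy as np

G = np.array([[0.5, 1.0],
              [0.0, 0.4]])                          # rho(G) = 0.5, but ||G||_2 > 1
rho = max(abs(np.linalg.eigvals(G)))
for k in [1, 10, 100, 500]:
    Gk = np.linalg.matrix_power(G, k)
    print(k, np.linalg.norm(Gk, 2) ** (1 / k))      # approaches rho(G) as k grows
print("rho(G) =", rho)
```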

As $\norm{e^{(k)}} \leq \norm{G^k} \norm{e^{(0)}} \approx \rho(G)^k \norm{e^{(0)}}$ for large $k$, the rate of convergence can often be estimated using the spectral radius of the iteration matrix. More precisely, a direct computation shows that if $G$ is diagonalizable and has a unique dominant eigenvalue, then $\frac{\norm{e^{(k+1)}}}{\norm{e^{(k)}}} \sim \rho(G)$, provided that $e^{(0)}$ has a nonzero component in the corresponding eigenspace.
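A small sketch of this rate estimate (the matrix and the initial error below are arbitrary; the initial error has a nonzero component along the dominant eigenvector):

```python
import numpy as np

G = np.array([[0.9, 0.3],
              [0.0, 0.2]])           # eigenvalues 0.9 and 0.2, so rho(G) = 0.9
e = np.array([1.0, 1.0])             # initial error e^(0)
for _ in range(30):
    ratio = np.linalg.norm(G @ e) / np.linalg.norm(e)   # ||e^(k+1)|| / ||e^(k)||
    e = G @ e
print(ratio)                         # close to rho(G) = 0.9
```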

General matrices

If the Jacobi method converges, then the $\omega$-Jacobi method converges for $\omega \in (0, 1]$.

Proof. For the $\omega$-Jacobi method, we have $G = (1-\omega) I + \omega G_\mathrm{J}$, where $G_\mathrm{J}$ denotes the iteration matrix for the Jacobi method, so the eigenvalues of $G$ are $(1-\omega) + \omega \mu$ for the eigenvalues $\mu$ of $G_\mathrm{J}$. Since $1 - \omega \geq 0$, it follows that $\rho(G) \leq (1-\omega) + \omega \rho(G_\mathrm{J}) < 1$. ∎

If the $\omega$-Gauss–Seidel method converges, then $\omega \in (0, 2)$.

Proof. For the $\omega$-Gauss–Seidel method, we have $G = (L + \frac{1}{\omega} D)^{-1} ((\frac{1}{\omega} - 1) D - U)$, so $\det(G) = \det(\frac{1}{\omega} D)^{-1} \det((\frac{1}{\omega} - 1) D) = \det((1-\omega) I) = (1-\omega)^n$. On the other hand, the determinant of $G$ is the product of its eigenvalues, so $\abs{1-\omega}^n \leq \rho(G)^n < 1$. ∎
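A quick numerical check of this determinant identity (a sketch; the test matrix is arbitrary, with its diagonal shifted so that $D$ is invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 5 * np.eye(4)      # arbitrary matrix with nonzero diagonal
D, L, U = np.diag(np.diag(A)), np.tril(A, -1), np.triu(A, 1)
omega = 1.6
G = np.linalg.solve(L + D / omega, (1 / omega - 1) * D - U)   # omega-Gauss-Seidel iteration matrix
print(np.linalg.det(G), (1 - omega) ** 4)            # both equal (1 - omega)^n
```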

If the SSOR method converges, then $\omega \in (0, 2)$.

Proof. For the SSOR method, we have $G = G_b G_f$, where $G_f$ and $G_b$ denote the iteration matrices for the forward and backward $\omega$-Gauss–Seidel methods. Arguing as in the preceding proof, $\abs{\det(G)} = \abs{\det(G_b)} \abs{\det(G_f)} = \abs{1-\omega}^{2n}$, so $\abs{1-\omega}^{2n} \leq \rho(G)^n < 1$. ∎

Symmetric positive definite matrices

If $A$ is symmetric positive definite (SPD), then $A$ is invertible and all the splitting methods above are applicable since its diagonal entries must be positive. Recall also that the (strict) Loewner order $\prec$ on symmetric matrices is defined by $A \prec B$ if and only if $B-A$ is SPD, and that an SPD matrix $A$ defines an inner product $\inner{x}{y}_A := \inner{Ax}{y}_2$, with associated vector and operator norms $\norm{{}\cdot{}}_A$.

If $A$ is SPD and $A \prec M + M^\trans$, then $\norm{G}_A < 1$.

Proof. Let $x$ be a vector with $\norm{x}_A = 1$ such that $\norm{G}_A = \norm{Gx}_A$ (which exists by the extreme value theorem) and let $y := M^{-1} Ax$, so that $Gx = x - y$, $Ax = My$, and $y \neq 0$. Then $$ \begin{align*} \norm{G}_A^2 &= \norm{x-y}_A^2 \\ &= 1 - \inner{x}{y}_A - \inner{y}{x}_A + \inner{y}{y}_A \\ &= 1 - \inner{My}{y}_2 - \inner{M^\trans y}{y}_2 + \inner{Ay}{y}_2 \\ &= 1 - \inner{(M + M^\trans - A)y}{y}_2 < 1. \quad \blacksquare \end{align*} $$

Convergence for SPD matrices

Let $A$ be SPD.

  • If $A \prec \frac{2}{\omega} D$, then the $\omega$-Jacobi method converges.
  • If $\omega \in (0, 2)$, then the $\omega$-Gauss–Seidel method converges.
  • If $\omega \in (0, 2)$, then the SSOR method converges.
  • If $\alpha \in (0, \frac{2}{\rho(A)})$, then the Richardson method converges. Moreover, if the eigenvalues of $A$ are $\lambda_1 \geq \cdots \geq \lambda_n > 0$, then $\rho(G)$ is minimized when $\alpha = \alpha^* := \frac{2}{\lambda_1 + \lambda_n}$, in which case $\norm{e^{(k+1)}}_2 \leq (1 - \frac{2}{\kappa + 1}) \norm{e^{(k)}}_2$, where $\kappa$ is the 2-norm condition number of $A$.

Proof. The convergence statements follow immediately from the preceding result. For the Richardson method, $\rho(G(\alpha)) = \max_i \, \abs{1 - \alpha \lambda_i} = \max \, \set{1-\alpha\lambda_n, \alpha \lambda_1 - 1}$, so $\rho(G(\alpha)) = 1-\alpha\lambda_n \geq \rho(G(\alpha^*))$ when $\alpha \leq \alpha^*$ and $\rho(G(\alpha)) = \alpha\lambda_1 - 1 \geq \rho(G(\alpha^*))$ when $\alpha \geq \alpha^*$. Finally, since $G$ and $A$ are normal, we have $\norm{G(\alpha^*)}_2 = \rho(G(\alpha^*)) = 1 - \frac{2}{\kappa + 1}$. ∎
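The Richardson part of this result is easy to check numerically; the following sketch uses an arbitrary SPD test matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((6, 6))
A = B @ B.T + np.eye(6)                        # arbitrary SPD test matrix
lam = np.linalg.eigvalsh(A)                    # eigenvalues in ascending order
lam_min, lam_max = lam[0], lam[-1]
alpha_star = 2 / (lam_max + lam_min)           # optimal Richardson parameter
kappa = lam_max / lam_min                      # 2-norm condition number
G = np.eye(6) - alpha_star * A                 # Richardson iteration matrix at alpha*
print(max(abs(np.linalg.eigvals(G))), 1 - 2 / (kappa + 1))   # both equal rho(G(alpha*))
```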

Diagonally dominant matrices

  • $A$ is weakly diagonally dominant (WDD) if every row $i$ is WDD: $\abs{a_{ii}} \geq \sum_{j \neq i} \abs{a_{ij}}$.
  • $A$ is strictly diagonally dominant (SDD) if every row $i$ is SDD: $\abs{a_{ii}} > \sum_{j \neq i} \abs{a_{ij}}$.
  • The directed graph of $A$ is $\mathcal{G}_A = (V, E)$ with $V = \set{1, \dots, n}$ and $(i, j) \in E$ if and only if $a_{ij} \neq 0$.
    • $A$ is irreducible if $\mathcal{G}_A$ is strongly connected.
  • $A$ is irreducibly diagonally dominant (IDD) if it is irreducible, WDD, and some row is SDD.
  • $A$ is weakly chained diagonally dominant (WCDD) if it is WDD and for every row $i$, there exists an SDD row $j$ with a path from $i$ to $j$ in $\mathcal{G}_A$. (Thus, SDD and IDD matrices are both WCDD.)
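The chained condition can be tested directly from this definition with a graph search from the SDD rows; below is a minimal NumPy sketch (the function name and the tolerance handling are illustrative).

```python
import numpy as np
from collections import deque

def is_wcdd(A, tol=0.0):
    """Check whether A is weakly chained diagonally dominant."""
    n = A.shape[0]
    diag = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - diag       # off-diagonal row sums
    if np.any(diag < off - tol):
        return False                             # some row fails to be WDD
    sdd = diag > off + tol                       # the strictly diagonally dominant rows
    # Mark every row from which some SDD row is reachable in the directed graph of A.
    reach = sdd.copy()
    queue = deque(np.flatnonzero(sdd))
    while queue:
        j = queue.popleft()
        for i in range(n):
            if i != j and A[i, j] != 0 and not reach[i]:
                reach[i] = True                  # the edge (i, j) leads to an already-marked row
                queue.append(i)
    return bool(reach.all())
```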

If $A$ is WCDD, then $A$ is invertible.

Proof. Suppose for the sake of contradiction that there exists an $x \in \ker(A)$ with $\norm{x}_\infty = 1$, and let $i_1$ be such that $\abs{x_{i_1}} = 1$. Let $(i_1, \dots, i_k)$ be a path in $\mathcal{G}_A$ such that row $i_k$ is SDD. Since $\sum_j a_{i_1 j} x_j = 0$, we have $$ \abs{a_{i_1 i_1}} = \abs{-a_{i_1 i_1} x_{i_1}} \leq \sum_{j \neq i_1} \abs{a_{i_1 j}} \abs{x_j} \leq \sum_{j \neq i_1} \abs{a_{i_1 j}}, $$ so row $i_1$ is not SDD. However, since it is WDD, equality must hold throughout, which implies that $\abs{x_{i_2}} = 1$ because $a_{i_1 i_2} \neq 0$. Iterating this argument, we ultimately deduce that row $i_k$ is not SDD, which is a contradiction. ∎

As a result, if $A$ is WCDD, then all the splitting methods above are applicable since its diagonal entries must be nonzero (otherwise, it would have a zero row and fail to be invertible).

We also note that this immediately implies the Levy–Desplanques theorem: if $A$ is SDD, then $A$ is invertible. This, in turn, is equivalent to the Gershgorin circle theorem: if $\lambda$ is an eigenvalue of $A$, then $\abs{\lambda - a_{ii}} \leq \sum_{j \neq i} \abs{a_{ij}} =: r_i$ for some $i$ (in other words, $\lambda \in B_{r_i}(a_{ii})$ for some $i$). Similarly, if $A$ is IDD, then $A$ is invertible; or equivalently, if $A$ is irreducible and $\lambda$ is an eigenvalue of $A$ such that $\abs{\lambda - a_{ii}} \geq r_i$ for every $i$, then $\abs{\lambda - a_{ii}} = r_i$ for every $i$.

Convergence for WCDD matrices

If $A$ is WCDD, then the $\omega$-Jacobi and $\omega$-Gauss–Seidel methods converge for $\omega \in (0, 1]$.

Proof. If $\abs{\lambda} \geq 1$ and $\omega \in (0, 1]$, then $\Abs{\frac{\lambda - 1}{\omega} + 1} \geq \frac{\abs{\lambda} - (1 - \omega)}{\omega} \geq \abs{\lambda}$ (the last inequality because $(1-\omega)(\abs{\lambda} - 1) \geq 0$), so $(\lambda - 1) M + A$ has the same WDD/SDD rows and directed graph as $A$; in particular, it is WCDD and therefore invertible. Hence $\lambda I - G = M^{-1} ((\lambda - 1) M + A)$ is invertible for all such $\lambda$, so $\rho(G) < 1$. ∎
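A quick sanity check (a sketch; the test matrix is made SDD, hence WCDD, by inflating its diagonal):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 5))
A += np.diag(np.sum(np.abs(A), axis=1) + 1.0)    # make every row strictly diagonally dominant
D, L, U = np.diag(np.diag(A)), np.tril(A, -1), np.triu(A, 1)
rho = lambda M: max(abs(np.linalg.eigvals(M)))
print(rho(-np.linalg.solve(D, L + U)),           # Jacobi iteration matrix
      rho(np.linalg.solve(L + D, -U)))           # Gauss-Seidel iteration matrix; both < 1
```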

Consistently ordered matrices

In this section, we assume that the diagonal entries of $A$ are nonzero and consider the iteration matrices $G_\mathrm{J} := -D^{-1} (L+U)$ and $G_\mathrm{GS}(\omega) := (L + \frac{1}{\omega} D)^{-1}(\frac{1-\omega}{\omega} D - U)$ for the Jacobi and $\omega$-Gauss–Seidel methods, respectively.

  • $A$ has property $\mathrm{A}_{q, r}$ ( $q, r \geq 1$) if there exists a partition $\set{S_k}_{k=1}^p$ of $\set{1, \dots, n}$ with $p = q + r$ such that if $a_{ij} \neq 0$ for some $i \neq j$ and $i \in S_k$, then either $i \in S_1 \cup \cdots \cup S_q$ and $j \in S_{k+r}$, or $i \in S_{q+1} \cup \cdots \cup S_p$ and $j \in S_{k-q}$.
    • $A$ has property $\mathrm{A}_p$ (or is $p$-cyclic) ( $p \geq 2$) if it has property $\mathrm{A}_{1,p-1}$. (Property $\mathrm{A}_2$ is usually called “property $\mathrm{A}$” and is equivalent to the directed graph of $A$ being bipartite, ignoring loops.)
  • $A$ is $(q, r)$-consistently ordered ( $q, r \geq 1$) if there exists a partition $\set{S_k}_{k=1}^p$ of $\set{1, \dots, n}$ (not necessarily with $p = q + r$) such that if $a_{ij} \neq 0$ and $i \in S_k$, then $i \in S_1 \cup \cdots \cup S_{p-r}$ and $j \in S_{k+r}$ if $i < j$, or $i \in S_{q+1} \cup \cdots \cup S_p$ and $j \in S_{k-q}$ if $i > j$. (A $(1, p-1)$-consistently ordered matrix is sometimes called “consistently ordered $p$-cyclic”.)

If $A$ has property $\mathrm{A}_{q, r}$, then it has a $(q, r)$-ordering vector: a $\gamma \in \Z^n$ such that if $a_{ij} \neq 0$ for some $i \neq j$, then $\gamma_j - \gamma_i = r$ or $\gamma_j - \gamma_i = -q$; and for each $k \in \set{1, \dots, p := q+r}$, there exists an $i$ such that $\gamma_i \equiv k \pmod{p}$. Conversely, if $A$ has such a vector, then $A$ has property $\mathrm{A}_{q, r}$.

If $A$ is $(q, r)$-consistently ordered, then it has a compatible $(q, r)$-ordering vector: a $(q, r)$-ordering vector $\gamma$ such that if $a_{ij} \neq 0$ for some $i \neq j$, then $\gamma_j - \gamma_i = r$ or $\gamma_j - \gamma_i = -q$ according as $i < j$ or $i > j$. Conversely, if $A$ has such a vector, then $A$ is $(q, r)$-consistently ordered.

Proof. If $A$ has property $\mathrm{A}_{q, r}$ and $\set{S_k}_{k=1}^p$ is as in the definition of this property, take $\gamma_i := k$, where $k$ is such that $i \in S_k$. Conversely, if $A$ has a $(q, r)$-ordering vector $\gamma$, take $S_k := \set{i : \gamma_i \equiv k \pmod{p}}$, where $p = q + r$. The same constructions apply in the consistently ordered case. ∎

This proof also shows that we may assume without loss of generality that $p = q + r$ in the definition of a $(q, r)$-consistently ordered matrix. We also note that $A$ has property $\mathrm{A}_{q, r}$ if and only if it can be symmetrically permuted (that is, conjugated by a permutation matrix) to a $(q, r)$-consistently ordered matrix, since the former property is invariant under symmetric permutations.
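The compatible-ordering-vector characterization is easy to check programmatically; the sketch below verifies that $\gamma_i = i$ is a compatible $(1, 1)$-ordering vector for a tridiagonal matrix, so such matrices are $(1, 1)$-consistently ordered. The function name is illustrative.

```python
import numpy as np

def is_compatible_ordering_vector(A, gamma, q, r):
    n, p = A.shape[0], q + r
    if {g % p for g in gamma} != set(range(p)):   # every residue class mod p must be represented
        return False
    for i in range(n):
        for j in range(n):
            if i != j and A[i, j] != 0:
                # gamma_j - gamma_i must be r when i < j and -q when i > j
                if gamma[j] - gamma[i] != (r if i < j else -q):
                    return False
    return True

A = np.diag([2.0] * 5) + np.diag([-1.0] * 4, 1) + np.diag([-1.0] * 4, -1)   # tridiagonal
print(is_compatible_ordering_vector(A, gamma=list(range(5)), q=1, r=1))      # True
```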

If $A$ is $(q, r)$-consistently ordered, then $\det(\lambda D + \alpha^{-q} L + \alpha^r U)$ is independent of $\alpha \neq 0$ (for all $\lambda$).

Proof. Suppose that $\sigma$ is a permutation of $\set{1, \dots, n}$ with $a_{\sigma(i), i} \neq 0$ for all $i$. Then $$ \prod_{i=1}^n (\lambda D + \alpha^{-q} L + \alpha^r U)_{\sigma(i), i} = \lambda^{n-(n_L + n_U)} (\alpha^{-q})^{n_L} (\alpha^r)^{n_U} \prod_{i=1}^n a_{\sigma(i), i}, $$ where $n_L := \# \set{i : \sigma(i) > i}$ and $n_U := \# \set{i : \sigma(i) < i}$. Now if $\gamma$ is a compatible $(q, r)$-ordering vector for $A$, then $\sum_{\sigma(i) > i} \gamma_i - \gamma_{\sigma(i)} = -qn_L$ and $\sum_{\sigma(i) < i} \gamma_i - \gamma_{\sigma(i)} = rn_U$, so $-qn_L + rn_U = \sum_{\sigma(i) \neq i} \gamma_i - \gamma_{\sigma(i)} = \sum_{\sigma(i) \neq i} \gamma_i - \sum_{j \neq \sigma^{-1}(j)} \gamma_j = 0$. Hence every product of this form is independent of $\alpha$, while the products corresponding to permutations with $a_{\sigma(i), i} = 0$ for some $i$ vanish identically; since the determinant is a signed sum of such products, it is independent of $\alpha$. ∎
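A numerical illustration of this invariance for a tridiagonal (hence $(1, 1)$-consistently ordered) matrix, with $q = r = 1$ and an arbitrary $\lambda$ (a sketch):

```python
import numpy as np

A = np.diag([2.0] * 5) + np.diag([-1.0] * 4, 1) + np.diag([-1.0] * 4, -1)
D, L, U = np.diag(np.diag(A)), np.tril(A, -1), np.triu(A, 1)
lam = 0.7
for alpha in [1.0, 2.0, -3.0, 0.5]:
    print(np.linalg.det(lam * D + L / alpha + alpha * U))   # the same value for every alpha
```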

Suppose that $A$ is $(q, r)$-consistently ordered and let $p := q + r$. Then $\lambda$ is an eigenvalue of $G_\mathrm{GS}(\omega)$ if and only if $$ (\lambda + \omega - 1)^p = \lambda^{r} \omega^p \mu^p $$ for some eigenvalue $\mu$ of $G_\mathrm{J}$.

Proof. Since $D$ is invertible, $\mu$ is an eigenvalue of $G_\mathrm{J}$ if and only if $D(\mu I - G_\mathrm{J}) = \mu D + L + U$ is singular. Similarly, $\lambda$ is an eigenvalue of $G_\mathrm{GS}(\omega)$ if and only if $(L + \frac{1}{\omega}D)(\lambda I - G_\mathrm{GS}(\omega)) = \frac{\lambda + \omega - 1}{\omega} D + \lambda L + U$ is singular, or equivalently for $\lambda$ nonzero, $\lambda^{-r/p} \cdot \frac{\lambda + \omega - 1}{\omega} D + (\lambda^{-1/p})^{-q} L + (\lambda^{-1/p})^r U$ is singular. For $\lambda \neq 0$, the result therefore follows from the determinantal invariance property above (applied with $\alpha = \lambda^{-1/p}$). For $\lambda = 0$, note that $0$ is an eigenvalue of $G_\mathrm{GS}(\omega)$ if and only if $\omega = 1$ (since $\det(G_\mathrm{GS}(\omega)) = (1-\omega)^n$), which is precisely when the relation holds with $\lambda = 0$. ∎


In the special case of a $(1, p-1)$-consistently ordered matrix, we obtain the following corollary.

Convergence for consistently ordered matrices

Suppose that $A$ is $(1, p-1)$-consistently ordered. Then $\rho(G_\mathrm{GS}(1)) = \rho(G_\mathrm{J})^p$. In particular, the Gauss–Seidel method converges if and only if the Jacobi method converges.

Furthermore, in this case, if $\mu$ is an eigenvalue of $G_\mathrm{J}$, then so is $\theta \mu$ for each $p$th root of unity $\theta$, since $\mu D + L + U$ is singular if and only if $\theta(\mu D + \theta^{-1} L + \theta^{p-1} U)$ is.
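A quick numerical check of $\rho(G_\mathrm{GS}(1)) = \rho(G_\mathrm{J})^2$ on a tridiagonal (hence $(1, 1)$-consistently ordered) test matrix (a sketch):

```python
import numpy as np

A = np.diag([2.0] * 6) + np.diag([-1.0] * 5, 1) + np.diag([-1.0] * 5, -1)
D, L, U = np.diag(np.diag(A)), np.tril(A, -1), np.triu(A, 1)
rho = lambda M: max(abs(np.linalg.eigvals(M)))
G_J = -np.linalg.solve(D, L + U)        # Jacobi iteration matrix
G_GS = np.linalg.solve(L + D, -U)       # Gauss-Seidel iteration matrix (omega = 1)
print(rho(G_GS), rho(G_J) ** 2)         # approximately equal
```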

Optimal Gauss–Seidel relaxation parameter for consistently ordered matrices

Suppose that $A$ is $(1, p-1)$-consistently ordered, that the Jacobi method converges, and that the eigenvalues of $G_\mathrm{J}^p$ are real and nonnegative. Then the $\omega$-Gauss–Seidel method converges for all $\omega \in (0, \frac{p}{p-1})$, and if $\omega_*$ is the unique solution in $(0, \frac{p}{p-1})$ of $$ [\rho(G_\mathrm{J}) \omega_*]^p = \left(\frac{p}{p-1}\right)^p (p-1)(\omega_* - 1), $$ then for all $\omega \neq \omega_*$, we have $$ \rho(G_\mathrm{GS}(\omega_*)) = (p - 1)(\omega_* - 1) < \rho(G_\mathrm{GS}(\omega)). $$

Proof. We sketch the proof for the case $p = 2$; for details and the general case, see Varga (1959).

When $A$ is $(1, 1)$-consistently ordered, the eigenvalues $\lambda$ of $G_\mathrm{GS}(\omega)$ are the zeroes of the equations $(\lambda + \omega - 1)^2 = \lambda \omega^2 \mu^2$ for $\mu$ in the set of eigenvalues of $G_\mathrm{J}$. For a fixed $\mu \geq 0$, these zeroes are the abscissae of the intersection points of the line $y = \frac{\lambda + \omega - 1}{\omega}$ (which has slope $\frac{1}{\omega}$ and passes through $(1, 1)$) and the curve $y = \pm \lambda^{1/2} \mu$ in the $\lambda$- $y$ plane. The larger abscissa is seen to decrease as $\omega$ increases from $0$ until the line and the curve are tangent, beyond which the equation has two conjugate complex zeroes of modulus $\omega - 1$, which increases as $\omega$ increases. The abscissa of the point of tangency is defined by the equations $\frac{\lambda + \omega - 1}{\omega} = \lambda^{1/2} \mu$ and $\frac{1}{\omega} = \frac{1}{2} \lambda^{-1/2} \mu$, and is maximal when $\mu = \rho(G_\mathrm{J})$. Thus, $[\rho(G_\mathrm{J}) \omega_*]^2 = 2^2 \lambda$, where $\lambda = \omega_* - 1 = \rho(G_\mathrm{GS}(\omega_*))$. ∎
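For $p = 2$, the defining equation for $\omega_*$ can be solved explicitly as $\omega_* = \frac{2}{1 + \sqrt{1 - \rho(G_\mathrm{J})^2}}$. The following sketch checks the conclusion on a symmetric tridiagonal model problem (for which the Jacobi eigenvalues are real, so the hypotheses hold):

```python
import numpy as np

n = 20
A = np.diag([2.0] * n) + np.diag([-1.0] * (n - 1), 1) + np.diag([-1.0] * (n - 1), -1)
D, L, U = np.diag(np.diag(A)), np.tril(A, -1), np.triu(A, 1)
rho = lambda M: max(abs(np.linalg.eigvals(M)))

rho_J = rho(-np.linalg.solve(D, L + U))
omega_star = 2 / (1 + np.sqrt(1 - rho_J ** 2))    # optimal relaxation parameter for p = 2

def rho_sor(omega):
    G = np.linalg.solve(L + D / omega, (1 / omega - 1) * D - U)
    return rho(G)

print(rho_sor(omega_star), omega_star - 1)        # approximately equal
print(rho_sor(1.0))                               # plain Gauss-Seidel has a larger spectral radius
```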


  1. In other words, if $\norm{G} < 1$, then $G^k \to 0$ strongly (because $G^k \to 0$ in norm). ↩︎

  2. In other words, $G^k \to 0$ strongly if and only if $G^k \to 0$ in norm (because $G$ is an operator on a finite-dimensional space). ↩︎

  3. This formula remains true if $G$ is a continuous linear operator on a Banach space $X$. Consequently, in this setting, we still have that $G^k \to 0$ in norm if and only if $\rho(G) < 1$. These are in turn equivalent to the invertibility of $I-G$ and the convergence of the fixed-point iteration of $x \mapsto Gx + f$ for all $f \in X$ (to $(I-G)^{-1} f$). ↩︎
