The conjugate gradient method is an iterative method for solving the linear system $Ax = b$, where $A \in \mathbb{R}^{n \times n}$ is symmetric positive-definite.
Let $\inner{x}{y}_A = \inner{Ax}{y}$ be the inner product defined by $A$ and $\norm{x}_A = \sqrt{\inner{x}{x}_A}$ be the induced norm. Given an initial guess $x^{(0)}$ for the solution $x^*$, the $k$th iterate of the method is selected such that $$ x^{(k)} = \mathop{\mathrm{arg\,min}}_{x \in x^{(0)} + \K_k(A, r^{(0)})} \norm{x^* - x}_A\,, $$ where $\K_k(A, r^{(0)})$ is the Krylov subspace $\span \set{A^j r^{(0)}}_{j = 0}^{k-1}$ and $r^{(0)} = b - Ax^{(0)}$. (In other words, the $A$-norm of the error is minimized over the $k$th Krylov subspace generated by the initial residual, translated by the initial guess.)[^1]
Let us abbreviate $\K_k(A, r^{(0)})$ as $\K_k$ and write $r^{(k)} = b - Ax^{(k)}$ for the residual of the $k$th iterate. The iterate $x^{(k)}$ is therefore the $A$-orthogonal projection of $x^*$ onto $x^{(0)} + \K_k$, defined by the Galerkin conditions $x^{(k)} - x^{(0)} \in \K_k$ and $x^* - x^{(k)} \perp_A \K_k$. In particular, the orthogonality condition implies that $$ \begin{align} r^{(k)} &\perp \K_k\,, \\ r^{(k)} &\perp_A \K_{k-1}\,, \end{align} $$ where the latter holds since $A\K_{k-1} \subseteq \K_k$.
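As a quick numerical sanity check of this characterization (an illustration only, not part of the derivation), the following NumPy sketch computes $x^{(k)}$ directly from the Galerkin conditions over a QR basis of the Krylov subspace, and verifies both the orthogonality of the residual and the optimality of the $A$-norm of the error. The test matrix, right-hand side, subspace dimension, and tolerances are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 4

M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # symmetric positive-definite
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)        # exact solution, for comparison only

x0 = np.zeros(n)
r0 = b - A @ x0

# Orthonormal basis of K_k(A, r0), obtained by QR from the monomial basis.
K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(k)])
Q, _ = np.linalg.qr(K)

# Galerkin conditions: x_k - x0 lies in K_k and r_k is orthogonal to K_k,
# i.e. (Q^T A Q) c = Q^T r0 with x_k = x0 + Q c.
c = np.linalg.solve(Q.T @ A @ Q, Q.T @ r0)
x_k = x0 + Q @ c
r_k = b - A @ x_k
assert np.allclose(Q.T @ r_k, 0, atol=1e-10)

def err_A(x):
    """A-norm of the error x* - x."""
    return np.sqrt((x_star - x) @ A @ (x_star - x))

# Optimality: the A-norm of the error at x_k is minimal over x0 + K_k
# (compared here against random points of the affine subspace).
for _ in range(100):
    y = x0 + Q @ rng.standard_normal(k)
    assert err_A(x_k) <= err_A(y) + 1e-12
```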
If $\set{p^{(j)}}_{j < k}$ is a basis for $\K_k$ and $P_k = \begin{bmatrix} p^{(0)} & \cdots & p^{(k-1)} \end{bmatrix}$, we have $$ x^{(k)} = x^{(0)} + P_k(P_k^\trans A P_k)^{-1} P_k^\trans r^{(0)}. $$ If, in addition, the $p^{(j)}$ are chosen to be $A$-orthogonal, the matrix $P_k^\trans A P_k$ (the Gram matrix of the $p^{(j)}$ with respect to the $A$-inner product) becomes a diagonal matrix $D_k$, and a particularly simple recurrence can be found for the $x^{(k)}$. Namely, $$ \begin{align*} x^{(k+1)} &= x^{(0)} + P_{k+1}^\vphantom{-1}D_{k+1}^{-1} P_{k+1}^{\trans\vphantom{-1}} r^{(0)} \\ &= x^{(0)} + \begin{bmatrix} P_k & p^{(k)} \end{bmatrix} \begin{bmatrix}D_k & \\ & \inner{p^{(k)}}{p^{(k)}}_A\end{bmatrix}^{-1} \begin{bmatrix} P_k^\trans \\ (p^{(k)})^\trans \end{bmatrix} r^{(0)} \\ &= x^{(0)} + P_{k}^\vphantom{-1}D_{k}^{-1} P_{k}^{\trans\vphantom{-1}} r^{(0)} + p^{(k)} \inner{p^{(k)}}{p^{(k)}}_A^{-1} (p^{(k)})^{\trans\vphantom{-1}} r^{(0)} \\ &= x^{(k)} + \alpha_k p^{(k)}, \label{X}\tag{X} \end{align*} $$ where $$ \begin{equation} \alpha_k = \frac{\inner{r^{(0)}}{p^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Alpha}\tag{$\Alpha$} \end{equation} $$ This also implies that $$ \begin{equation} r^{(k+1)} = r^{(k)} - \alpha_k Ap^{(k)}. \label{R}\tag{R} \end{equation} $$ In order to generate such a basis, we employ Gram–Schmidt orthogonalization (with respect to the $A$-inner product). However, instead of orthogonalizing the vectors $A^j r^{(0)}$, we will orthogonalize the successive residuals $r^{(j)}$. We claim that the resulting vectors $p^{(j)}$ will still constitute bases of the Krylov subspaces.[^2]
More precisely, suppose that the solution has not been found by the end of the $k$th iteration, in the sense that $r^{(j)} \neq 0$ for all $j \leq k$, and that $p^{(j)}$ was generated by orthogonalizing $r^{(j)}$ against $p^{(i)}$ for each $i < j \leq k$. We claim then that $\set{r^{(j)}}_{j \leq k}$ is a basis for $\K_{k+1}$ (and hence that $\set{p^{(j)}}_{j \leq k}$ is an $A$-orthogonal basis for $\K_{k+1}$).
Indeed, if $r^{(0)} \neq 0$, then $\set{r^{(0)}}$ is a basis for $\K_1 = \span \set{r^{(0)}}$. Now suppose that the claim holds for the $k$th iteration and that its hypotheses are satisfied after the $(k+1)$th iteration. Then $$ \begin{align*} \span \set{r^{(j)}}_{j \leq k+1} &= \span \set{r^{(j)}}_{j \leq k} + \span \set{r^{(k+1)}} \\ &= \K_{k+1} + \span \set{r^{(k+1)}} \end{align*} $$ and $r^{(k+1)} \notin \K_{k+1}$ since $r^{(k+1)} \perp \K_{k+1}$ and $r^{(k+1)} \neq 0$. Hence $\dim(\span \set{r^{(j)}}_{j \leq k+1}) \geq \dim(\K_{k+1}) + 1$. On the other hand, $$ \begin{align*} \K_{k+1} + \span \set{r^{(k+1)}} &= \K_{k+1} + \span \set{r^{(k)} - \alpha_k Ap^{(k)}} \\ &\subseteq \K_{k+1} + (\K_{k+1} + A\K_{k+1}) \\ &\subseteq \K_{k+2}, \end{align*} $$ where $\dim(\K_{k+2}) \leq \dim(\K_{k+1}) + 1$, so all the subspaces (and dimensions) above must be equal.
An immediate consequence of this choice is that the residuals will be orthogonal: if (say) $i < j$, then $r^{(i)} \in \K_{i+1} \subseteq \K_j$, and we know that $r^{(j)} \perp \K_j$.
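These two facts can also be checked numerically. The following self-contained sketch computes the first few iterates from the Galerkin conditions (as in the earlier sketch) and verifies that the residuals are nonzero and pairwise orthogonal; since $r^{(j)} \in \K_{j+1}$ by construction and $\dim \K_{k+1} \leq k+1$, this already forces $\set{r^{(j)}}_{j \leq k}$ to be a basis of $\K_{k+1}$. The test problem and tolerances are again arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 4

M = rng.standard_normal((n, n))
A = M @ M.T + 0.5 * np.eye(n)         # symmetric positive-definite
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0

def galerkin_iterate(k):
    """x^(k) from the Galerkin conditions, over a QR basis of K_k(A, r0)."""
    K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(k)])
    Q, _ = np.linalg.qr(K)
    return x0 + Q @ np.linalg.solve(Q.T @ A @ Q, Q.T @ r0)

# Residuals r^(0), ..., r^(m); each r^(j) lies in K_(j+1) by construction.
R = np.column_stack([r0] + [b - A @ galerkin_iterate(k) for k in range(1, m + 1)])

# Nonzero and pairwise orthogonal residuals: the Gram matrix is diagonal
# with positive diagonal, so r^(0), ..., r^(k) is a basis of K_(k+1).
G = R.T @ R
assert np.all(np.diag(G) > 1e-12)                         # residuals nonzero
off = G - np.diag(np.diag(G))
assert np.max(np.abs(off)) <= 1e-8 * np.max(np.diag(G))   # mutually orthogonal
```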
It remains to derive a recurrence for the $p^{(j)}$. In view of the fact that $r^{(k+1)} \perp_A \K_{k} = \span \set{p^{(j)}}_{j < k}$, we find that $$ \begin{align*} p^{(k+1)} &= r^{(k+1)} - \sum_{j < k+1} (\mathrm{proj}^A_{p^{(j)}} r^{(k+1)}) p^{(j)} \\ &= r^{(k+1)} - (\mathrm{proj}^A_{p^{(k)}} r^{(k+1)}) p^{(k)} \\ &= r^{(k+1)} + \beta_k p^{(k)}, \label{P}\tag{P} \end{align*} $$ where $$ \begin{equation} \beta_k = -\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Beta}\tag{$\Beta$} \end{equation} $$ This completes the mathematical derivation of the method.
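The noteworthy point is that a single orthogonalization step, against $p^{(k)}$ alone, suffices. As an illustration (on an arbitrary symmetric positive-definite test problem, with sizes and tolerance chosen only for the sketch), the following code runs the recurrences (X), (R), and (P) with the coefficient formulas for $\alpha_k$ and $\beta_k$ derived above, and checks that the resulting directions are $A$-orthogonal to all earlier ones, not just the previous one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 8, 6

M = rng.standard_normal((n, n))
A = M @ M.T + 0.5 * np.eye(n)         # symmetric positive-definite
b = rng.standard_normal(n)

x = np.zeros(n)
r0 = b - A @ x
r = r0.copy()
p = r0.copy()
directions = [p]

for _ in range(m - 1):
    Ap = A @ p
    alpha = (r0 @ p) / (p @ Ap)       # alpha_k = <r^(0), p^(k)> / <p^(k), p^(k)>_A
    x = x + alpha * p                 # (X)
    r = r - alpha * Ap                # (R)
    beta = -(r @ Ap) / (p @ Ap)       # beta_k = -<r^(k+1), p^(k)>_A / <p^(k), p^(k)>_A
    p = r + beta * p                  # (P): orthogonalize against p^(k) only
    directions.append(p)

# Despite the single orthogonalization step, the directions are A-orthogonal
# to *all* earlier ones, as guaranteed by r^(k+1) being A-orthogonal to K_k.
P = np.column_stack(directions)
G = P.T @ A @ P
off = G - np.diag(np.diag(G))
assert np.max(np.abs(off)) <= 1e-8 * np.max(np.diag(G))
```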
The formulas for the scalars $\alpha_k$ and $\beta_k$ can be further simplified (computation-wise): $$ \begin{align*} \alpha_k &= \frac{\inner{r^{(0)}}{p^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A} \\ &= \frac{\inner{r^{(k)}}{p^{(k)}} + \inner{r^{(0)} - r^{(k)}}{p^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A} \\ &= \frac{\inner{r^{(k)}}{p^{(k)}} + \inner{x^{(k)} - x^{(0)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A} \\ &= \frac{\inner{r^{(k)}}{p^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A} \end{align*} $$ since $x^{(k)} - x^{(0)} \in \K_k$ and $p^{(k)} \perp_A \span \set{p^{(j)}}_{j < k} = \K_k$ by definition. Moreover, $\inner{r^{(k)}}{p^{(k)}} = \inner{r^{(k)}}{r^{(k)} + \beta_{k-1} p^{(k-1)}} = \inner{r^{(k)}}{r^{(k)}}$ as $r^{(k)} \perp \K_k$, so $$ \begin{equation} \alpha_k = \frac{\inner{r^{(k)}}{r^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Alpha2}\tag{$\Alpha$} \end{equation} $$ Finally, $$ \begin{align*} \beta_k &= -\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A} \\ &= -\alpha_k\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{r^{(k)}}{r^{(k)}}} \\ &= \frac{\inner{r^{(k+1)}}{r^{(k+1)} - r^{(k)}}}{\inner{r^{(k)}}{r^{(k)}}} \\ &= \frac{\inner{r^{(k+1)}}{r^{(k+1)}}}{\inner{r^{(k)}}{r^{(k)}}}\,. \label{Beta2}\tag{$\Beta$} \end{align*} $$ In summary,
$$ \begin{align*} &r^{(0)} &&= b - Ax^{(0)} \\ &p^{(0)} &&= r^{(0)} \\ \\ &\alpha_k &&= \frac{\inner{r^{(k)}}{r^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A} && \ref{Alpha2} \\ &x^{(k+1)} &&= x^{(k)} + \alpha_k p^{(k)} && \ref{X} \\ &r^{(k+1)} &&= r^{(k)} - \alpha_k Ap^{(k)} && \ref{R} \\ &\beta_k &&= \frac{\inner{r^{(k+1)}}{r^{(k+1)}}}{\inner{r^{(k)}}{r^{(k)}}} && \ref{Beta2} \\ &p^{(k+1)} &&= r^{(k+1)} + \beta_k p^{(k)} && \ref{P} \end{align*} $$
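For concreteness, here is a direct transcription of these recurrences into NumPy. It is a sketch rather than a production implementation: the stopping criterion (residual norm below a tolerance), the iteration cap, the function name `conjugate_gradient`, and the random test problem are all illustrative choices, not prescribed by the derivation.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve Ax = b for symmetric positive-definite A using the recurrences above."""
    n = len(b)
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    max_iter = n if max_iter is None else max_iter

    r = b - A @ x                     # r^(0) = b - A x^(0)
    p = r.copy()                      # p^(0) = r^(0)
    rho = r @ r                       # <r^(k), r^(k)>, carried between iterations

    for _ in range(max_iter):
        if np.sqrt(rho) <= tol:       # stop once the residual norm is small
            break
        Ap = A @ p                    # the one matrix-vector product per iteration
        alpha = rho / (p @ Ap)        # alpha_k = <r^(k), r^(k)> / <p^(k), p^(k)>_A
        x = x + alpha * p             # x^(k+1) = x^(k) + alpha_k p^(k)
        r = r - alpha * Ap            # r^(k+1) = r^(k) - alpha_k A p^(k)
        rho_new = r @ r
        beta = rho_new / rho          # beta_k = <r^(k+1), r^(k+1)> / <r^(k), r^(k)>
        p = r + beta * p              # p^(k+1) = r^(k+1) + beta_k p^(k)
        rho = rho_new
    return x

# Example usage on a random symmetric positive-definite system.
rng = np.random.default_rng(3)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x = conjugate_gradient(A, b)
assert np.allclose(A @ x, b)
```

In this sketch the residual norm also serves as the stopping criterion, and the quantity $\inner{r^{(k)}}{r^{(k)}}$ is carried over from one iteration to the next, so that each iteration requires exactly one matrix-vector product with $A$.[^2]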
[^1]: The choice of this minimization problem can be partially motivated as follows. In view of the fact that $x^* = x^{(0)} + A^{-1} r^{(0)}$ and that $A^{-1}$ is a polynomial in $A$ of degree at most $n-1$, in the $k$th iteration of the method, we seek an approximation to the solution of the form $x^{(0)} + p_{k-1}(A) r^{(0)}$, where $p_{k-1}$ is a polynomial of degree at most $k-1$. This guarantees that the $A$-norm of the error decreases monotonically and that the solution is found in at most $n$ iterations (in exact arithmetic). Although the choice of the objective function is not canonical, it turns out that this choice leads to a particularly tractable method.
[^2]: The computation of the residual in each iteration furnishes a useful measure of progress towards the solution (namely, the norm of the residual) at the cost of one matrix-vector multiplication. At the same time, since $r^{(k+1)} = r^{(k)} - \alpha_k A p^{(k)}$, it is inductively plausible that the sets $\set{r^{(j)}}_{j < k}$ would constitute bases of successive Krylov subspaces and that the $p^{(j)}$ could be generated ‘online’ from the $r^{(j)}$, so to speak. Importantly, no other matrix-vector multiplications would be needed in each iteration.