The conjugate gradient method is an iterative method for solving the linear system $Ax = b$, where $A \in \mathbb{R}^{n \times n}$ is symmetric positive-definite.
Let $\inner{x}{y}_A := \inner{Ax}{y}$ be the inner product defined by $A$ and $\norm{x}_A := \sqrt{\inner{x}{x}_A}$ be the induced norm. Given an initial guess $x^{(0)}$ for the solution $x^*$, the $k$th iterate of the method is defined as $$ x^{(k)} = \mathop{\mathrm{arg\,min}}_{x \in x^{(0)} + \K_k(A, r^{(0)})}\ \norm{x^* - x}_A\,, $$ where $\K_k(A, r^{(0)})$ is the Krylov subspace $\span \set{A^j r^{(0)}}_{j = 0}^{k-1}$ and $r^{(0)} = b - Ax^{(0)}$. (In other words, the $A$-norm of the error is minimized over the $k$th affine Krylov subspace generated by the initial residual and translated by the initial guess.)[^1]
Let us abbreviate $\K_k(A, r^{(0)})$ as $\K_k$ and write $r^{(k)} = b - Ax^{(k)}$ for the residual of the $k$th iterate. The iterate $x^{(k)}$ is therefore the $A$-orthogonal projection of $x^*$ onto $x^{(0)} + \K_k$, defined by the Galerkin conditions $x^{(k)} - x^{(0)} \in \K_k$ and $x^* - x^{(k)} \perp_A \K_k$; we note that the orthogonality condition is equivalent to $r^{(k)} \perp \K_k$.
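To make the definition concrete, here is a minimal numpy sketch that computes $x^{(k)}$ by brute force straight from the definition: it builds a basis of the Krylov subspace, minimizes the $A$-norm of the error over $x^{(0)} + \K_k$ via the normal equations in the $A$-inner product, and checks the Galerkin condition $r^{(k)} \perp \K_k$. The test matrix, right-hand side, and dimension are arbitrary illustrative choices, and forming the Krylov basis explicitly is done only for illustration; the point of the derivation below is to avoid it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive-definite test matrix
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)   # exact solution, for reference only

x0 = np.zeros(n)
r0 = b - A @ x0

def krylov_basis(k):
    """Orthonormal basis of K_k(A, r0) = span{A^j r0 : j < k}, via QR of the monomial basis."""
    K = np.column_stack([np.linalg.matrix_power(A, j) @ r0 for j in range(k)])
    Q, _ = np.linalg.qr(K)
    return Q

def iterate_by_definition(k):
    """x^(k) = argmin of ||x* - x||_A over x in x0 + K_k(A, r0), computed by brute force."""
    Q = krylov_basis(k)
    # Minimizing ||x* - (x0 + Q y)||_A leads to (Q^T A Q) y = Q^T A (x* - x0) = Q^T r0.
    y = np.linalg.solve(Q.T @ A @ Q, Q.T @ r0)
    return x0 + Q @ y

for k in range(1, n + 1):
    xk = iterate_by_definition(k)
    e = x_star - xk
    err_A = np.sqrt(e @ A @ e)                                   # A-norm of the error (nonincreasing)
    galerkin = np.max(np.abs(krylov_basis(k).T @ (b - A @ xk)))  # r^(k) orthogonal to K_k
    print(k, err_A, galerkin)
```

The printed $A$-norms are nonincreasing and the last one vanishes (up to round-off), as expected from the nested minimization.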
Now suppose that $\set{p^{(j)}}_{j < k}$ is a basis of $\K_k$ and let $P_k = \begin{bmatrix} p^{(0)} & \cdots & p^{(k-1)} \end{bmatrix}$. Then $x^{(k)} = x^{(0)} + P_k y^{(k)}$, where $$ y^{(k)} = \mathop{\mathrm{arg\,min}}_{y \in \R^k}\ \norm{x^* - (x^{(0)} + P_k y)}_A\,. $$ If $p^{(k)}$ is such that $\set{p^{(j)}}_{j < k+1}$ is a basis of $\K_{k+1}$, we can express the next iterate $x^{(k+1)}$ in an analogous manner – that is, $x^{(k+1)} = x^{(0)} + P_{k+1} y^{(k+1)}$, where $P_{k+1} = \begin{bmatrix} P_k & p^{(k)} \end{bmatrix}$. Writing $y^{(k+1)} = \begin{bmatrix} \tilde{y}^{(k)} \\ \alpha_k \end{bmatrix}$ for some $\tilde{y}^{(k)} \in \R^k$ and $\alpha_k \in \R$, we see that $$ \begin{align*} x^* - (x^{(0)} + P_{k+1} y^{(k+1)}) &= [x^{(k)} - (x^{(0)} + P_k \tilde{y}^{(k)})] + [(x^* - x^{(k)}) - \alpha_k p^{(k)}] \\ &= P_k(y^{(k)} - \tilde{y}^{(k)}) + [(x^* - x^{(k)}) - \alpha_k p^{(k)}]\,. \end{align*} $$ Thus, if we select $p^{(k)}$ to be $A$-orthogonal to $p^{(j)}$ for all $j < k$, then by the Pythagorean theorem, $$ \begin{align*} \norm{x^* - (x^{(0)} + P_{k+1} y^{(k+1)})}_A^2 &= \norm{P_k(y^{(k)} - \tilde{y}^{(k)})}_A^2 + \norm{(x^* - x^{(k)}) - \alpha_k p^{(k)}}_A^2\,, \end{align*} $$ so the solution to the least squares problem for $y^{(k+1)}$ is given recursively by $\tilde{y}^{(k)} = y^{(k)}$ and $\alpha_k p^{(k)} = \mathrm{proj}^A_{p^{(k)}}(x^* - x^{(k)})$. It follows that $$ \begin{align*} x^{(k+1)} % &= x^{(0)} + P_{k+1} y^{(k+1)} \\ &= x^{(0)} + P_k y^{(k)} + \alpha_k p^{(k)} \\ &= x^{(k)} + \alpha_k p^{(k)}, \label{X}\tag{X} \end{align*} $$ where $$ \begin{align*} \alpha_k &= \frac{\inner{x^* - x^{(k)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A} \\ &= \frac{\inner{r^{(k)}}{p^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Alpha}\tag{ $\Alpha$} \end{align*} $$ This also implies that $$ \begin{equation} r^{(k+1)} = r^{(k)} - \alpha_k Ap^{(k)}. \label{R}\tag{R} \end{equation} $$ To generate $A$-orthogonal vectors $p^{(j)}$ such that $\set{p^{(j)}}_{j < k}$ is a basis of $\K_k$ for each $k$, we notice that $r^{(k+1)} \perp_A \K_k = \span \set{p^{(j)}}_{j < k}$ because $r^{(k+1)} \perp \K_{k+1}$ and $A\K_k \subseteq \K_{k+1}$. As a result, when $r^{(k+1)}$ is $A$-orthogonalized against $p^{(k)}$, the resulting vector will automatically be $A$-orthogonal to $p^{(j)}$ for all $j < k+1$, suggesting that we define $$ \begin{align*} p^{(k+1)} &= r^{(k+1)} - \mathrm{proj}^A_{p^{(k)}} r^{(k+1)} \\ &= r^{(k+1)} + \beta_k p^{(k)}, \label{P}\tag{P} \end{align*} $$ where $p^{(0)} = r^{(0)}$ and $$ \begin{equation} \beta_k = -\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Beta}\tag{ $\Beta$} \end{equation} $$
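To see the recurrences ($\ref{X}$), ($\ref{Alpha}$), ($\ref{R}$), ($\ref{P}$), and ($\ref{Beta}$) in action, here is a short sketch that runs them exactly as written, with the projection forms of $\alpha_k$ and $\beta_k$, on an arbitrary symmetric positive-definite test matrix, and then verifies that the generated search directions are mutually $A$-orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive-definite test matrix
b = rng.standard_normal(n)

x = np.zeros(n)                  # x^(0)
r = b - A @ x                    # r^(0)
p = r.copy()                     # p^(0) = r^(0)
directions = []

for k in range(n):
    Ap = A @ p
    alpha = (r @ p) / (p @ Ap)   # (Alpha): <r^(k), p^(k)> / <p^(k), p^(k)>_A
    x = x + alpha * p            # (X)
    r = r - alpha * Ap           # (R)
    beta = -(r @ Ap) / (p @ Ap)  # (Beta): -<r^(k+1), p^(k)>_A / <p^(k), p^(k)>_A
    directions.append(p)
    p = r + beta * p             # (P)

P = np.column_stack(directions)
G = P.T @ A @ P                                  # Gram matrix in the A-inner product
print(np.max(np.abs(G - np.diag(np.diag(G)))))   # off-diagonal entries: A-orthogonality
print(np.linalg.norm(b - A @ x))                 # final residual
```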
Referring back to the residual equation ($\ref{R}$), we can show by induction that the $p^{(j)}$ thus defined will also constitute bases of successive Krylov subspaces. More precisely, suppose that the solution has not been found by the beginning of the $k$th iteration, in the sense that $r^{(j)} \neq 0$ for all $j < k$. We claim then that $r^{(k-1)} \in \K_k$ and that $\set{p^{(j)}}_{j < k}$ is an $A$-orthogonal basis of $\K_k$.
Indeed, if $r^{(0)} \neq 0$, then $r^{(0)} \in \K_1 = \span \set{r^{(0)}}$ and $\set{p^{(0)}} = \set{r^{(0)}}$ is an $A$-orthogonal basis of $\K_1$. Now suppose that the claim holds up to the $k$th iteration and that its hypothesis is satisfied at the beginning of the $(k+1)$th iteration. Then $$ r^{(k)} = r^{(k-1)} - \alpha_{k-1} Ap^{(k-1)} \in \K_k + A\K_k \subseteq \K_{k+1}\,, $$ so $$ p^{(k)} = r^{(k)} + \beta_{k-1} p^{(k-1)} \in \K_{k+1} + \K_k \subseteq \K_{k+1}\,. $$ In addition, $p^{(k)} \neq 0$ because $r^{(k)} \perp \K_k$ and $r^{(k)} \neq 0$. Hence, by construction, $\set{p^{(j)}}_{j < k+1}$ is an $A$-orthogonal set of nonzero vectors in $\K_{k+1}$ and is moreover a basis thereof, since $\dim(\K_{k+1}) \leq k + 1$.
An immediate consequence is that $\set{r^{(j)}}_{j < k}$ will be an orthogonal basis of $\K_k$ for all such iterations: if (say) $i < j < k$, then $r^{(i)} \in \K_{i+1} \subseteq \K_j$, and we know that $r^{(j)} \perp \K_j$. Furthermore, the iteration will break down exactly when $r^{(k)} \in \K_k$ or, equivalently (since $r^{(k)} \perp \K_k$), when $r^{(k)} = 0$, meaning that the solution was attained in the $k$th iteration.
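Both claims are easy to observe numerically. The sketch below repeats the same recurrences on another arbitrary test problem, stores every residual, and checks that the residuals are pairwise orthogonal and that the iteration terminates with a (numerically) zero residual after at most $n$ steps.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 7
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive-definite test matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
p = r.copy()
residuals = [r.copy()]

for k in range(n):               # at most n steps are needed in exact arithmetic
    Ap = A @ p
    alpha = (r @ p) / (p @ Ap)   # (Alpha)
    x = x + alpha * p            # (X)
    r = r - alpha * Ap           # (R)
    beta = -(r @ Ap) / (p @ Ap)  # (Beta)
    residuals.append(r.copy())
    p = r + beta * p             # (P)

R = np.column_stack(residuals[:-1])              # r^(0), ..., r^(n-1)
G = R.T @ R                                      # Gram matrix of the residuals
print(np.max(np.abs(G - np.diag(np.diag(G)))))   # off-diagonal entries: pairwise orthogonality
print(np.linalg.norm(residuals[-1]))             # r^(n): finite termination
```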
We can also derive alternative formulas for the scalars $\alpha_k$ and $\beta_k$ that reduce the number of inner products in each iteration. First, using the fact that $r^{(k)} \perp \K_k = \span \set{p^{(j)}}_{j < k}$, we obtain $$ \begin{align*} \alpha_k &= \frac{\inner{r^{(k)}}{p^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A} \\ &= \frac{\inner{r^{(k)}}{r^{(k)} + \beta_{k-1} p^{(k-1)}}}{\inner{p^{(k)}}{p^{(k)}}_A} \\ &= \frac{\inner{r^{(k)}}{r^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A}\,. \label{Alpha2}\tag{ $\Alpha$} \end{align*} $$ Hence $$ \begin{align*} \beta_k &= -\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{p^{(k)}}{p^{(k)}}_A} \\ &= -\alpha_k\frac{\inner{r^{(k+1)}}{p^{(k)}}_A}{\inner{r^{(k)}}{r^{(k)}}} \\ &= \frac{\inner{r^{(k+1)}}{r^{(k+1)} - r^{(k)}}}{\inner{r^{(k)}}{r^{(k)}}} \\ &= \frac{\inner{r^{(k+1)}}{r^{(k+1)}}}{\inner{r^{(k)}}{r^{(k)}}}\,. \label{Beta2}\tag{ $\Beta$} \end{align*} $$
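As a sanity check on this algebra, one can run the recurrences once and compare, at every step, the simplified formulas for $\alpha_k$ and $\beta_k$ against the projection forms used in the derivation; the two should agree to round-off. (The test problem below is again an arbitrary illustrative choice.)

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)      # symmetric positive-definite test matrix
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - A @ x
p = r.copy()

for k in range(n - 1):
    Ap = A @ p
    alpha_proj = (r @ p) / (p @ Ap)        # (Alpha): projection form
    alpha_new = (r @ r) / (p @ Ap)         # (Alpha2): simplified form
    x = x + alpha_new * p
    r_old, r = r, r - alpha_new * Ap
    beta_proj = -(r @ Ap) / (p @ Ap)       # (Beta): projection form
    beta_new = (r @ r) / (r_old @ r_old)   # (Beta2): simplified form
    p = r + beta_new * p
    print(k, abs(alpha_proj - alpha_new), abs(beta_proj - beta_new))
```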
In summary,
$$ \begin{align*} &r^{(0)} &&= b - Ax^{(0)} \\ &p^{(0)} &&= r^{(0)} \\ \\ &\alpha_k &&= \frac{\inner{r^{(k)}}{r^{(k)}}}{\inner{p^{(k)}}{p^{(k)}}_A} && \ref{Alpha2} \\ &x^{(k+1)} &&= x^{(k)} + \alpha_k p^{(k)} && \ref{X} \\ &r^{(k+1)} &&= r^{(k)} - \alpha_k Ap^{(k)} && \ref{R} \\ &\beta_k &&= \frac{\inner{r^{(k+1)}}{r^{(k+1)}}}{\inner{r^{(k)}}{r^{(k)}}} && \ref{Beta2} \\ &p^{(k+1)} &&= r^{(k+1)} + \beta_k p^{(k)} && \ref{P} \end{align*} $$
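Assembled into code, the summary reads as follows. This is a minimal reference sketch; the stopping tolerance, iteration cap, and test problem are illustrative choices rather than part of the derivation.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, maxiter=None):
    """Solve Ax = b for symmetric positive-definite A using the recurrences summarized above."""
    n = b.shape[0]
    maxiter = n if maxiter is None else maxiter
    x = np.zeros(n) if x0 is None else x0.copy()
    r = b - A @ x                   # r^(0)
    p = r.copy()                    # p^(0) = r^(0)
    rr = r @ r
    for _ in range(maxiter):
        if np.sqrt(rr) <= tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)       # (Alpha2)
        x = x + alpha * p           # (X)
        r = r - alpha * Ap          # (R)
        rr_new = r @ r
        beta = rr_new / rr          # (Beta2)
        p = r + beta * p            # (P)
        rr = rr_new
    return x

# illustrative usage on a random SPD system
rng = np.random.default_rng(4)
n = 50
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(b - A @ x))    # residual norm, near the tolerance
```

Each iteration performs a single matrix-vector product with $A$ and only two new inner products, which is exactly the saving bought by the simplified formulas for $\alpha_k$ and $\beta_k$.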
[^1]: The choice of this minimization problem can be partially motivated as follows. Since $x^* = x^{(0)} + A^{-1} r^{(0)}$ and $A^{-1}$ can be written as a polynomial in $A$ of degree at most $n-1$, it is natural to seek, in the $k$th iteration of the method, an approximation to the solution of the form $x^{(0)} + p_{k-1}(A) r^{(0)}$, where $p_{k-1}$ is a polynomial of degree at most $k-1$. Minimizing the $A$-norm of the error over this family guarantees that the error decreases monotonically (the Krylov subspaces are nested) and that the solution is found in at most $n$ iterations (in exact arithmetic). Although the choice of the objective function is not canonical, it turns out that this choice leads to a particularly tractable method.