Projections
Let $H$ be a Hilbert space and $Y \subseteq H$. The (orthogonal) projection operator onto $Y$ is defined for $x \in H$ by $$ \proj{Y}(x) := \underset{y \in Y}{\operatorname{argmin}} \frac{1}{2} \norm{y - x}^2. $$
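For concreteness, here is a minimal numerical sketch of this definition (assumptions beyond the text: $H = \mathbb{R}^2$ with the Euclidean inner product and $Y$ the closed unit ball, whose projection has the closed form $x / \max(1, \norm{x})$):

```python
import numpy as np

# Sketch: H = R^2, Y = closed unit ball, so proj_Y(x) = x / max(1, ||x||).
rng = np.random.default_rng(0)

def proj_unit_ball(x):
    return x / max(1.0, np.linalg.norm(x))

x = np.array([3.0, 4.0])
p = proj_unit_ball(x)            # ||x|| = 5, so p = [0.6, 0.8]

# p should attain a smaller value of 0.5 * ||y - x||^2 than any sampled y in Y.
candidates = [proj_unit_ball(v) for v in rng.normal(size=(1000, 2))]
obj = lambda y: 0.5 * np.linalg.norm(y - x) ** 2
assert all(obj(p) <= obj(y) + 1e-12 for y in candidates)
```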
Hilbert projection theorem (first projection theorem)
If $Y$ is nonempty, closed, and convex, then $\proj{Y}(x)$ is a singleton (so $\proj{Y} : H \to Y$ is well-defined).
Proof. Let $(y_n)_{n=1}^\infty \subseteq Y$ be such that $d_n := \frac{1}{2} \norm{y_n - x}^2 \to d := \inf_{y \in Y} \frac{1}{2} \norm{y-x}^2$. By the parallelogram identity, $$ \Norm{\frac{y_m + y_n}{2} - x}^2 + \Norm{\frac{y_m - y_n}{2}}^2 = 2\Norm{\frac{y_m - x}{2}}^2 + 2\Norm{\frac{y_n - x}{2}}^2 = d_m + d_n, $$ where $\norm{\frac{y_m + y_n}{2} - x}^2 \geq 2d$ by convexity. Taking $m, n \to \infty$ shows that $(y_n)$ is Cauchy and therefore convergent to some $y \in Y$ with $\frac{1}{2} \norm{y - x}^2 = d$. Moreover, if $y' \in Y$ is another minimizer, replacing $y_m, y_n$ by $y, y'$ above shows that $y = y'$. ∎
Recall that the polar cone of $Y$ is $Y^\circ := \set{x \in H : \forall y \in Y \, (\Re(\inner{x}{y}) \leq 0)}$ and that the orthogonal complement of $Y$ is $Y^\perp := \set{x \in H : \forall y \in Y \, (\inner{x}{y} = 0)}$; clearly, if $Y$ is a subspace of $H$, then $Y^\circ = Y^\perp$.
Characterization of projections (second projection theorem)
If $Y$ is nonempty, closed, and convex, then $y = \proj{Y}(x)$ if and only if $y \in Y$ and $x-y \in (Y-y)^\circ$.
Proof. If $y = \proj{Y}(x)$ and $y' \in Y$, then for all $\lambda \in [0, 1]$, we have $$ \norm{y-x}^2 \leq \norm{(1-\lambda)y + \lambda y' - x}^2 = \norm{y-x}^2 + 2\lambda \Re(\inner{y-x}{y'-y}) + \lambda^2 \norm{y' - y}^2, $$ so dividing by $\lambda$ and letting $\lambda \to 0^+$ gives $\Re(\inner{y-x}{y'-y}) \geq 0$, i.e., $x-y \in (Y-y)^\circ$. Conversely, if $y \in Y$ and $x-y \in (Y-y)^\circ$, then setting $\lambda = 1$ in the expansion above shows that $\norm{y'-x}^2 \geq \norm{y-x}^2$ for every $y' \in Y$, so $y = \proj{Y}(x)$. ∎
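As a quick numerical check of this characterization (assuming, beyond the text, $H = \mathbb{R}^3$ and $Y = [0,1]^3$, for which the projection is componentwise clipping), the inequality $\Re(\inner{x - \proj{Y}(x)}{y' - \proj{Y}(x)}) \leq 0$ should hold for every $y' \in Y$:

```python
import numpy as np

# Check of the characterization, assuming H = R^3 and Y = [0, 1]^3,
# for which proj_Y is componentwise clipping.
rng = np.random.default_rng(1)
proj_box = lambda v: np.clip(v, 0.0, 1.0)

x = rng.normal(scale=2.0, size=3)
y = proj_box(x)

# y = proj_Y(x) iff <x - y, y' - y> <= 0 for every y' in Y.
ys = rng.uniform(0.0, 1.0, size=(1000, 3))
assert np.all((ys - y) @ (x - y) <= 1e-12)
```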
Firm nonexpansiveness of the projection operator
If $Y$ is nonempty, closed, and convex, then $$ \norm{\proj{Y}(x) - \proj{Y}(x')}^2 + \norm{(I-\proj{Y})(x) - (I-\proj{Y})(x')}^2 \leq \norm{x-x'}^2. $$
Proof. Let $y = \proj{Y}(x)$ and $y' = \proj{Y}(x')$, and add the inequalities $\Re(\inner{x'-y'}{y-y'}) \leq 0$ and $\Re(\inner{x-y}{y'-y}) \leq 0$ to obtain $\norm{y-y'}^2 \leq \Re(\inner{x-x'}{y-y'})$; expanding $\norm{x-x'}^2 = \norm{(y-y') + ((x-y)-(x'-y'))}^2$ then gives the claim. ∎
In particular, this implies that the projection operator is nonexpansive: $\norm{\proj{Y}(x) - \proj{Y}(x')} \leq \norm{x-x'}$.
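The firm nonexpansiveness inequality can likewise be checked numerically (same assumed setting as above, $Y = [0,1]^3 \subseteq \mathbb{R}^3$):

```python
import numpy as np

# Check of firm nonexpansiveness, again assuming Y = [0, 1]^3 in R^3.
rng = np.random.default_rng(2)
proj_box = lambda v: np.clip(v, 0.0, 1.0)

for _ in range(1000):
    x, xp = rng.normal(scale=2.0, size=3), rng.normal(scale=2.0, size=3)
    y, yp = proj_box(x), proj_box(xp)
    lhs = np.sum((y - yp) ** 2) + np.sum(((x - y) - (xp - yp)) ** 2)
    assert lhs <= np.sum((x - xp) ** 2) + 1e-12
```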
If $Y$ is a closed subspace of $H$, it follows from the above that $y = \proj{Y}(x)$ if and only if $y \in Y$ and $x-y \in Y^\perp$, and that $\proj{Y} : H \to Y$ is a linear operator with $\norm{\proj{Y}} \leq 1$, $\im(\proj{Y}) = Y$, and $\ker(\proj{Y}) = Y^\perp$. In addition, $\proj{Y^\perp} = I-\proj{Y}$.
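A small sketch of the subspace case (assumptions: $H = \mathbb{R}^5$ and $Y$ the column space of a random matrix $B$, so that $\proj{Y}$ has matrix $QQ^*$ for a thin QR factorization $B = QR$):

```python
import numpy as np

# Subspace case: H = R^5 and Y = im(B) for a random 5x2 matrix B,
# so proj_Y has matrix Q Q^T where B = Q R is a thin QR factorization.
rng = np.random.default_rng(3)
B = rng.normal(size=(5, 2))
Q, _ = np.linalg.qr(B)          # columns of Q: orthonormal basis of Y
P = Q @ Q.T                     # matrix of proj_Y

x = rng.normal(size=5)
assert np.allclose(B.T @ (x - P @ x), 0.0)       # x - proj_Y(x) lies in Y^perp
assert np.allclose(P @ P, P)                     # proj_Y is idempotent
assert np.linalg.norm(P, 2) <= 1.0 + 1e-12       # operator norm at most 1
assert np.allclose((np.eye(5) - P) @ Q, 0.0)     # proj_{Y^perp} = I - proj_Y vanishes on Y
```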
Least squares problems
Let $H_1$ and $H_2$ be Hilbert spaces and suppose that $A : H_1 \to H_2$ is a continuous linear operator with closed image.[^1] The (linear) least squares problem is that of finding an $x \in H_1$ that minimizes $\frac{1}{2} \norm{b - Ax}^2$ for a given $b \in H_2$, or equivalently, that satisfies $Ax = \proj{\im(A)} b$. Using the fact that $\im(A)^\perp = \ker(A^*)$, we can also write this as the normal equation $A^*Ax = A^*b$.
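In the finite-dimensional, full-column-rank case the normal equation can be solved directly; here is a minimal sketch (assuming a random real $6 \times 3$ matrix $A$, and comparing against `np.linalg.lstsq`):

```python
import numpy as np

# Normal-equation sketch, assuming A is a real 6x3 matrix with full
# column rank so that A^T A is invertible.
rng = np.random.default_rng(4)
A = rng.normal(size=(6, 3))
b = rng.normal(size=6)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)       # A^T A x = A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)    # library least squares
assert np.allclose(x_normal, x_lstsq)

# The residual b - A x is orthogonal to im(A), i.e., A x = proj_{im(A)} b.
assert np.allclose(A.T @ (b - A @ x_normal), 0.0)
```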
The pseudoinverse
To solve the least squares problem, we observe that $A\restriction_{\ker(A)^\perp} : \ker(A)^\perp \to \im(A)$ is bijective: it is injective because if $x, x' \in \ker(A)^\perp$ and $Ax = Ax'$, then $x-x' \in \ker(A) \cap \ker(A)^\perp = \set{0}$, and it is surjective because $y = Ax$ implies that $y = A(x - \proj{\ker(A)} x)$ with $x - \proj{\ker(A)} x \in \ker(A)^\perp$. Thus, the pseudoinverse $A^+ : H_2 \to H_1$ of $A$, defined as $$ A^+ := A\restriction_{\ker(A)^\perp}^{-1} \circ \proj{\im(A)}, $$ is a well-defined linear operator; it is continuous by the bounded inverse theorem, since $\ker(A)^\perp$ and $\im(A)$ are closed. By construction, $x^* := A^+ b$ is a solution to the least squares problem.
This solution need not be unique; however, it is the unique solution of minimal norm: for any solution $x$ we have $x - x^* \in \ker(A)$, while $x^* \in \ker(A)^\perp$, so $\norm{x}^2 = \norm{x-x^*}^2 + \norm{x^*}^2 \geq \norm{x^*}^2$ with equality if and only if $x = x^*$.
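A numerical illustration of the minimal-norm property (assumptions: a rank-deficient real $4 \times 3$ matrix and `np.linalg.pinv` for $A^+$):

```python
import numpy as np

# Minimal-norm property, assuming a real 4x3 matrix A of rank 2 and
# using numpy's pinv for A^+.
rng = np.random.default_rng(5)
A = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))   # rank 2
b = rng.normal(size=4)

x_star = np.linalg.pinv(A) @ b                  # x* = A^+ b

# Adding any z in ker(A) gives another least squares solution, never a shorter one.
z = np.linalg.svd(A)[2][-1]                     # right singular vector spanning ker(A)
for t in np.linspace(-2.0, 2.0, 9):
    x = x_star + t * z
    assert np.allclose(A.T @ A @ x, A.T @ b)    # still satisfies the normal equation
    assert np.linalg.norm(x) >= np.linalg.norm(x_star) - 1e-12
```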
It is straightforward to verify the following (a numerical spot check appears after the list):
- $A^+ = A^{-1}$ if $A$ is bijective
- $\im(A^+) = \ker(A)^\perp$, $\ker(A^+) = \im(A)^\perp$
- $AA^+ = \proj{\im(A)}$, $A^+A = \proj{\im(A^+)}$ (and in fact, these characterize the pseudoinverse)
- $(A^+)^+ = A$
- $(A^*)^+ = (A^+)^*$
- $A^+ = (A^* A)^+ A^* = A^* (AA^*)^+$
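Here is the spot check referred to above, for a random rank-deficient real matrix (a sketch; `np.linalg.pinv` computes the Moore–Penrose pseudoinverse):

```python
import numpy as np

# Spot check of the identities above for a random rank-deficient real matrix.
rng = np.random.default_rng(6)
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))   # 5x4, rank 2
pinv = np.linalg.pinv
Ap = pinv(A)

assert np.allclose(pinv(Ap), A)                 # (A^+)^+ = A
assert np.allclose(pinv(A.T), Ap.T)             # (A^*)^+ = (A^+)^*
assert np.allclose(Ap, pinv(A.T @ A) @ A.T)     # A^+ = (A^* A)^+ A^*
assert np.allclose(Ap, A.T @ pinv(A @ A.T))     # A^+ = A^* (A A^*)^+
assert np.allclose(A @ Ap, (A @ Ap).T)          # A A^+ is self-adjoint ...
assert np.allclose(A @ Ap @ A, A)               # ... and acts as the identity on im(A)
```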
In the finite-dimensional case, if $A \in \C^{m \times n}$ has full column rank, then $A^+ = (A^* A)^{-1} A^*$ by the identities above; similarly, if it has full row rank, then $A^+ = A^* (AA^*)^{-1}$. More generally, if $\hat{U} \hat{\Sigma} \hat{V}^*$ is a compact SVD of $A$ (that is, $\hat{\Sigma}$ is $r \times r$, where $r = \rank(A)$), then $A^+ = \hat{V} \hat{\Sigma}^{-1} \hat{U}^*$.
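A sketch of the compact-SVD formula, checked against `np.linalg.pinv` (assuming a random $6 \times 4$ real matrix of rank 2):

```python
import numpy as np

# Compact-SVD formula for the pseudoinverse, checked against numpy's pinv.
rng = np.random.default_rng(7)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 4))   # 6x4, rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-10 * s[0])                 # numerical rank
U_hat, s_hat, V_hat = U[:, :r], s[:r], Vt[:r].T

A_plus = V_hat @ np.diag(1.0 / s_hat) @ U_hat.T
assert np.allclose(A_plus, np.linalg.pinv(A))
```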
[^1]: Note that this implies that $A^*$ also has closed image, so $\im(A)^\perp = \ker(A^*)$ and $\ker(A)^\perp = \overline{\im(A^*)} = \im(A^*)$.