Projections and least squares problems

$$ \newcommand{\set}[1]{\{ #1 \}} \newcommand{\Set}[1]{\left \{ #1 \right\}} \renewcommand{\emptyset}{\varnothing} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\Rn}{\mathbb{R}^n} \newcommand{\Rm}{\mathbb{R}^m} \newcommand{\C}{\mathbb{C}} \newcommand{\F}{\mathbb{F}} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\Abs}[1]{\left\lvert #1 \right\rvert} \newcommand{\inner}[2]{\langle #1, #2 \rangle} \newcommand{\Inner}[2]{\left\langle #1, #2 \right\rangle} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\tp}{{\top}} \newcommand{\trans}{{\top}} \renewcommand{\span}{\operatorname{span}} \newcommand{\im}{\operatorname{im}} \renewcommand{\ker}{\operatorname{ker}} \newcommand{\rank}{\operatorname{rank}} \newcommand{\proj}[1]{\mathop{\mathrm{proj}_{#1}}} \newcommand{\K}{\mathcal{K}} \newcommand{\L}{\mathcal{L}} \renewcommand{\epsilon}{\varepsilon} \definecolor{cblue}{RGB}{31, 119, 180} \definecolor{corange}{RGB}{255, 127, 14} \definecolor{cgreen}{RGB}{44, 160, 44} \definecolor{cred}{RGB}{214, 39, 40} \definecolor{cpurple}{RGB}{148, 103, 189} \definecolor{cbrown}{RGB}{140, 86, 75} \definecolor{cpink}{RGB}{227, 119, 194} \definecolor{cgrey}{RGB}{127, 127, 127} \definecolor{cyellow}{RGB}{188, 189, 34} \definecolor{cteal}{RGB}{23, 190, 207} $$

Projections

Let $H$ be a Hilbert space and $Y \subseteq H$. The (orthogonal) projection operator onto $Y$ is defined for $x \in H$ by $$ \proj{Y}(x) := \underset{y \in Y}{\operatorname{argmin}} \frac{1}{2} \norm{y - x}^2. $$
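For some convex sets the minimizer has a simple closed form. As a quick illustration (the particular sets below are standard examples, not taken from anything above), here is a minimal NumPy sketch of two such projections: onto a box, which clips each coordinate, and onto a closed Euclidean ball, which rescales:

```python
import numpy as np

def proj_box(x, lo, hi):
    """Project x onto the box {y : lo <= y <= hi} by coordinatewise clipping."""
    return np.clip(x, lo, hi)

def proj_ball(x, r):
    """Project x onto the closed Euclidean ball of radius r centered at 0."""
    nx = np.linalg.norm(x)
    return x if nx <= r else (r / nx) * x

x = np.array([3.0, -2.0, 0.5])
print(proj_box(x, -1.0, 1.0))   # [ 1.  -1.   0.5]
print(proj_ball(x, 1.0))        # x / ||x||
```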

Hilbert projection theorem (first projection theorem)

If $Y$ is nonempty, closed, and convex, then $\proj{Y}(x)$ is a singleton (so $\proj{Y} : H \to Y$ is well-defined).

Proof. Let $(y_n)_{n=1}^\infty \subseteq Y$ be such that $d_n := \frac{1}{2} \norm{y_n - x}^2 \to d := \inf_{y \in Y} \frac{1}{2} \norm{y-x}^2$. By the parallelogram identity, $$ \Norm{\frac{y_m + y_n}{2} - x}^2 + \Norm{\frac{y_m - y_n}{2}}^2 = 2\Norm{\frac{y_m - x}{2}}^2 + 2\Norm{\frac{y_n - x}{2}}^2 = d_m + d_n, $$ where $\norm{\frac{y_m + y_n}{2} - x}^2 \geq 2d$ because $\frac{y_m + y_n}{2} \in Y$ by convexity. Taking $m, n \to \infty$ shows that $(y_n)$ is Cauchy and therefore (since $H$ is complete and $Y$ is closed) convergent to some $y \in Y$ with $\frac{1}{2} \norm{y - x}^2 = d$. Moreover, if $y' \in Y$ is another minimizer, replacing $y_m, y_n$ by $y, y'$ above shows that $y = y'$. ∎

Recall that the polar cone of $Y$ is $Y^\circ := \set{x \in H : \forall y \in Y \, (\Re(\inner{x}{y}) \leq 0)}$ and that the orthogonal complement of $Y$ is $Y^\perp := \set{x \in H : \forall y \in Y \, (\inner{x}{y} = 0)}$; clearly, if $Y$ is a subspace of $H$, then $Y^\circ = Y^\perp$.

Characterization of projections (second projection theorem)

If $Y$ is nonempty, closed, and convex, then $y = \proj{Y}(x)$ if and only if $y \in Y$ and $x-y \in (Y-y)^\circ$.

Proof. If $y = \proj{Y}(x)$ and $y' \in Y$, then for all $\lambda \in [0, 1]$, we have $$ \norm{y-x}^2 \leq \norm{(1-\lambda)y + \lambda y' - x}^2 = \norm{y-x}^2 + 2\lambda \Re(\inner{y-x}{y'-y}) + \lambda^2 \norm{y' - y}^2, $$ so dividing by $2\lambda$ and letting $\lambda \to 0^+$ gives $\Re(\inner{y-x}{y'-y}) \geq 0$, i.e., $\Re(\inner{x-y}{y'-y}) \leq 0$. Conversely, if $y, y' \in Y$ and $x-y \in (Y-y)^\circ$, then setting $\lambda = 1$ in the expansion above shows that $\norm{y'-x}^2 \geq \norm{y-x}^2$, so $y = \proj{Y}(x)$. ∎
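The variational inequality $\Re(\inner{x-y}{y'-y}) \leq 0$ is easy to test numerically. A small sketch, assuming the standard fact (as in the example above) that projection onto a box is coordinatewise clipping:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=4) * 3
y = np.clip(x, -1.0, 1.0)            # y = proj_Y(x) for Y = [-1, 1]^4

# <x - y, y' - y> <= 0 for every y' in Y (real Hilbert space, so Re is a no-op)
for _ in range(1000):
    y_prime = rng.uniform(-1.0, 1.0, size=4)   # an arbitrary point of Y
    assert np.dot(x - y, y_prime - y) <= 1e-12
```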

Firm nonexpansiveness of the projection operator

If $Y$ is nonempty, closed, and convex, then $$ \norm{\proj{Y}(x) - \proj{Y}(x')}^2 + \norm{(I-\proj{Y})(x) - (I-\proj{Y})(x')}^2 \leq \norm{x-x'}^2. $$

Proof. Let $y = \proj{Y}(x)$ and $y' = \proj{Y}(x')$. By the second projection theorem, $\Re(\inner{x-y}{y'-y}) \leq 0$ and $\Re(\inner{x'-y'}{y-y'}) \leq 0$; adding these shows that $\Re(\inner{(x-y)-(x'-y')}{y-y'}) \geq 0$, so expanding $\norm{x-x'}^2 = \norm{(y-y') + ((x-y)-(x'-y'))}^2$ gives the result. ∎

In particular, this implies that the projection operator is nonexpansive: $\norm{\proj{Y}(x) - \proj{Y}(x')} \leq \norm{x-x'}$.
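Both inequalities can be sanity-checked numerically; a sketch, again using the box projection from the earlier example:

```python
import numpy as np

rng = np.random.default_rng(0)
P = lambda x: np.clip(x, -1.0, 1.0)  # projection onto the box [-1, 1]^5

for _ in range(1000):
    x, xp = rng.normal(size=5) * 3, rng.normal(size=5) * 3
    # firm nonexpansiveness: ||Px - Px'||^2 + ||(I-P)x - (I-P)x'||^2 <= ||x - x'||^2
    lhs = np.linalg.norm(P(x) - P(xp))**2 \
        + np.linalg.norm((x - P(x)) - (xp - P(xp)))**2
    assert lhs <= np.linalg.norm(x - xp)**2 + 1e-12
```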

If $Y$ is a closed subspace of $H$, it follows from the above that $y = \proj{Y}(x)$ if and only if $y \in Y$ and $x-y \in Y^\perp$, and that $\proj{Y} : H \to Y$ is a linear operator with $\norm{\proj{Y}} \leq 1$, $\im(\proj{Y}) = Y$, and $\ker(\proj{Y}) = Y^\perp$. In addition, $\proj{Y^\perp} = I-\proj{Y}$.
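In $\R^n$, if the columns of $Q$ form an orthonormal basis of a subspace $Y$, then $\proj{Y} = QQ^\tp$. A hedged NumPy sketch (obtaining the basis from a QR factorization is one standard choice, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 3))          # Y = im(A), a 3-dimensional subspace of R^6
Q, _ = np.linalg.qr(A)               # orthonormal basis for Y
P = Q @ Q.T                          # matrix of proj_Y

x = rng.normal(size=6)
y = P @ x
# x - y lies in Y^perp: it is orthogonal to every column of A
assert np.allclose(A.T @ (x - y), 0)
# proj_Y is idempotent, and proj_{Y^perp} = I - proj_Y
assert np.allclose(P @ P, P)
assert np.allclose((np.eye(6) - P) @ x, x - y)
```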

Least squares problems

Let $H_1$ and $H_2$ be Hilbert spaces and suppose that $A : H_1 \to H_2$ is a continuous linear operator with closed image.[1] The (linear) least squares problem is that of finding an $x \in H_1$ that minimizes $\frac{1}{2} \norm{b - Ax}^2$ for a given $b \in H_2$, or equivalently, that satisfies $Ax = \proj{\im(A)} b$. Using the fact that $\im(A)^\perp = \ker(A^*)$, we can also write this as the normal equation $A^*Ax = A^*b$.
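For a concrete instance: when $A$ has full column rank, $A^*A$ is invertible and the normal equation can be solved directly. A sketch in NumPy (in floating point one usually prefers a QR- or SVD-based solver such as np.linalg.lstsq, since forming $A^*A$ squares the condition number):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 3))          # full column rank with probability 1
b = rng.normal(size=8)

x = np.linalg.solve(A.T @ A, A.T @ b)   # normal equation A*Ax = A*b

# the residual b - Ax is orthogonal to im(A), i.e. Ax = proj_{im(A)} b
assert np.allclose(A.T @ (b - A @ x), 0)
# agrees with a QR/SVD-based least squares solver
assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])
```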

The pseudoinverse

To solve the least squares problem, we observe that $A\restriction_{\ker(A)^\perp} : \ker(A)^\perp \to \im(A)$ is bijective: it is injective since $Ax = Ax'$ implies that $x-x' \in \ker(A)$, which meets $\ker(A)^\perp$ only at $0$, and it is surjective since $y = Ax$ implies that $y = A (x - \proj{\ker(A)} x)$. Thus, the pseudoinverse $A^+ : H_2 \to H_1$ of $A$, defined as $$ A^+ := A\restriction_{\ker(A)^\perp}^{-1} \circ \proj{\im(A)}, $$ is a well-defined linear operator, which is continuous by the bounded inverse theorem, and by construction $x^* := A^+ b$ is a solution to the least squares problem.

This solution need not be unique; however, it is the unique solution of minimal norm: $x - x^* \in \ker(A)$ for any solution $x$, and $x^* \in \ker(A)^\perp$, so $\norm{x}^2 = \norm{x-x^*}^2 + \norm{x^*}^2 \geq \norm{x^*}^2$ with equality if and only if $x = x^*$.
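A numerical illustration with a deliberately rank-deficient matrix (np.linalg.pinv computes $A^+$; the particular construction of $A$ below is just for the example):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(5, 2))
A = B @ rng.normal(size=(2, 4))      # rank 2, so ker(A) is nontrivial
b = rng.normal(size=5)

x_star = np.linalg.pinv(A) @ b       # minimal-norm least squares solution

# x_star + z is also a solution for any z in ker(A), but has larger norm
_, _, Vt = np.linalg.svd(A)
z = Vt[-1]                           # a unit vector (numerically) in ker(A)
x = x_star + 0.7 * z
assert np.allclose(A @ x, A @ x_star)             # same residual
assert np.linalg.norm(x_star) < np.linalg.norm(x)
```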

It is straightforward to verify that:

  • $A^+ = A^{-1}$ if $A$ is bijective
  • $\im(A^+) = \ker(A)^\perp$, $\ker(A^+) = \im(A)^\perp$
  • $AA^+ = \proj{\im(A)}$, $A^+A = \proj{\im(A^+)}$ (and in fact, these characterize the pseudoinverse)
  • $(A^+)^+ = A$
  • $(A^*)^+ = (A^+)^*$
  • $A^+ = (A^* A)^+ A^* = A^* (AA^*)^+$

In the finite-dimensional case, if $A \in \C^{m \times n}$ has full column rank, then $A^+ = (A^* A)^{-1} A^*$ by the identities above; similarly, if it has full row rank, then $A^+ = A^* (AA^*)^{-1}$. More generally, if $\hat{U} \hat{\Sigma} \hat{V}^*$ is a compact SVD of $A$ (that is, $\hat{\Sigma}$ is $r \times r$, where $r = \rank(A)$), then $A^+ = \hat{V} \hat{\Sigma}^{-1} \hat{U}^*$.
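A sketch assembling $A^+$ from a compact SVD and checking it against the full-column-rank formula and NumPy's built-in pseudoinverse (the rank tolerance below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(6, 4))          # full column rank with probability 1

# compact SVD: keep only the r = rank(A) nonzero singular values
U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = np.sum(s > 1e-12 * s[0])
A_pinv = Vt[:r].T @ np.diag(1.0 / s[:r]) @ U[:, :r].T

assert np.allclose(A_pinv, np.linalg.pinv(A))
# full column rank: A^+ = (A^* A)^{-1} A^*
assert np.allclose(A_pinv, np.linalg.solve(A.T @ A, A.T))
```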


  [1] Note that this implies that $A^*$ also has closed image, so $\im(A)^\perp = \ker(A^*)$ and $\ker(A)^\perp = \overline{\im(A^*)} = \im(A^*)$.
