Let $A \in \C^{m \times n}$. The singular value decomposition (SVD) is a factorization of $A$ as $U \Sigma V^*$, where $U \in \C^{m \times m}$ and $V \in \C^{n \times n}$ are unitary and $\Sigma \in \R^{m \times n}$ is (rectangular) diagonal with nonnegative entries.[^1] In other words, $A = \sum_{i=1}^{\min \set{m, n}} \sigma_i u_i v_i^*$, where $u_i$ and $v_i$ are the $i$th columns of $U$ and $V$ and $\sigma_i$ is the $i$th diagonal entry of $\Sigma$. The vectors $u_i$ and $v_i$ are called left and right singular vectors of $A$ and the scalars $\sigma_i$ are called singular values of $A$; by convention, we arrange the singular values in nonincreasing order.
If an SVD of $A$ has $r$ nonzero singular values, then $\set{u_i}_{i=1}^r$ is an orthonormal basis of $\im(A)$ because $Av_i = \sigma_i u_i$ for all $i$. Hence $r$ must be the rank of $A$ and $\set{u_i}_{i=r+1}^m$ an orthonormal basis of $\ker(A^*)$; similarly, $\set{v_i}_{i=1}^r$ and $\set{v_i}_{i=r+1}^n$ are orthonormal bases of $\im(A^*)$ and $\ker(A)$.
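As a quick numerical sketch (not part of the argument), these facts can be checked with NumPy; note that `np.linalg.svd` returns $V^*$ rather than $V$, and its singular values already come in nonincreasing order:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a complex 5x3 matrix of rank 2 as a product of thin factors.
G = rng.standard_normal((5, 2)) + 1j * rng.standard_normal((5, 2))
H = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
A = G @ H.conj().T

U, s, Vh = np.linalg.svd(A)        # full SVD: U is 5x5, Vh = V*, s has min(m, n) entries
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

assert np.allclose(A, U @ Sigma @ Vh)    # A = U Sigma V*
assert np.all(s[:-1] >= s[1:])           # sigma_1 >= sigma_2 >= ...
r = int(np.sum(s > 1e-10 * s[0]))        # numerical rank = number of nonzero sigma_i
assert r == 2
# v_{r+1}, ..., v_n (rows r.. of Vh, conjugated) form a basis of ker(A).
assert np.allclose(A @ Vh[r:].conj().T, 0)
```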
## Existence
Assume without loss of generality that $m \geq n$. Clearly, the matrix $A^* A$ is (Hermitian) positive semidefinite, so by the spectral theorem, $A^* A = V \Lambda V^*$ for some unitary $V \in \C^{n \times n}$ and some diagonal $\Lambda \in \R^{n \times n}$ with diagonal entries $\lambda_1 \geq \cdots \geq \lambda_n \geq 0$. Set $\sigma_i := \sqrt{\lambda_i}$ for each $i$, and for each nonzero $\sigma_i$ define $u_i := Av_i / \sigma_i$, so that $Av_i = \sigma_i u_i$. If $r$ is as above, $\hat{U} := \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix} \in \C^{m \times r}$, and $\hat{\Sigma} := \mathrm{diag}(\sigma_1, \dots, \sigma_r) \in \R^{r \times r}$, then by construction $$ AV = \hat{U} \begin{bmatrix} \hat{\Sigma} & 0_{r \times (n-r)} \end{bmatrix}. $$ Moreover, $\inner{u_i}{u_j} = \inner{Av_i / \sigma_i}{Av_j / \sigma_j} = \inner{\lambda_i v_i}{v_j} / (\sigma_i \sigma_j) = \delta_{ij}$, so $\set{u_i}_{i=1}^r$ is orthonormal. Extending this set to an orthonormal basis $\set{u_i}_{i=1}^m$ of $\C^m$, and defining $U := \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix} \in \C^{m \times m}$ and $$ \Sigma := \begin{bmatrix} \hat{\Sigma} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)}\end{bmatrix} \in \R^{m \times n}, $$ we obtain $A = U \Sigma V^*$ as required.[^2]
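This construction translates directly into code. Here is a minimal NumPy sketch, assuming for simplicity that $A$ has full column rank (true almost surely for a Gaussian matrix), so that every $\sigma_i > 0$ and no extension of $\hat{U}$ is needed to recover $A$:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# A*A is Hermitian PSD; eigh returns its eigenvalues in ascending order.
lam, V = np.linalg.eigh(A.conj().T @ A)
lam, V = lam[::-1], V[:, ::-1]              # reorder so lambda_1 >= ... >= lambda_n
sigma = np.sqrt(np.clip(lam, 0, None))      # sigma_i = sqrt(lambda_i)

# u_i := A v_i / sigma_i; dividing by sigma broadcasts across columns.
U_hat = (A @ V) / sigma

# {u_i} is orthonormal, and A = U_hat diag(sigma) V*; extending U_hat to a
# full m x m unitary U then gives A = U Sigma V* with Sigma rectangular.
assert np.allclose(U_hat.conj().T @ U_hat, np.eye(n))
assert np.allclose(A, U_hat @ np.diag(sigma) @ V.conj().T)
assert np.allclose(sigma, np.linalg.svd(A, compute_uv=False))
```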
Although an SVD is not unique, this argument shows that the singular values are always unique, since we must have $A^* A = V (\Sigma^* \Sigma) V^*$. If moreover $m = n$ and the singular values are distinct, the singular vectors are unique up to complex signs (replacing $(u_i, v_i)$ by $(z_i u_i, z_i v_i)$ with $\abs{z_i} = 1$).
## Low-rank approximation
### Eckart–Young theorem
Suppose that $U \Sigma V^*$ is an SVD of a matrix $A \in \C^{m \times n}$ with rank $r$. If $k \leq r$ and $A_k := \sum_{i=1}^k \sigma_i u_i v_i^*$, then $\norm{A - B}_2 \geq \sigma_{k+1} = \norm{A - A_k}_2$ for all $B \in \C^{m \times n}$ such that $\rank(B) \leq k$ (where $\sigma_{r+1} := 0$). In particular, $\norm{A}_2 = \sigma_1$.
Proof. Suppose that $B \in \C^{m \times n}$ is such that $\rank(B) \leq k$. Then $\dim(\ker(B)) \geq n-k$, so $\ker(B)$ and $\span \set{v_i}_{i=1}^{k+1}$ have dimensions summing to more than $n$; hence there exists a $v \in \ker(B) \cap \span \set{v_i}_{i=1}^{k+1}$ such that $\norm{v}_2 = 1$. Therefore $\norm{A - B}_2^2 \geq \norm{(A-B)v}_2^2 = \norm{Av}_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 \abs{\inner{v}{v_i}}^2 \geq \sigma_{k+1}^2$. Similarly, if $v \in \C^n$ with $\norm{v}_2 = 1$, then $\norm{(A-A_k) v}_2^2 = \sum_{i=k+1}^r \sigma_i^2 \abs{\inner{v}{v_i}}^2 \leq \sigma_{k+1}^2$, with equality if $v = v_{k+1}$. ∎
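The theorem is easy to spot-check numerically. The following NumPy sketch verifies that the truncated SVD achieves spectral-norm error exactly $\sigma_{k+1}$, and that random rank-$k$ competitors never beat it (a sanity check, of course, not a proof):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k = 8, 5, 2
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A)
A_k = (U[:, :k] * s[:k]) @ Vh[:k]    # truncated SVD: sum of first k terms sigma_i u_i v_i*

# The truncation achieves spectral-norm error sigma_{k+1} (0-indexed: s[k]) ...
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])

# ... and no rank-k matrix does better: test random rank-k competitors.
for _ in range(100):
    B = (rng.standard_normal((m, k)) + 1j * rng.standard_normal((m, k))) @ \
        (rng.standard_normal((k, n)) + 1j * rng.standard_normal((k, n)))
    assert np.linalg.norm(A - B, 2) >= s[k] - 1e-12
```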
An analogous theorem, which can be proved similarly, holds for the Frobenius norm. In fact, we have the following generalization.
### Eckart–Young–Mirsky theorem
Suppose that $U \Sigma V^*$ is an SVD of a matrix $A \in \C^{m \times n}$ with rank $r$ and let $\norm{{}\cdot{}}$ be a unitarily invariant norm. If $k \leq r$ and $A_k := \sum_{i=1}^k \sigma_i u_i v_i^*$, then $\norm{A - B} \geq \norm{A - A_k}$ for all $B \in \C^{m \times n}$ such that $\rank(B) \leq k$.
Proof. We begin by proving Weyl’s inequality for singular values: $$ \sigma_{i+j-1}(A+B) \leq \sigma_i(A) + \sigma_j(B), $$ where $\sigma_i({}\cdot{})$ denotes the $i$th singular value of a given matrix. For any matrix $M$ with an SVD $\sum_i \sigma_i(M) u_i v_i^*$, write $M_k := \sum_{i=1}^k \sigma_i(M) u_i v_i^*$. Then $\rank(A_{i-1} + B_{j-1}) \leq (i-1) + (j-1) = i+j-2$, so by the Eckart–Young theorem, $\sigma_{i+j-1}(A+B) \leq \norm{(A + B) - (A_{i-1} + B_{j-1})}_2 \leq \norm{A - A_{i-1}}_2 + \norm{B - B_{j-1}}_2 = \sigma_i(A) + \sigma_j(B)$.
Now suppose that $B \in \C^{m \times n}$ is such that $\rank(B) \leq k$, so that $\sigma_{k+1}(B) = 0$. By Weyl’s inequality, $\sigma_{k+i}(A) \leq \sigma_{k+1}(B) + \sigma_i(A-B) = \sigma_i(A-B)$ for all $i$ (where $\sigma_{k+i}({}\cdot{}) := 0$ if $k + i > \min \set{m, n}$). Since a unitarily invariant norm depends only on the singular values of its argument, i.e., $\norm{M} = \Phi(\sigma_1(M), \sigma_2(M), \dots)$ where $\Phi(x) := \norm{\operatorname{diag}(x_1, x_2, \dots)}$, it therefore suffices to show that $\Phi(x) \leq \Phi(y)$ whenever $0 \leq x \leq y$ componentwise; by induction and permutation invariance, we may assume without loss of generality that $x_i = y_i$ for all $i > 1$.[^3] Accordingly, let $\theta \in [0, 1]$ be such that $x_1 = \theta y_1$. Then $\Phi(x) = \Phi(\frac{1+\theta}{2} y_1 + \frac{1-\theta}{2}(-y_1), y_2, \dots) \leq \frac{1+\theta}{2} \Phi(y_1, y_2, \dots) + \frac{1-\theta}{2} \Phi(-y_1, y_2, \dots) = \Phi(y)$, as was to be shown. ∎
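Both halves of this argument lend themselves to a numerical sanity check. The sketch below (random real matrices, with the nuclear norm $\norm{{}\cdot{}}_* = \sum_i \sigma_i$ playing the role of the unitarily invariant norm) verifies Weyl’s inequality and then the Eckart–Young–Mirsky bound against random rank-$k$ competitors:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 7, 5, 2
A = rng.standard_normal((m, n))
C = rng.standard_normal((m, n))

sA, sC, sAC = (np.linalg.svd(M, compute_uv=False) for M in (A, C, A + C))
# Weyl: sigma_{i+j-1}(A+C) <= sigma_i(A) + sigma_j(C)  (0-indexed: i+j <= n-1).
for i in range(n):
    for j in range(n - i):
        assert sAC[i + j] <= sA[i] + sC[j] + 1e-12

# Eckart-Young-Mirsky for the nuclear norm: ||A - A_k||_* = sigma_{k+1} + ... + sigma_r,
# and no rank-k matrix achieves a smaller error.
U, s, Vh = np.linalg.svd(A)
A_k = (U[:, :k] * s[:k]) @ Vh[:k]
err_star = s[k:].sum()
assert np.isclose(np.linalg.norm(A - A_k, 'nuc'), err_star)
for _ in range(100):
    B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # rank <= k
    assert np.linalg.norm(A - B, 'nuc') >= err_star - 1e-12
```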
[^1]: If $A \in \R^{m \times n}$, an SVD is defined analogously, i.e., with $U$ and $V$ orthogonal.
[^2]: If we instead add $n-r$ rows of zeros to $\begin{bmatrix} \hat{\Sigma} & 0 \end{bmatrix}$, forming a square $n \times n$ matrix (and extend $\hat{U}$ by $n-r$ further orthonormal columns to match), the resulting decomposition is sometimes called the thin SVD; if we instead omit the last $n-r$ columns of $\begin{bmatrix} \hat{\Sigma} & 0 \end{bmatrix}$ and of $V$, what remains is sometimes called the compact SVD.
[^3]: Clearly, if $A \mapsto \Phi(\sigma_1(A), \sigma_2(A), \dots)$ is a unitarily invariant norm, then $\Phi$ is a symmetric gauge function: a norm on a real coordinate space that is permutation invariant and absolute ($\Phi(\abs{x}) = \Phi(x)$ for all $x$). The converse is a theorem of von Neumann.