The singular value decomposition

$$ \newcommand{\set}[1]{\{ #1 \}} \newcommand{\Set}[1]{\left \{ #1 \right\}} \renewcommand{\emptyset}{\varnothing} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\R}{\mathbb{R}} \newcommand{\Rn}{\mathbb{R}^n} \newcommand{\Rm}{\mathbb{R}^m} \newcommand{\C}{\mathbb{C}} \newcommand{\F}{\mathbb{F}} \newcommand{\abs}[1]{\lvert #1 \rvert} \newcommand{\Abs}[1]{\left\lvert #1 \right\rvert} \newcommand{\inner}[2]{\langle #1, #2 \rangle} \newcommand{\Inner}[2]{\left\langle #1, #2 \right\rangle} \newcommand{\norm}[1]{\lVert #1 \rVert} \newcommand{\Norm}[1]{\left\lVert #1 \right\rVert} \newcommand{\trans}{{\top}} \newcommand{\span}{\mathop{\mathrm{span}}} \newcommand{\im}{\mathop{\mathrm{im}}} \newcommand{\ker}{\mathop{\mathrm{ker}}} \newcommand{\rank}{\mathop{\mathrm{rank}}} \definecolor{cblue}{RGB}{31, 119, 180} \definecolor{corange}{RGB}{255, 127, 14} \definecolor{cgreen}{RGB}{44, 160, 44} \definecolor{cred}{RGB}{214, 39, 40} \definecolor{cpurple}{RGB}{148, 103, 189} \definecolor{cbrown}{RGB}{140, 86, 75} \definecolor{cpink}{RGB}{227, 119, 194} \definecolor{cgrey}{RGB}{127, 127, 127} \definecolor{cyellow}{RGB}{188, 189, 34} \definecolor{cteal}{RGB}{23, 190, 207} $$

Let $A \in \C^{m \times n}$. The singular value decomposition (SVD) is a factorization of $A$ as $U \Sigma V^*$, where $U \in \C^{m \times m}$ and $V \in \C^{n \times n}$ are unitary and $\Sigma \in \R^{m \times n}$ is (rectangular) diagonal with nonnegative entries.1 In other words, $A = \sum_{i=1}^{\min \set{m, n}} \sigma_i u_i v_i^*$, where $u_i$ and $v_i$ are the $i$th columns of $U$ and $V$ and $\sigma_i$ is the $i$th diagonal entry of $\Sigma$. The vectors $u_i$ and $v_i$ are called left and right singular vectors of $A$ and the scalars $\sigma_i$ are called singular values of $A$; by convention, we arrange the singular values in decreasing order.
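
As a concrete numerical illustration, the following NumPy sketch computes an SVD and checks both the factored form $A = U \Sigma V^*$ and the rank-one expansion; the random test matrix and the use of `numpy.linalg.svd` are just illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Full SVD: U is m x m, Vh = V* is n x n, and s holds the min(m, n) singular
# values in decreasing order.
U, s, Vh = np.linalg.svd(A, full_matrices=True)

# Embed the singular values in a rectangular diagonal Sigma and check A = U Sigma V*.
Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ Vh))  # True

# Equivalently, A is the sum of the rank-one terms sigma_i u_i v_i^*.
A_sum = sum(s[i] * np.outer(U[:, i], Vh[i, :]) for i in range(min(m, n)))
print(np.allclose(A, A_sum))  # True
```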

If an SVD of $A$ has $r$ nonzero singular values, then $\set{u_i}_{i=1}^r$ is an orthonormal basis of $\im(A)$ because $Av_i = \sigma_i u_i$ for all $i$. Hence $r$ must be the rank of $A$ and $\set{u_i}_{i=r+1}^m$ an orthonormal basis of $\ker(A^*)$; similarly, $\set{v_i}_{i=1}^r$ and $\set{v_i}_{i=r+1}^n$ are orthonormal bases of $\im(A^*)$ and $\ker(A)$.
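
The following sketch checks these facts numerically on a random rank-$r$ matrix; the construction of the test matrix and the rank tolerance `1e-10` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 6, 4, 2
# A rank-r test matrix built as a product of random factors.
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

U, s, Vh = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))     # numerical rank; should equal r
print(rank == r)                  # True

# A v_i = sigma_i u_i, so the first r left singular vectors span im(A) ...
print(np.allclose(A @ Vh[:rank].conj().T, U[:, :rank] * s[:rank]))  # True
# ... and the remaining right singular vectors lie in ker(A).
print(np.allclose(A @ Vh[rank:].conj().T, 0))                       # True
```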

Existence

Assume without loss of generality that $m \geq n$. Clearly, the matrix $A^* A$ is (Hermitian) positive semidefinite, so by the spectral theorem, $A^* A = V \Lambda V^*$ for some unitary $V \in \C^{n \times n}$ and some diagonal $\Lambda \in \R^{n \times n}$ with diagonal entries $\lambda_1 \geq \cdots \geq \lambda_n \geq 0$. Set $\sigma_i := \sqrt{\lambda_i}$ for each $i$, and define $u_i := A v_i / \sigma_i$ for each nonzero $\sigma_i$, so that $A v_i = \sigma_i u_i$. If $r$ is as above, $\hat{U} := \begin{bmatrix} u_1 & \cdots & u_r \end{bmatrix} \in \C^{m \times r}$, and $\hat{\Sigma} := \mathrm{diag}(\sigma_1, \dots, \sigma_r) \in \R^{r \times r}$, then by construction $$ AV = \hat{U} \begin{bmatrix} \hat{\Sigma} & 0_{r \times (n-r)} \end{bmatrix}. $$ Moreover, $\inner{u_i}{u_j} = \inner{Av_i / \sigma_i}{Av_j / \sigma_j} = \inner{\lambda_i v_i}{v_j} / (\sigma_i \sigma_j) = \delta_{ij}$, so $\set{u_i}_{i=1}^r$ is orthonormal. Extending this set to an orthonormal basis $\set{u_i}_{i=1}^m$ of $\C^m$, and defining $U = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix} \in \C^{m \times m}$ and $$ \Sigma = \begin{bmatrix} \hat{\Sigma} & 0_{r \times (n-r)} \\ 0_{(m-r) \times r} & 0_{(m-r) \times (n-r)}\end{bmatrix} \in \R^{m \times n}, $$ we obtain $A = U \Sigma V^*$ as required.2
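
The construction above can also be carried out numerically. The sketch below (assuming a real test matrix and $m \geq n$; it is not meant as a robust substitute for a library SVD) builds an SVD of $A$ from the spectral decomposition of $A^* A$, exactly as in the proof.

```python
import numpy as np

def svd_from_eigh(A, tol=1e-12):
    """Sketch of the existence argument: an SVD of A from the spectral
    decomposition of A* A (assumes m >= n; real A for simplicity)."""
    m, n = A.shape
    # Spectral theorem: A* A = V diag(lambda) V*, eigenvalues sorted descending.
    lam, V = np.linalg.eigh(A.conj().T @ A)
    order = np.argsort(lam)[::-1]
    lam, V = np.clip(lam[order], 0, None), V[:, order]
    sigma = np.sqrt(lam)
    r = int(np.sum(sigma > tol))

    # u_i := A v_i / sigma_i for the nonzero singular values.
    U_hat = (A @ V[:, :r]) / sigma[:r]
    # Extend {u_1, ..., u_r} to an orthonormal basis: project random vectors
    # onto the orthogonal complement of span{u_i} and orthonormalize them.
    X = np.random.default_rng(0).standard_normal((m, m - r))
    X -= U_hat @ (U_hat.conj().T @ X)
    Q, _ = np.linalg.qr(X)
    U = np.hstack([U_hat, Q])

    # Rectangular diagonal Sigma with the sigma_i on its diagonal.
    Sigma = np.zeros((m, n))
    np.fill_diagonal(Sigma, sigma)
    return U, Sigma, V

A = np.random.default_rng(3).standard_normal((5, 3))
U, Sigma, V = svd_from_eigh(A)
print(np.allclose(A, U @ Sigma @ V.conj().T))   # True
print(np.allclose(U.T @ U, np.eye(5)))          # True: U is orthogonal
```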

Although an SVD is not unique, this argument shows that the singular values are uniquely determined: any SVD satisfies $A^* A = V (\Sigma^* \Sigma) V^*$, so the $\sigma_i^2$ must be the eigenvalues of $A^* A$. It also shows that the singular vectors are unique up to complex signs (multiplication by scalars of modulus $1$) if $m = n$ and the singular values are distinct.
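
Numerically, this means the singular values returned by any SVD routine must agree with the square roots of the eigenvalues of $A^* A$; a quick check on an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

# The squared singular values are the eigenvalues of A* A, so the singular
# values themselves are determined by A.
s = np.linalg.svd(A, compute_uv=False)               # decreasing order
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]     # eigenvalues, decreasing
print(np.allclose(s, np.sqrt(lam)))                  # True
```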

Low-rank approximation

Eckart–Young theorem

Suppose that $U \Sigma V^*$ is an SVD of a matrix $A \in \C^{m \times n}$ with rank $r$. If $k \leq r$ and $A_k := \sum_{i=1}^k \sigma_i u_i v_i^*$, then $\norm{A - B}_2 \geq \sigma_{k+1} = \norm{A - A_k}_2$ for all $B \in \C^{m \times n}$ such that $\rank(B) \leq k$ (where $\sigma_{r+1} := 0$). In particular, $\norm{A}_2 = \sigma_1$.
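
Before turning to the proof, here is a small numerical illustration: the truncated SVD $A_k$ achieves spectral-norm error exactly $\sigma_{k+1}$, and random rank-$k$ candidates do no better. The test matrix, seed, and number of candidates are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))

U, s, Vh = np.linalg.svd(A)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]   # truncated SVD

# The spectral-norm error of A_k is exactly sigma_{k+1} ...
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))   # True
# ... and no random rank-k candidate does better.
for _ in range(100):
    B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))
    assert np.linalg.norm(A - B, 2) >= s[k] - 1e-12
print("no random rank-k candidate beats A_k")
```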

Proof. Suppose that $B \in \C^{m \times n}$ is such that $\rank(B) \leq k$. Then $\dim(\ker(B)) \geq n-k$, and since $\span \set{v_i}_{i=1}^{k+1}$ has dimension $k+1$ and $(n-k) + (k+1) > n$, the two subspaces intersect nontrivially; choose a $v \in \ker(B) \cap \span \set{v_i}_{i=1}^{k+1}$ with $\norm{v}_2 = 1$. Then $\norm{A - B}_2^2 \geq \norm{(A-B)v}_2^2 = \norm{Av}_2^2 = \sum_{i=1}^{k+1} \sigma_i^2 \abs{\inner{v}{v_i}}^2 \geq \sigma_{k+1}^2$. Similarly, if $v \in \C^n$ with $\norm{v}_2 = 1$, then $\norm{(A-A_k) v}_2^2 = \sum_{i=k+1}^r \sigma_i^2 \abs{\inner{v}{v_i}}^2 \leq \sigma_{k+1}^2$, with equality if $v = v_{k+1}$. ∎
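
The identity used twice in this proof, $\norm{Av}_2^2 = \sum_i \sigma_i^2 \abs{\inner{v}{v_i}}^2$ for a unit vector $v$, is easy to verify numerically; the random test matrix and vector below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((7, 5))
U, s, Vh = np.linalg.svd(A)

# ||A v||^2 = sum_i sigma_i^2 |<v, v_i>|^2, since the v_i are an orthonormal
# basis and A v_i = sigma_i u_i with the u_i orthonormal.
v = rng.standard_normal(5)
v /= np.linalg.norm(v)
lhs = np.linalg.norm(A @ v) ** 2
rhs = np.sum(s**2 * np.abs(Vh @ v) ** 2)   # (Vh @ v)_i = <v, v_i>
print(np.isclose(lhs, rhs))                # True
```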

An analogous theorem holds for the Frobenius norm and can be proven similarly; in that case the minimal error is $\norm{A - A_k}_F = \bigl( \sum_{i=k+1}^r \sigma_i^2 \bigr)^{1/2}$. In fact, we have the following generalization.
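
A corresponding numerical check for the Frobenius norm (again with an arbitrary random test matrix):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 6))
k = 3

U, s, Vh = np.linalg.svd(A)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]

# The Frobenius-norm error of the truncated SVD is the l2 norm of the
# discarded singular values.
print(np.isclose(np.linalg.norm(A - A_k, "fro"),
                 np.sqrt(np.sum(s[k:] ** 2))))   # True
```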

Eckart–Young–Mirsky theorem

Suppose that $U \Sigma V^*$ is an SVD of a matrix $A \in \C^{m \times n}$ with rank $r$ and let $\norm{{}\cdot{}}$ be a unitarily invariant norm. If $k \leq r$ and $A_k := \sum_{i=1}^k \sigma_i u_i v_i^*$, then $\norm{A - B} \geq \norm{A - A_k}$ for all $B \in \C^{m \times n}$ such that $\rank(B) \leq k$.
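
Before the proof, a quick numerical illustration with one particular unitarily invariant norm, the nuclear norm (the sum of the singular values); the random candidates and tolerances are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, k = 8, 6, 2
A = rng.standard_normal((m, n))
U, s, Vh = np.linalg.svd(A)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vh[:k]

# The nuclear norm is unitarily invariant, so the theorem says the truncated
# SVD is optimal in this norm as well; its error is the sum of the discarded
# singular values.
nuc = lambda M: np.linalg.norm(M, "nuc")
print(np.isclose(nuc(A - A_k), np.sum(s[k:])))                     # True
for _ in range(100):
    B = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank-k candidate
    assert nuc(A - B) >= nuc(A - A_k) - 1e-10
print("no random rank-k candidate beats A_k in nuclear norm")
```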

Proof. We begin by proving Weyl’s inequality for singular values: $$ \sigma_{i+j-1}(A+B) \leq \sigma_i(A) + \sigma_j(B), $$ where $\sigma_i({}\cdot{})$ denotes the $i$th singular value of a given matrix. For any matrix $X$, let $X_k$ denote the truncation of an SVD of $X$ to its $k$ largest singular values, so that $A_k = \sum_{i=1}^k \sigma_i(A) u_i v_i^*$ as above. Then $\rank(A_{i-1} + B_{j-1}) \leq (i-1) + (j-1) = i+j-2$, so by the Eckart–Young theorem, $$ \begin{align*} \sigma_{i+j-1}(A+B) &\leq \norm{(A + B) - (A_{i-1} + B_{j-1})}_2 \\ &\leq \norm{A - A_{i-1}}_2 + \norm{B - B_{j-1}}_2 \\ &= \sigma_i(A) + \sigma_j(B). \end{align*} $$ Now suppose that $B \in \C^{m \times n}$ is such that $\rank(B) \leq k$. Applying Weyl’s inequality to $(A - B) + B$ and noting that $\sigma_{k+1}(B) = 0$, we get $\sigma_{k+i}(A) \leq \sigma_i(A-B) + \sigma_{k+1}(B) = \sigma_i(A-B)$ for all $i$ (where $\sigma_{k+i}({}\cdot{}) := 0$ if $k + i \gt \min \set{m, n}$), so there exist $\theta_i \in [0, 1]$ such that $\sigma_{k+i}(A) = \theta_i \sigma_i(A-B)$. For $0 \leq j \leq \min \set{m, n}$, let $D_j^\pm \in \R^{m \times n}$ be the (rectangular) diagonal matrix with diagonal entries $$ (D_j^\pm)_{ii} := \begin{cases} \sigma_i(A-B) & \text{if $i \lt j$}, \\ \pm \sigma_i(A-B) & \text{if $i = j$}, \\ \sigma_{k+i}(A) & \text{if $i \gt j$}. \end{cases} $$ Then $\norm{A - A_k} = \norm{D_0^+}$ and $\norm{A - B} = \norm{D_{\min \set{m, n}}^+}$ since $\norm{{}\cdot{}}$ is unitarily invariant. Moreover, $\norm{D_j^+} = \norm{D_j^-}$ (again by unitary invariance), and this common value is nondecreasing in $j$: indeed, $D_{j-1}^+ = \frac{1 + \theta_j}{2} D_j^+ + \frac{1 - \theta_j}{2} D_j^-$, so $\norm{D_{j-1}^+} \leq \frac{1 + \theta_j}{2} \norm{D_j^+} + \frac{1 - \theta_j}{2} \norm{D_j^-} = \norm{D_j^+}$. Therefore $\norm{A - A_k} = \norm{D_0^+} \leq \norm{D_{\min \set{m, n}}^+} = \norm{A - B}$. ∎
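
Weyl’s inequality itself is easy to check numerically; the random test matrices are arbitrary, and the indices in the code are zero-based.

```python
import numpy as np

rng = np.random.default_rng(9)
m, n = 7, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((m, n))

sA = np.linalg.svd(A, compute_uv=False)
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A + B, compute_uv=False)

# Weyl's inequality: sigma_{i+j-1}(A+B) <= sigma_i(A) + sigma_j(B).
# With 0-based indices p = i-1, q = j-1 this reads sAB[p+q] <= sA[p] + sB[q].
ok = all(
    sAB[p + q] <= sA[p] + sB[q] + 1e-12
    for p in range(min(m, n))
    for q in range(min(m, n))
    if p + q < min(m, n)
)
print(ok)   # True
```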


  1. If $A \in \R^{m \times n}$, an SVD is defined analogously; i.e., with $U$ and $V$ orthogonal.

  2. If we instead add only $n-r$ rows of zeroes to $\begin{bmatrix} \hat{\Sigma} & 0 \end{bmatrix}$, forming a square $n \times n$ matrix (and extend $\hat{U}$ by only $n-r$ columns), the resulting decomposition is sometimes called the thin SVD; if we instead omit the last $n-r$ columns of $\begin{bmatrix} \hat{\Sigma} & 0 \end{bmatrix}$ and of $V$, what remains is sometimes called the compact SVD.
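
In NumPy, the thin SVD corresponds to `np.linalg.svd(A, full_matrices=False)`, and a compact SVD can be obtained by additionally discarding the (numerically) zero singular values; a small sketch with an arbitrary rank-$r$ test matrix:

```python
import numpy as np

rng = np.random.default_rng(10)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-r test matrix

# Thin SVD: U has only n columns and Sigma is square n x n.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vh.shape)              # (6, 4) (4,) (4, 4)
print(np.allclose(A, U @ np.diag(s) @ Vh))     # True

# Compact SVD: additionally drop the zero singular values, keeping r columns.
rank = int(np.sum(s > 1e-10))
Uc, sc, Vhc = U[:, :rank], s[:rank], Vh[:rank]
print(np.allclose(A, Uc @ np.diag(sc) @ Vhc))  # True
```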
