
4. Applications.

Least-squares problems arise in a number of ways. The simplest one is this:



Application 1. ``Linear regression''. Find the best-fitting straight line $ y = c _ 0 + c _ 1 x$ for data points $ (1,3), (2,3), (-2,-1), (-1,1), (2,2)$ (Figure [*]).

Figure: A linear fit

Solution. Here we need to find $ c _ 0$ and $ c _ 1$, not $ x$ and $ y$. If a data point $ (2,5)$ (say) were to lie on the line, that would mean $ 5 = c _ 0 + 2 c _ 1$. But there are too many data points to hope that they would all lie on one line. Thus we have to settle for finding $ c _ 0$ and $ c _ 1$ so that

\begin{displaymath}\begin{array}{ccc}
c _ 0 + 1 c _ 1 &\approx &3 \\
c _ 0 + 2 c _ 1 &\approx &3 \\
c _ 0 - 2 c _ 1 &\approx &-1 \\
c _ 0 - 1 c _ 1 &\approx &1 \\
c _ 0 + 2 c _ 1 &\approx &2
\end{array}\end{displaymath}

or in other words, $ \left[\begin{array}{rr} 1&1\\ 1&2\\ 1&-2\\ 1&-1\\ 1&2 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1 \end{array}\right] \approx \left[\begin{array}{c} 3\\ 3\\ -1\\ 1\\ 2 \end{array}\right]$, so the normal equations are

$ \left[\begin{array}{rrrrr}1&1&1&1&1\\ 1&2&-2&-1&2\end{array}\right] \left[\begin{array}{rr} 1&1\\ 1&2\\ 1&-2\\ 1&-1\\ 1&2 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1 \end{array}\right] = \left[\begin{array}{rrrrr}1&1&1&1&1\\ 1&2&-2&-1&2\end{array}\right] \left[\begin{array}{c} 3\\ 3\\ -1\\ 1\\ 2 \end{array}\right]$, or

$ \left[\begin{array}{rr} 5&2\\ 2&14 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1 \end{array}\right] = \left[\begin{array}{c} 8\\ 14 \end{array}\right]$. A computer solution gives $ c _ 0 = 1.27273$, $ c _ 1 = 0.81818$, to five decimal places. Thus the best-fitting straight line in the least-squares sense is $ y = 1.27273 + 0.81818 x$.
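As a quick numerical check of this example, here is a sketch using NumPy's least-squares solver (the text itself specifies no software; the choice of NumPy is an assumption):

```python
import numpy as np

# Data points from Application 1.
r = np.array([1.0, 2.0, -2.0, -1.0, 2.0])
s = np.array([3.0, 3.0, -1.0, 1.0, 2.0])

# Design matrix with columns 1 and r, matching the 5x2 matrix above.
A = np.column_stack([np.ones_like(r), r])

# Solve A c ≈ s in the least-squares sense.
c, *_ = np.linalg.lstsq(A, s, rcond=None)
# c[0] = 14/11 ≈ 1.27273 and c[1] = 9/11 ≈ 0.81818, agreeing with the text.
```

The solver handles the normal equations internally (in fact via a more stable factorization), so the answer matches the hand computation above.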



In general, for data points $ (r _ 1, s _ 1),\dots, (r _ m, s _ m)$ you get the normal equations



$ \left[\begin{array}{cc} m& \sum _ i r _ i\\ \sum _ i r _ i&\sum _ i r _ i ^ 2 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1 \end{array}\right] = \left[\begin{array}{c} \sum _ i s _ i\\ \sum _ i r _ i s _ i\end{array}\right] $.



It is all right to have some of the $ r _ i$ be the same. These normal equations are guaranteed to be nonsingular as long as not all the $ r _ i$ are the same. (If they were all the same, the data points would lie on a vertical line and there would be no one best-fitting linear function.)
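The general normal equations can be assembled directly from the five sums $m$, $\sum r _ i$, $\sum r _ i ^ 2$, $\sum s _ i$, $\sum r _ i s _ i$. A minimal sketch in NumPy (an assumption; the function name is ours):

```python
import numpy as np

def line_fit_normal_equations(r, s):
    """Fit y = c0 + c1*x by forming the 2x2 normal equations
    from the sums m, sum(r), sum(r^2), sum(s), sum(r*s)."""
    r, s = np.asarray(r, float), np.asarray(s, float)
    m = len(r)
    G = np.array([[m,       r.sum()],
                  [r.sum(), (r**2).sum()]])
    h = np.array([s.sum(), (r * s).sum()])
    # G is singular exactly when all r_i are equal, as noted in the text.
    return np.linalg.solve(G, h)

# Reusing the Application 1 data recovers c0 = 14/11, c1 = 9/11.
c0, c1 = line_fit_normal_equations([1, 2, -2, -1, 2], [3, 3, -1, 1, 2])
```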



Application 2. Finding a best-fitting quadratic. Given data points $ (r _ 1, s _ 1)$, $ \dots $, $ (r _ m, s _ m)$, find the parabola $ y = c _ 0 + c _ 1 x + c _ 2 x ^ 2$ that fits them best (Figure [*]). In other words, solve

\begin{displaymath}\begin{array}{ccc}
c _ 0 + c _ 1 r _ 1 + c _ 2 r _ 1 ^ 2 & \approx & s _ 1 \\
\vdots & & \vdots \\
c _ 0 + c _ 1 r _ m + c _ 2 r _ m ^ 2 & \approx & s _ m
\end{array}\end{displaymath}

Figure: A quadratic fit

At first this might not look like a linear problem. Notice, though, that the $ r _ i$ and $ s _ i$ are data numbers, and their powers are also just certain numbers. The unknowns are $ c _ 0, c _ 1, c _ 2$, which appear linearly. In fact, a polynomial is a linear combination of powers. In matrix form, this problem is

$ \left[\begin{array}{ccc} 1& r _ 1& r _ 1 ^ 2\\ 1& r _ 2& r _ 2 ^ 2\\ \vdots&\vdots&\vdots\\ 1& r _ m& r _ m ^ 2 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1\\ c _ 2 \end{array}\right] \approx \left[\begin{array}{c} s _ 1\\ s _ 2\\ \vdots \\ s _ m\end{array}\right]$, or for short, $ A \mathbf{c} \approx \mathbf{s}$.

Solving the normal equations gives us the best $ c _ 0, c _ 1, c _ 2$.

Note. Although one can talk loosely of a ``best-fitting parabola'', if $ c _ 2$ happened to come out to be zero the curve would actually be a straight line.
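A sketch of the quadratic fit in NumPy, solving the normal equations explicitly (the sample data here is hypothetical, not from the text):

```python
import numpy as np

# Hypothetical data lying near y = 1 - x + 2x^2, with small perturbations.
r = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
s = 1 - r + 2 * r**2 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])

# Columns 1, r, r^2 -- the matrix A of the text.
A = np.column_stack([np.ones_like(r), r, r**2])

# Normal equations: (A^T A) c = A^T s.
c = np.linalg.solve(A.T @ A, A.T @ s)
# c recovers coefficients close to (1, -1, 2).
```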



Application 3. Finding a best-fitting linear combination of polynomials of given degree $ d$.

This is an obvious generalization of the quadratic case. For polynomials of degree $ d$, the only requirement to guarantee nonsingularity of the normal equations is that $ r _ 1,\dots, r _ m$ should include at least $ d+1$ different numbers.
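The $d+1$-distinct-values condition can be illustrated numerically: with fewer than $d+1$ distinct $r _ i$, the Gram matrix $A ^ t A$ loses rank. A sketch in NumPy (our own helper, not from the text):

```python
import numpy as np

d = 3  # cubic: d + 1 = 4 coefficients

def gram_rank(r_values):
    """Rank of A^T A, where A is the degree-d Vandermonde matrix on r_values."""
    A = np.vander(np.asarray(r_values, float), d + 1)
    return np.linalg.matrix_rank(A.T @ A)

# Four distinct values (repeats are allowed): normal equations nonsingular.
rank_ok = gram_rank([0, 1, 2, 3, 3, 1])   # rank 4 = d + 1
# Only three distinct values: rank 3 < 4, so A^T A is singular.
rank_bad = gram_rank([0, 1, 2, 2, 1, 0])  # rank 3
```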



Application 4. Finding a best-fitting linear combination of functions. Given data points and a specified finite family of functions, find the linear combination of those functions that fits the data best.

In other words, suppose we are given functions $ f _ 1,\dots, f _ n$ and data points $ (r _ 1, s _ 1),\dots, (r _ m, s _ m)$. We want to find coefficients $ c _ 1,\dots, c _ n$ so that

\begin{displaymath}\begin{array}{ccc}
c _ 1 f _ 1 (r _ 1) + \dots + c _ n f _ n (r _ 1)&\approx&s _ 1\\
\vdots& &\vdots\\
c _ 1 f _ 1 (r _ m) + \dots + c _ n f _ n (r _ m)&\approx&s _ m
\end{array}\end{displaymath}

This is just a least-squares problem $ A \mathbf{c} \approx \mathbf{b}$, where $ A = [f _ j (r _ i)]$ (row $ i$, column $ j$) and $ \mathbf{b} = [s _ i]$. We'll discuss the particular case where the $ f _ i$ are Chebyshev polynomials.
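For the Chebyshev case, NumPy can build the matrix $[f _ j (r _ i)]$ directly. A sketch with hypothetical data (noisy samples of $\cos 2t$ on $[-1,1]$; all names here are ours):

```python
import numpy as np

# Hypothetical data: noisy samples of cos(2t) on [-1, 1].
rng = np.random.default_rng(0)
t = np.linspace(-1, 1, 40)
y = np.cos(2 * t) + 0.01 * rng.standard_normal(t.size)

# A[i, j] = T_j(t_i): the j-th Chebyshev polynomial evaluated at t_i.
n = 5  # use the basis T_0, ..., T_4
A = np.polynomial.chebyshev.chebvander(t, n - 1)

# Least-squares coefficients via the normal equations.
c = np.linalg.solve(A.T @ A, A.T @ y)
# A @ c now approximates y closely at every sample point.
```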



Application 5. Finding a best-fitting polynomial parametric curve near given data points.

Let's consider a cubic curve in two dimensions. We have times $ t _ 1,\dots, t _ n$ and data points $ P _ 1,\dots, P _ n$, and we want a cubic curve $ P(t)$ so that $ P(t _ i)$ is close to $ P _ i$ for each $ i$. As a measure of closeness we can use the Euclidean distance. In other words, we want to minimize the sum of squares of the errors dist$ (P(t _ i), P _ i)$. This situation at first sounds different from the errors previously discussed, which were vertical distances on graphs, but see what happens: Write $ P(t) = \left[\begin{array}{c}x(t)\\ y(t)\end{array}\right]$, $ P _ i = \left[\begin{array}{c}x _ i\\ y _ i\end{array}\right]$. Then dist$ (P(t _ i), P _ i) = \sqrt {(x(t _ i) - x _ i) ^ 2 + (y(t _ i) - y _ i) ^ 2}$, and squaring each distance removes its square root, so the sum-of-squares error is simply $ E = \sum _ i (x(t _ i) - x _ i) ^ 2 + (y(t _ i) - y _ i) ^ 2$. So if we write $ x(t) = c _ 0 + c _ 1 t + c _ 2 t ^ 2 + c _ 3 t ^ 3$ and $ y(t) = d _ 0 + d _ 1 t + d _ 2 t ^ 2 + d _ 3 t ^ 3$, with coefficients to be determined, the sum-of-squares error is exactly the same as for the least-squares problem

\begin{displaymath}\begin{array}{ccc}
c _ 0 + c _ 1 t _ 1 + c _ 2 t _ 1 ^ 2 + c _ 3 t _ 1 ^ 3 & \approx & x _ 1\\
\vdots & & \vdots\\
c _ 0 + c _ 1 t _ n + c _ 2 t _ n ^ 2 + c _ 3 t _ n ^ 3 & \approx & x _ n\\
d _ 0 + d _ 1 t _ 1 + d _ 2 t _ 1 ^ 2 + d _ 3 t _ 1 ^ 3 & \approx & y _ 1\\
\vdots & & \vdots\\
d _ 0 + d _ 1 t _ n + d _ 2 t _ n ^ 2 + d _ 3 t _ n ^ 3 & \approx & y _ n
\end{array}\end{displaymath}

Although squared errors from both $ x$- and $ y$-coordinates are added together, the errors involve separate sets of variables, so this problem is really equivalent to two separate least-squares problems $ A \mathbf{c} \approx \mathbf{x}$ and $ A \mathbf{d} \approx \mathbf{y}$, where

$ A = \left[\begin{array}{cccc} 1& t _ 1& t _ 1 ^ 2& t _ 1 ^ 3\\ 1& t _ 2& t _ 2 ^ 2& t _ 2 ^ 3\\ \vdots&\vdots&\vdots&\vdots\\ 1& t _ n& t _ n ^ 2& t _ n ^ 3 \end{array}\right]$,

$ \mathbf{c} = \left[\begin{array}{c}c _ 0\\ c _ 1\\ c _ 2\\ c _ 3\end{array}\right]$, $ \mathbf{d} = \left[\begin{array}{c}d _ 0\\ d _ 1\\ d _ 2\\ d _ 3\end{array}\right]$, $ \mathbf{x} = \left[\begin{array}{c}x _ 1\\ x _ 2\\ \vdots \\ x _ n\end{array}\right]$, $ \mathbf{y} = \left[\begin{array}{c}y _ 1\\ y _ 2\\ \vdots \\ y _ n\end{array}\right]$.

Since both problems share the same $ A$, we can solve them together with normal equations

$ A ^ t A C = A ^ t P$, where $ C = \left[\begin{array}{cc}c _ 0& d _ 0\\ c _ 1& d _ 1\\ c _ 2& d _ 2\\ c _ 3& d _ 3\end{array}\right]$ and

$ P = \left[\begin{array}{cc}x _ 1& y _ 1\\ x _ 2& y _ 2\\ \vdots&\vdots\\ x _ n& y _ n\end{array}\right]$ = the matrix of data points as row vectors.
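The shared-$A$ trick maps directly to code: one solve with a two-column right-hand side recovers both coordinate polynomials at once. A sketch in NumPy with hypothetical times and points (chosen to lie exactly on a cubic so the recovered coefficients are exact):

```python
import numpy as np

# Hypothetical times and 2-D data points (rows of P), not from the text.
t = np.linspace(0.0, 1.0, 8)
P = np.column_stack([t**3 - t, t**2 + 0.5 * t])  # exactly cubic in t

# Shared design matrix with columns 1, t, t^2, t^3.
A = np.column_stack([np.ones_like(t), t, t**2, t**3])

# One solve handles both coordinates: C's columns are c and d.
C = np.linalg.solve(A.T @ A, A.T @ P)
# First column: x(t) = t^3 - t gives c = (0, -1, 0, 1).
# Second column: y(t) = t^2 + t/2 gives d = (0, 0.5, 1, 0).
```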



Other applications. Large overdetermined arrays arise in many engineering design applications. The method of normal equations works well even if $ A$ is $ 1000 \times 50$. (On modern workstations it takes less than a second to solve a $ 50 \times 50$ set of linear equations.)




Kirby A. Baker 2003-05-13