
4. Applications.

Least-squares problems arise in a number of ways. The simplest one is this:



Application 1. ``Linear regression''. Find the best-fitting straight line $ y = c _ 0 + c _ 1 x$ for data points $ (1,3), (2,3), (-2,-1), (-1,1), (2,2)$ (Figure [*]).

Figure: A linear fit

Solution. Here we need to find $ c _ 0$ and $ c _ 1$, not $ x$ and $ y$. If a data point $ (2,5)$ (say) were to lie on the line, that would mean $ 5 = c _ 0 + 2 c _ 1$. But there are too many data points to hope that they would all lie on one line. Thus we have to settle for finding $ c _ 0$ and $ c _ 1$ so that

\begin{displaymath}\begin{array}{ccc}
c _ 0 + 1 c _ 1 &\approx &3 \\
c _ 0 + 2 c _ 1 &\approx &3 \\
c _ 0 - 2 c _ 1 &\approx &-1 \\
c _ 0 - 1 c _ 1 &\approx &1 \\
c _ 0 + 2 c _ 1 &\approx &2
\end{array}\end{displaymath}

or in other words, $ \left[\begin{array}{rr} 1&1\\ 1&2\\ 1&-2\\ 1&-1\\ 1&2 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1 \end{array}\right] \approx \left[\begin{array}{c} 3\\ 3\\ -1\\ 1\\ 2 \end{array}\right]$, so the normal equations are

$ \left[\begin{array}{rrrrr}1&1&1&1&1\\ 1&2&-2&-1&2\end{array}\right] \left[\begin{array}{rr} 1&1\\ 1&2\\ 1&-2\\ 1&-1\\ 1&2 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1 \end{array}\right] = \left[\begin{array}{rrrrr}1&1&1&1&1\\ 1&2&-2&-1&2\end{array}\right] \left[\begin{array}{c} 3\\ 3\\ -1\\ 1\\ 2 \end{array}\right]$, or

$ \left[\begin{array}{rr} 5&2\\ 2&14 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1 \end{array}\right] = \left[\begin{array}{c} 8\\ 14 \end{array}\right]$. A computer solution gives $ c _ 0 = 1.27273$, $ c _ 1 = 0.81818$, to five decimal places. Thus the best-fitting straight line in the least-squares sense is $ y = 1.27273 + 0.81818 x$.
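As a quick numerical check of this example, here is a sketch using NumPy's least-squares solver (the text itself specifies no software; the choice of NumPy is an assumption):

```python
import numpy as np

# Data points from Application 1.
r = np.array([1.0, 2.0, -2.0, -1.0, 2.0])
s = np.array([3.0, 3.0, -1.0, 1.0, 2.0])

# Design matrix with columns 1 and r, matching the 5x2 matrix above.
A = np.column_stack([np.ones_like(r), r])

# Solve A c ≈ s in the least-squares sense.
c, *_ = np.linalg.lstsq(A, s, rcond=None)
# c[0] = 14/11 ≈ 1.27273 and c[1] = 9/11 ≈ 0.81818, agreeing with the text.
```

The solver handles the normal equations internally (in fact via a more stable factorization), so the answer matches the hand computation above.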



In general, for data points $ (r _ 1, s _ 1),\dots, (r _ m, s _ m)$ you get the normal equations



$ \left[\begin{array}{cc} m& \sum _ i r _ i\\ \sum _ i r _ i&\sum _ i r _ i ^ 2 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1 \end{array}\right] = \left[\begin{array}{c} \sum _ i s _ i\\ \sum _ i r _ i s _ i\end{array}\right] $.



It is all right to have some of the $ r _ i$ be the same. These normal equations are guaranteed to be nonsingular as long as not all the $ r _ i$ are the same. (If they were all the same, the data points would lie on a vertical line and there would be no one best-fitting linear function.)
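The general normal equations can be assembled directly from the five sums $m$, $\sum r _ i$, $\sum r _ i ^ 2$, $\sum s _ i$, $\sum r _ i s _ i$. A minimal sketch in NumPy (an assumption; the function name is ours):

```python
import numpy as np

def line_fit_normal_equations(r, s):
    """Fit y = c0 + c1*x by forming the 2x2 normal equations
    from the sums m, sum(r), sum(r^2), sum(s), sum(r*s)."""
    r, s = np.asarray(r, float), np.asarray(s, float)
    m = len(r)
    G = np.array([[m,       r.sum()],
                  [r.sum(), (r**2).sum()]])
    h = np.array([s.sum(), (r * s).sum()])
    # G is singular exactly when all r_i are equal, as noted in the text.
    return np.linalg.solve(G, h)

# Reusing the Application 1 data recovers c0 = 14/11, c1 = 9/11.
c0, c1 = line_fit_normal_equations([1, 2, -2, -1, 2], [3, 3, -1, 1, 2])
```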



Application 2. Finding a best-fitting quadratic. Given data points $ (r _ 1, s _ 1)$, $ \dots $, $ (r _ m, s _ m)$, find the parabola $ y = c _ 0 + c _ 1 x + c _ 2 x ^ 2$ that fits them best (Figure [*]). In other words, solve

\begin{displaymath}\begin{array}{ccc}
c _ 0 + c _ 1 r _ 1 + c _ 2 r _ 1 ^ 2 & \approx & s _ 1 \\
\vdots & & \vdots \\
c _ 0 + c _ 1 r _ m + c _ 2 r _ m ^ 2 & \approx & s _ m
\end{array}\end{displaymath}

Figure: A quadratic fit

At first this might not look like a linear problem. Notice, though, that the $ r _ i$ and $ s _ i$ are data numbers, and their powers are also just certain numbers. The unknowns are $ c _ 0, c _ 1, c _ 2$, which appear linearly. In fact, a polynomial is a linear combination of powers. In matrix form, this problem is

$ \left[\begin{array}{ccc} 1& r _ 1& r _ 1 ^ 2\\ 1& r _ 2& r _ 2 ^ 2\\ \vdots&\vdots&\vdots\\ 1& r _ m& r _ m ^ 2 \end{array}\right] \left[\begin{array}{c} c _ 0\\ c _ 1\\ c _ 2 \end{array}\right] \approx \left[\begin{array}{c} s _ 1\\ s _ 2\\ \vdots \\ s _ m\end{array}\right]$, or for short, $ A \mathbf{c} \approx \mathbf{s}$.

Solving the normal equations gives us the best $ c _ 0, c _ 1, c _ 2$.

Note. Although one can talk loosely of a ``best-fitting parabola'', if $ c _ 2$ happened to come out to be zero the curve would actually be a straight line.
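A sketch of the quadratic fit in NumPy, solving the normal equations explicitly (the sample data here is hypothetical, not from the text):

```python
import numpy as np

# Hypothetical data lying near y = 1 - x + 2x^2, with small perturbations.
r = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
s = 1 - r + 2 * r**2 + np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1])

# Columns 1, r, r^2 -- the matrix A of the text.
A = np.column_stack([np.ones_like(r), r, r**2])

# Normal equations: (A^T A) c = A^T s.
c = np.linalg.solve(A.T @ A, A.T @ s)
# c recovers coefficients close to (1, -1, 2).
```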



Application 3. Finding a best-fitting linear combination of polynomials of given degree $ d$.

This is an obvious generalization of the quadratic case. For polynomials of degree $ d$, the only requirement to guarantee nonsingularity of the normal equations is that $ r _ 1,\dots, r _ m$ should include at least $ d+1$ different numbers.
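The $d+1$-distinct-values condition can be illustrated numerically: with fewer than $d+1$ distinct $r _ i$, the Gram matrix $A ^ t A$ loses rank. A sketch in NumPy (our own helper, not from the text):

```python
import numpy as np

d = 3  # cubic: d + 1 = 4 coefficients

def gram_rank(r_values):
    """Rank of A^T A, where A is the degree-d Vandermonde matrix on r_values."""
    A = np.vander(np.asarray(r_values, float), d + 1)
    return np.linalg.matrix_rank(A.T @ A)

# Four distinct values (repeats are allowed): normal equations nonsingular.
rank_ok = gram_rank([0, 1, 2, 3, 3, 1])   # rank 4 = d + 1
# Only three distinct values: rank 3 < 4, so A^T A is singular.
rank_bad = gram_rank([0, 1, 2, 2, 1, 0])  # rank 3
```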



Application 4. Finding a best-fitting linear combination of functions. Given data points and a specified finite family of functions, find the linear combination of those functions that fits the data best.

In other words, suppose we are given functions $ f _ 1,\dots, f _ n$ and data points $ (r _ 1, s _ 1),\dots, (r _ m, s _ m)$. We want to find coefficients $ c _ 1,\dots, c _ n$ so that

\begin{displaymath}\begin{array}{ccc}
c _ 1 f _ 1 (r _ 1) + \dots + c _ n f _ n (r _ 1)&\approx&s _ 1\\
\vdots& &\vdots\\
c _ 1 f _ 1 (r _ m) + \dots + c _ n f _ n (r _ m)&\approx&s _ m
\end{array}\end{displaymath}

This is just a least-squares problem $ A \mathbf{c} \approx \mathbf{b}$, where $ A = [f _ j (r _ i)]$ (row $ i$, column $ j$) and $ \mathbf{b} = [s _ i]$. We'll discuss the particular case where the $ f _ i$ are Chebyshev polynomials.
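For the Chebyshev case, NumPy can build the matrix $[f _ j (r _ i)]$ directly. A sketch with hypothetical data (noisy samples of $\cos 2t$ on $[-1,1]$; all names here are ours):

```python
import numpy as np

# Hypothetical data: noisy samples of cos(2t) on [-1, 1].
rng = np.random.default_rng(0)
t = np.linspace(-1, 1, 40)
y = np.cos(2 * t) + 0.01 * rng.standard_normal(t.size)

# A[i, j] = T_j(t_i): the j-th Chebyshev polynomial evaluated at t_i.
n = 5  # use the basis T_0, ..., T_4
A = np.polynomial.chebyshev.chebvander(t, n - 1)

# Least-squares coefficients via the normal equations.
c = np.linalg.solve(A.T @ A, A.T @ y)
# A @ c now approximates y closely at every sample point.
```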



Application 5. Finding a best-fitting polynomial parametric curve near given data points.

Let's consider a cubic curve in two dimensions. We have times $ t _ 1,\dots, t _ n$ and data points $ P _ 1,\dots, P _ n$, and we want a cubic curve $ P(t)$ so that $ P(t _ i)$ is close to $ P _ i$ for each $ i$. As a measure of closeness we can use the Euclidean distance. In other words, we want to minimize the sum of squares of the errors dist$ (P(t _ i), P _ i)$. This situation at first sounds different from the errors previously discussed, which were vertical distances on graphs, but see what happens: Write $ P(t) = \left[\begin{array}{c}x(t)\\ y(t)\end{array}\right]$, $ P _ i = \left[\begin{array}{c}x _ i\\ y _ i\end{array}\right]$. Then dist$ (P(t _ i), P _ i) = \sqrt {(x(t _ i) - x _ i) ^ 2 + (y(t _ i) - y _ i) ^ 2}$, and squaring each distance removes its square root, so the sum-of-squares error is simply $ E = \sum _ i (x(t _ i) - x _ i) ^ 2 + (y(t _ i) - y _ i) ^ 2$. So if we write $ x(t) = c _ 0 + c _ 1 t + c _ 2 t ^ 2 + c _ 3 t ^ 3$ and $ y(t) = d _ 0 + d _ 1 t + d _ 2 t ^ 2 + d _ 3 t ^ 3$, with coefficients to be determined, the sum-of-squares error is exactly the same as for the least-squares problem

\begin{displaymath}\begin{array}{ccc}
c _ 0 + c _ 1 t _ 1 + c _ 2 t _ 1 ^ 2 + c _ 3 t _ 1 ^ 3 & \approx & x _ 1\\
\vdots & & \vdots\\
c _ 0 + c _ 1 t _ n + c _ 2 t _ n ^ 2 + c _ 3 t _ n ^ 3 & \approx & x _ n\\
d _ 0 + d _ 1 t _ 1 + d _ 2 t _ 1 ^ 2 + d _ 3 t _ 1 ^ 3 & \approx & y _ 1\\
\vdots & & \vdots\\
d _ 0 + d _ 1 t _ n + d _ 2 t _ n ^ 2 + d _ 3 t _ n ^ 3 & \approx & y _ n
\end{array}\end{displaymath}

Although squared errors from both $ x$- and $ y$-coordinates are added together, the errors involve separate sets of variables, so this problem is really equivalent to two separate least-squares problems $ A \mathbf{c} \approx \mathbf{x}$ and $ A \mathbf{d} \approx \mathbf{y}$, where

$ A = \left[\begin{array}{cccc} 1& t _ 1& t _ 1 ^ 2& t _ 1 ^ 3\\ 1& t _ 2& t _ 2 ^ 2& t _ 2 ^ 3\\ \vdots&\vdots&\vdots&\vdots\\ 1& t _ n& t _ n ^ 2& t _ n ^ 3 \end{array}\right]$,

$ \mathbf{c} = \left[\begin{array}{c}c _ 0\\ c _ 1\\ c _ 2\\ c _ 3\end{array}\right]$, $ \mathbf{d} = \left[\begin{array}{c}d _ 0\\ d _ 1\\ d _ 2\\ d _ 3\end{array}\right]$, $ \mathbf{x} = \left[\begin{array}{c}x _ 1\\ x _ 2\\ \vdots \\ x _ n\end{array}\right]$, $ \mathbf{y} = \left[\begin{array}{c}y _ 1\\ y _ 2\\ \vdots \\ y _ n\end{array}\right]$.

Since both problems share the same $ A$, we can solve them together with normal equations

$ A ^ t A C = A ^ t P$, where $ C = \left[\begin{array}{cc}c _ 0& d _ 0\\ c _ 1& d _ 1\\ c _ 2& d _ 2\\ c _ 3& d _ 3\end{array}\right]$ and

$ P = \left[\begin{array}{cc}x _ 1& y _ 1\\ x _ 2& y _ 2\\ \vdots&\vdots\\ x _ n& y _ n\end{array}\right]$ = the matrix of data points as row vectors.
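The shared-$A$ trick maps directly to code: one solve with a two-column right-hand side recovers both coordinate polynomials at once. A sketch in NumPy with hypothetical times and points (chosen to lie exactly on a cubic so the recovered coefficients are exact):

```python
import numpy as np

# Hypothetical times and 2-D data points (rows of P), not from the text.
t = np.linspace(0.0, 1.0, 8)
P = np.column_stack([t**3 - t, t**2 + 0.5 * t])  # exactly cubic in t

# Shared design matrix with columns 1, t, t^2, t^3.
A = np.column_stack([np.ones_like(t), t, t**2, t**3])

# One solve handles both coordinates: C's columns are c and d.
C = np.linalg.solve(A.T @ A, A.T @ P)
# First column: x(t) = t^3 - t gives c = (0, -1, 0, 1).
# Second column: y(t) = t^2 + t/2 gives d = (0, 0.5, 1, 0).
```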



Other applications. Large overdetermined arrays arise in many engineering design applications. The method of normal equations works well even if $ A$ is $ 1000 \times 50$. (On modern workstations it takes less than a second to solve a $ 50 \times 50$ set of linear equations.)




Kirby A. Baker 2003-05-13