Least-squares problems arise in a number of ways. The simplest one is this:
Application 1. ``Linear regression''. Find the best-fitting straight line $y = c + dx$ for data points $(x_1, y_1), \dots, (x_m, y_m)$ (see the figure).
Solution. Here we need to find $c$ and $d$, not $x$ and $y$. If a data point $(x_i, y_i)$ (say) were to be on the line, that would mean $c + dx_i = y_i$. But there are too many data points to hope that they would all be on one line. Thus we have to settle for finding $c$ and $d$ so that $c + dx_i \approx y_i$ for each $i$, or in other words, so that
$$A \begin{pmatrix} c \\ d \end{pmatrix} \approx \mathbf{b}, \qquad
A = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_m \end{pmatrix}, \qquad
\mathbf{b} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},$$
holds as nearly as possible in the least-squares sense, so the normal equations are $A^{\mathsf T}\!A \begin{pmatrix} c \\ d \end{pmatrix} = A^{\mathsf T}\mathbf{b}$.
A computer solution gives the values of $c$ and $d$ to five decimal places. Thus the best-fitting straight line in the least-squares sense is $y = c + dx$ with those coefficients.
In general, for $m$ data points $(x_1, y_1), \dots, (x_m, y_m)$ you get the normal equations
$$\begin{pmatrix} m & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}
\begin{pmatrix} c \\ d \end{pmatrix} =
\begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}.$$
It is all right to have some of the $x_i$ be the same. These normal equations are guaranteed to be nonsingular as long as not all the $x_i$ are the same. (If they were all the same, the data points would lie on a vertical line and there would be no one best-fitting linear function.)
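The normal-equation recipe above can be sketched in a few lines of NumPy. The data values below are made up for illustration, not taken from the text:

```python
import numpy as np

# Hypothetical sample data: m points (x_i, y_i), invented for this sketch.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])

# Build A with rows (1, x_i); the unknowns are the coefficients (c, d).
A = np.column_stack([np.ones_like(x), x])

# Normal equations: (A^T A) (c, d)^T = A^T y.
c, d = np.linalg.solve(A.T @ A, A.T @ y)

print(f"best-fitting line: y = {c:.5f} + {d:.5f} x")
```

In production code `np.linalg.lstsq(A, y)` is usually preferred, since forming $A^{\mathsf T}A$ squares the condition number; for a well-conditioned textbook example the two agree.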
Application 2. Finding a best-fitting quadratic. Given $m$ data points $(x_1, y_1)$, $(x_2, y_2)$, $\dots$, $(x_m, y_m)$, find the parabola $y = a + bx + cx^2$ that fits them best (see the figure). In other words, solve
$$a + bx_i + cx_i^2 \approx y_i, \qquad i = 1, \dots, m.$$
At first this might not look like a linear problem. Notice, though, that the $x_i$ and $y_i$ are data numbers, and their powers $x_i^2$ are also just certain numbers. The unknowns are $a$, $b$, and $c$, which appear linearly. In fact, a polynomial is a linear combination of powers. In matrix form, this problem is
$$\begin{pmatrix} 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_m & x_m^2 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix} \approx
\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},$$
or for short, $A\mathbf{c} \approx \mathbf{b}$. Solving the normal equations $A^{\mathsf T}\!A\mathbf{c} = A^{\mathsf T}\mathbf{b}$ gives us the best $a$, $b$, and $c$.
Note. Although one can talk loosely of a ``best-fitting parabola'', if the coefficient $c$ of $x^2$ happened to come out to be zero, the curve would actually be a straight line.
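A minimal sketch of the quadratic fit, with made-up data chosen to lie exactly on a parabola so the coefficients are recovered:

```python
import numpy as np

# Hypothetical data lying exactly on y = 2 - x + 0.5 x^2 (invented values).
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2.0 - x + 0.5 * x**2

# Columns 1, x, x^2; the unknowns a, b, c appear linearly.
A = np.column_stack([np.ones_like(x), x, x**2])
a, b, c = np.linalg.solve(A.T @ A, A.T @ y)

print(a, b, c)  # recovers (2, -1, 0.5) up to rounding
```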
Application 3. Finding a best-fitting linear combination of polynomials of given degree $n$. This is an obvious generalization of the quadratic case. For polynomials of degree $n$, the only requirement to guarantee nonsingularity of the normal equations is that $x_1, \dots, x_m$ should include at least $n + 1$ different numbers.
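The degree-$n$ case can be sketched with a Vandermonde matrix; the helper name and data below are hypothetical. The second half illustrates the nonsingularity condition: with fewer than $n + 1$ distinct $x$-values, $A^{\mathsf T}A$ drops rank:

```python
import numpy as np

def polyfit_normal(x, y, n):
    """Degree-n least-squares fit via the normal equations (sketch)."""
    A = np.vander(x, n + 1, increasing=True)  # rows (1, x_i, ..., x_i^n)
    return np.linalg.solve(A.T @ A, A.T @ y)

# Repeated x-values are fine, as long as enough distinct ones remain.
x = np.array([0.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 1.2, 2.0, 3.1, 3.9])
coef = polyfit_normal(x, y, 2)   # 4 distinct x-values >= n+1 = 3: solvable

# With only one distinct x-value, A^T A is singular for n = 2:
x_bad = np.array([1.0, 1.0, 1.0])
A_bad = np.vander(x_bad, 3, increasing=True)
print(np.linalg.matrix_rank(A_bad.T @ A_bad))  # rank 1 < 3
```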
Application 4. Finding a best-fitting linear combination of functions. Given data points and a specified finite family of functions, find the linear combination of those functions that fits the data best.
In other words, suppose we are given functions $f_1, \dots, f_n$ and data points $(x_1, y_1), \dots, (x_m, y_m)$. We want to find coefficients $c_1, \dots, c_n$ so that
$$c_1 f_1(x_i) + c_2 f_2(x_i) + \cdots + c_n f_n(x_i) \approx y_i, \qquad i = 1, \dots, m.$$
This is just a least-squares problem $A\mathbf{c} \approx \mathbf{b}$, where $A_{ij} = f_j(x_i)$ and $\mathbf{b} = (y_1, \dots, y_m)^{\mathsf T}$. We'll discuss the particular case where the $f_j$ are Chebyshev polynomials.
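The general recipe $A_{ij} = f_j(x_i)$ works for any finite family of functions. In the sketch below the basis $(1, \cos, \sin)$ and the data are made up; the data are chosen to lie exactly in the span of the basis so the coefficients are recovered:

```python
import numpy as np

# Made-up basis functions f_1, f_2, f_3; any finite family works the same way.
fs = [lambda t: np.ones_like(t), np.cos, np.sin]

x = np.linspace(0.0, 3.0, 8)
y = 1.0 + 2.0 * np.cos(x) - 0.5 * np.sin(x)   # data exactly in the span

# A[i, j] = f_j(x_i), then solve the normal equations A^T A c = A^T y.
A = np.column_stack([f(x) for f in fs])
c = np.linalg.solve(A.T @ A, A.T @ y)

print(c)  # recovers the coefficients (1, 2, -0.5)
```

For the Chebyshev case mentioned above, NumPy's `np.polynomial.chebyshev` module can build the columns, but the structure of $A$ is the same.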
Application 5. Finding a best-fitting polynomial parametric curve near given data points.
Let's consider a cubic curve in two dimensions. We have times $t_1, \dots, t_m$ and data points $(x_1, y_1), \dots, (x_m, y_m)$, and we want a cubic curve $(x(t), y(t))$ so that $(x(t_i), y(t_i))$ is close to $(x_i, y_i)$ for each $i$. As a measure of closeness we can use the Euclidean distance. In other words, we want to minimize the sum of squares of the errors
$$\sum_{i=1}^{m} \operatorname{dist}\bigl((x(t_i), y(t_i)),\, (x_i, y_i)\bigr)^2.$$
This situation at first sounds different from the errors previously discussed, which were vertical distances on graphs, but see what happens: Write $e_i = x(t_i) - x_i$ and $f_i = y(t_i) - y_i$. Then $\operatorname{dist}\bigl((x(t_i), y(t_i)), (x_i, y_i)\bigr) = \sqrt{e_i^2 + f_i^2}$, and the sum-of-squares error is the sum of squares of square roots:
$$\sum_{i=1}^{m} \Bigl(\sqrt{e_i^2 + f_i^2}\Bigr)^2 = \sum_{i=1}^{m} e_i^2 + \sum_{i=1}^{m} f_i^2.$$
So if we write $x(t) = c_1 + c_2 t + c_3 t^2 + c_4 t^3$ and $y(t) = d_1 + d_2 t + d_3 t^2 + d_4 t^3$, with coefficients to be determined, the sum-of-squares error is exactly the same as for the least-squares problem
$$x(t_i) \approx x_i, \qquad y(t_i) \approx y_i, \qquad i = 1, \dots, m.$$
Although squared errors from both $x$- and $y$-coordinates are added together, the errors involve separate sets of variables, so this problem is really equivalent to two separate least-squares problems $A\mathbf{c} \approx \mathbf{x}$ and $A\mathbf{d} \approx \mathbf{y}$, where
$$A = \begin{pmatrix} 1 & t_1 & t_1^2 & t_1^3 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & t_m & t_m^2 & t_m^3 \end{pmatrix}, \quad
\mathbf{c} = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix}, \quad
\mathbf{d} = \begin{pmatrix} d_1 \\ d_2 \\ d_3 \\ d_4 \end{pmatrix}, \quad
\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix}, \quad
\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}.$$
Since both problems share the same $A$, we can solve them together with normal equations $A^{\mathsf T}\!A\,[\mathbf{c} \;\; \mathbf{d}] = A^{\mathsf T}D$, where $[\mathbf{c} \;\; \mathbf{d}]$ is the $4 \times 2$ matrix of unknown coefficients and $D$ = the $m \times 2$ matrix of data points as row vectors.
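The shared-$A$ trick is a one-liner in NumPy, since a solve with a matrix right-hand side handles both coordinate problems at once. The times and data points below are invented for illustration (points on an arc):

```python
import numpy as np

# Hypothetical times and 2-D data points (made up: points on a circular arc).
t = np.linspace(0.0, 1.0, 10)
D = np.column_stack([np.cos(t), np.sin(t)])   # m x 2: rows (x_i, y_i)

# Shared matrix A with rows (1, t_i, t_i^2, t_i^3); one solve covers both
# coordinates: A^T A [c d] = A^T D.
A = np.vander(t, 4, increasing=True)
C = np.linalg.solve(A.T @ A, A.T @ D)         # 4 x 2: columns c and d

def curve(s):
    """Evaluate the fitted parametric cubic at times s."""
    return np.vander(np.atleast_1d(s), 4, increasing=True) @ C

print(np.max(np.abs(curve(t) - D)))  # residual is small for this smooth data
```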
Other applications. Large overdetermined arrays arise in many engineering design applications. The method of normal equations works well even if $A$ is large. (On modern workstations it takes less than a second to solve a sizable set of linear equations.)