One way to invent the method of normal equations is to use calculus.
The error is $\|A\mathbf{x} - \mathbf{b}\|^2$. Take the partial derivative with respect to each $x_i$ and set it equal to zero. You get $n$ simultaneous equations in $x_1, \dots, x_n$, which turn out to be the same as the normal equations.
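To see why the calculus route lands on the same equations, here is a sketch of the computation. It assumes the error is measured by the squared length $\|A\mathbf{x} - \mathbf{b}\|^2$, that $A$ is $m \times n$ with entries $a_{kj}$, and that $b_k$ are the entries of $\mathbf{b}$; the name $E$ is introduced only for this illustration.

$$
\begin{aligned}
E(\mathbf{x}) &= \sum_{k=1}^{m} \Big( \sum_{j=1}^{n} a_{kj} x_j - b_k \Big)^2, \\
\frac{\partial E}{\partial x_i} &= 2 \sum_{k=1}^{m} a_{ki} \Big( \sum_{j=1}^{n} a_{kj} x_j - b_k \Big) = 2\,\big[ A^T (A\mathbf{x} - \mathbf{b}) \big]_i .
\end{aligned}
$$

Setting all $n$ partial derivatives to zero gives $A^T(A\mathbf{x} - \mathbf{b}) = \mathbf{0}$, that is, $A^T A\mathbf{x} = A^T\mathbf{b}$.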
Here is a better way, based on the geometrical interpretation discussed above. Recall that
(i) the set of all vectors of the form $A\mathbf{x}$ is a subspace, namely the column space of $A$ (the subspace of $\mathbb{R}^m$ spanned by the columns of $A$, where $A$ has $m$ rows);
(ii) we are looking for the particular $\mathbf{x}$ for which $A\mathbf{x}$ is closest to $\mathbf{b}$. To avoid confusion, call this vector $\bar{\mathbf{x}}$;
(iii) the error vector is $A\bar{\mathbf{x}} - \mathbf{b}$.
We need only two more facts:
(iv) The shortest distance from a vector b to a subspace is along a perpendicular. (See Problems for a verification.)
(v) In a matrix product $AB$, the entries are dot products of rows of $A$ with columns of $B$.
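Fact (v) is what lets a whole list of dot products be written as a single matrix product. The following NumPy sketch checks it on small made-up matrices; the helper name `product_by_dot_products` and all the numbers are illustrative assumptions, not taken from the text.

```python
import numpy as np

def product_by_dot_products(A, B):
    """Build A @ B entry by entry: entry (i, j) is row i of A dotted with column j of B."""
    rows, cols = A.shape[0], B.shape[1]
    C = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

# Illustrative matrices (not from the text).
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0]])

# The entry-by-entry construction agrees with the built-in matrix product.
assert np.allclose(product_by_dot_products(A, B), A @ B)

# Putting A "on its side": the rows of A.T are the columns of A,
# so A.T @ r lists the dot product of each column of A with r.
r = np.array([1.0, -1.0, 2.0])
column_dots = np.array([np.dot(A[:, j], r) for j in range(A.shape[1])])
print(column_dots)  # the two dot products, 8 and 10
print(A.T @ r)      # the same two numbers from one matrix product
```

The second half is the special case used in the argument that follows: one product with the transpose collects the dot products of a vector with every column of the matrix.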
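Now, by (iv), the error vector $A\bar{\mathbf{x}} - \mathbf{b}$ is perpendicular to all vectors in the column space of $A$. By (i), the columns of $A$ are in this subspace, so $A\bar{\mathbf{x}} - \mathbf{b}$ is perpendicular to each column of $A$. Thus the dot product of $A\bar{\mathbf{x}} - \mathbf{b}$ with every column of $A$ is zero. Write the dot products by putting $A$ on its side (the rows of the transpose $A^T$ are the columns of $A$, so (v) applies) and writing the matrix product $A^T(A\bar{\mathbf{x}} - \mathbf{b}) = \mathbf{0}$. This says $A^T A\bar{\mathbf{x}} - A^T\mathbf{b} = \mathbf{0}$, so $A^T A\bar{\mathbf{x}} = A^T\mathbf{b}$, the normal equations!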
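As a numerical sanity check of this argument, the sketch below solves the normal equations for a small made-up line-fitting problem and confirms that the error vector is perpendicular to every column of the matrix. The data points and the variable names (`t`, `x_bar`, `error`, `x_ref`) are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: fit a line c0 + c1*t to four points (t, b).
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Columns of A: a column of ones (for c0) and the t values (for c1).
A = np.column_stack([np.ones_like(t), t])

# Solve the normal equations  A^T A x = A^T b.
x_bar = np.linalg.solve(A.T @ A, A.T @ b)

# The error vector should be perpendicular to each column of A,
# so A^T (A x_bar - b) should be zero up to rounding.
error = A @ x_bar - b
print(x_bar)          # approximately [0.9, 0.9]
print(A.T @ error)    # approximately [0, 0]

# Cross-check against NumPy's built-in least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_bar, x_ref)
```

Solving the normal equations directly is fine for a small, well-conditioned example like this one; library routines such as `np.linalg.lstsq` use other factorizations internally, which is why it serves here only as an independent check.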