Recall that the distance between two vectors is the length of
their difference. Therefore
is the distance from
x to
b. Moreover,
x
. For a nonnegative
function, minimizing the square of the function is the same as
minimizing the function. Therefore
Moreover, the vectors of the form
x, for all possible
column vectors
x, form a subspace
of
R
. Therefore
the problem is really one of finding the point in a subspace that is
closest to a given point; the error vector
is the vector
from the given point to the point in the subspace. In Figure
,
the parallelogram represents part of the subspace
.
The error vector ``err'' goes from
to
x.
To see why
is a subspace, notice that
is just the image
of the linear transformation given by
x
x.
Or, notice that
x can be expanded as
x
, where
means the
-th column
of
; therefore
is the column space of
, i.e., the
subspace spanned by the columns of
. In Figure
, the
columns of
are the sides of the parallelogram that touch the origin.
You might think that the way to get from
b to the closest
point of
is to go along a normal to
--i.e., a line perpendicular
to
--and that's exactly right, as indicated by the right angle in
Figure
. First let's discuss the computational
method that produces the answer, and then then derivation of the method.