To be more concrete, we should ask for values of the two unknowns that
make the equations as nearly true as possible. In other words,
we should try to find the pair of values that minimizes the errors
in the individual equations.
There is still a problem, though: It is not clear what it means to minimize several errors simultaneously. We must choose some single combined measure of the errors and then minimize that.
To combine the errors we should not use their plain sum, because negative
errors could cancel out positive ones. It is more reasonable,
but still not good, to use the sum of their absolute values. The use of
absolute values is undesirable both because the absolute value function is not
differentiable and because there might be many pairs of values
giving the same minimum value (as it turns out).
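The two objections can be checked on small made-up numbers. The sketch below is only an illustration: the error list and the one-unknown system (x = 0 and x = 1) are hypothetical and are not the data of this section, and the function names are chosen here for convenience.

```python
def signed_sum(errors):
    """Plain sum of the errors: positive and negative errors can cancel."""
    return sum(errors)


def absolute_sum(errors):
    """Sum of the absolute values of the errors."""
    return sum(abs(e) for e in errors)


# Hypothetical errors: two large errors of opposite sign and one zero error.
hypothetical_errors = [3.0, -3.0, 0.0]
print(signed_sum(hypothetical_errors))    # 0.0, although the fit is poor
print(absolute_sum(hypothetical_errors))  # 6.0


def absolute_measure(x):
    """Absolute-value measure for the hypothetical system x = 0, x = 1."""
    return absolute_sum([x - 0.0, x - 1.0])


# Every x between 0 and 1 gives the same minimum value, so the
# minimizer is not unique.
print([absolute_measure(x) for x in (0.0, 0.25, 0.5, 1.0)])  # [1.0, 1.0, 1.0, 1.0]
```

In this simplified setting there is only one unknown rather than a pair, but the same loss of uniqueness is what the text warns about.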
Instead, a nice measure of the combined error is the sum
of squares of the individual errors. Because the combined error
depends on the choice of the two unknowns, let's write it as a
function of that pair.
Just trying some different choices of the two unknowns gives combined
errors of various sizes, some worse than others; but the least-squares
method will reveal the choice with the lowest possible error.
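As a rough sketch of this trial-and-compare process, the code below evaluates the sum-of-squares error for a few trial pairs and then computes the minimizing pair. Everything specific here is an assumption: the three linear equations are made up (they are not this section's data), the names `u`, `v`, `COEFFS`, and `RHS` are hypothetical, and the minimizer is found by solving the 2x2 normal equations, which is one standard way to carry out least squares for linear equations and not necessarily the derivation the text goes on to use.

```python
# Hypothetical overdetermined system (not the data from the text):
#   u + 1*v = 2
#   u + 2*v = 3
#   u + 3*v = 5
COEFFS = [(1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
RHS = [2.0, 3.0, 5.0]


def errors(u, v):
    """Error of each equation: left-hand side minus right-hand side."""
    return [a * u + c * v - r for (a, c), r in zip(COEFFS, RHS)]


def combined_error(u, v):
    """Sum of squares of the individual errors."""
    return sum(e * e for e in errors(u, v))


def least_squares_pair():
    """Minimize combined_error by solving the 2x2 normal equations."""
    s_aa = sum(a * a for a, _ in COEFFS)
    s_ac = sum(a * c for a, c in COEFFS)
    s_cc = sum(c * c for _, c in COEFFS)
    s_ar = sum(a * r for (a, _), r in zip(COEFFS, RHS))
    s_cr = sum(c * r for (_, c), r in zip(COEFFS, RHS))
    # Nonzero when the two coefficient columns are not proportional.
    det = s_aa * s_cc - s_ac * s_ac
    u = (s_cc * s_ar - s_ac * s_cr) / det
    v = (s_aa * s_cr - s_ac * s_ar) / det
    return u, v


# A few trial pairs, then the least-squares pair.
for trial in [(0.0, 1.0), (1.0, 1.0), (0.0, 1.5)]:
    print(trial, combined_error(*trial))
best = least_squares_pair()
print(best, combined_error(*best))
```

For this made-up system the trial pairs give combined errors of 6, 1, and 0.5, while the least-squares pair (u = 1/3, v = 3/2) gives 1/6, which no other pair can beat.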
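It is worth noting that the least-squares method will allow several
small errors in preference to one large one; for example,
the sum of the squares of several small errors is smaller than
the square of a single error as large as all of them combined.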
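A quick numerical check of this preference, using made-up error lists (the numbers are hypothetical, chosen so that both lists have the same total absolute error):

```python
def sum_of_squares(errors):
    """Combined error: sum of squares of the individual errors."""
    return sum(e * e for e in errors)


several_small = [1.0, 1.0, 1.0]  # three errors of size 1
one_large = [3.0, 0.0, 0.0]      # one error of size 3, the rest exact

print(sum_of_squares(several_small))  # 3.0
print(sum_of_squares(one_large))      # 9.0
```

Both lists have absolute errors adding up to 3, yet the sum of squares rates the single large error three times worse.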