2. What is the best fit?
What criterion can we use to decide whether one line is a better fit
than another?
In approximate models, the line of best fit does not necessarily pass
through any of the points in a scatterplot. Nevertheless, we would like
the line to pass as close as possible to all the points. That suggests
making the sum of the errors of all the points as small as possible.
The error (also called the residual) is defined as the difference
between the observed value y of a point at x and the value
y' = ax + b predicted by the model at that point, i.e. the error is
y - y'. In the applet below, d3 = y - y' = 2 - (2a + b) for the point
(2, 2). So the size of the error depends on the values of a and b.
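
The definition of the error can be tried out numerically. The sketch below is only an illustration: the function name `residuals` and the constant `POINTS` are made up for this example, and the three points are the ones used in the applet.

```python
# Illustrative sketch: the names POINTS and residuals are assumptions,
# not part of the applet.
POINTS = [(-2, -2), (0, 0), (2, 2)]


def residuals(a, b, points=POINTS):
    """Return the error d = y - y' for each point, where y' = a*x + b."""
    return [y - (a * x + b) for x, y in points]


# The exact model y = x leaves no error at any point.
print(residuals(1, 0))

# A different line: each error changes because it depends on a and b.
print(residuals(0.5, 1))
```

The last entry of each list is d3 = 2 - (2a + b), so changing either a or b changes the size of that error.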
Let's test this criterion in a simple example. The applet shows three
points (-2, -2), (0, 0) and (2, 2), and a line y = ax + b. Vary a and b
to find the values that give the smallest value for the sum of the
errors, i.e. find a and b that minimise d1 + d2 + d3. What is the line
of best fit according to this criterion?
The problem with this criterion of best fit is that positive and
negative errors cancel each other out. For example, for these 3 points
we have an exact model y = x, where d1 = d2 = d3 = 0. But with b = 0,
every value of a also gives an error sum of 0, because d2 = 0 and
d1 = -d3. Check this by varying a in the applet!
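
The cancellation can also be checked without the applet. This is a minimal sketch; the helper name `error_sum` is hypothetical, and the points are the three from the example.

```python
# Illustrative sketch: error_sum is a made-up helper name.
POINTS = [(-2, -2), (0, 0), (2, 2)]


def error_sum(a, b, points=POINTS):
    """Sum of the signed errors y - (a*x + b) over all points."""
    return sum(y - (a * x + b) for x, y in points)


# With b = 0, d1 and d3 are equal and opposite and d2 is 0,
# so the sum is 0 whatever slope we pick.
for a in (-3, 0, 1, 2.5, 10):
    print(a, error_sum(a, 0))
```

For these three points the x-values add up to 0, so the slope drops out of the sum altogether and only b affects it.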
So the criterion of minimum error sum cannot distinguish between the
lines above: all lines y = ax fit equally well! One way to solve the
problem is to take the sum of the absolute values of the errors. Why
is this a better criterion? How does it distinguish between different
lines?
However, the method used by mathematicians is to square the errors, so
that none of them is negative. One reason for choosing squares rather
than absolute values is that the sum of the squared errors is a
quadratic function of a and b that can never be negative, and such a
function has a minimum. So we are guaranteed that there is some line
for which the sum of the squared errors is a minimum, and that this
line is therefore a better fit than all the others.
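
As a rough illustration, the sum of squared errors and the line that minimises it can be computed directly. The closed-form expressions for a and b used here are the standard least-squares formulas; they are not derived in this text, and the function names are made up for the example.

```python
# Illustrative sketch: function names are assumptions. The formulas for
# a and b are the standard least-squares ones, not taken from the text.
POINTS = [(-2, -2), (0, 0), (2, 2)]


def squared_error_sum(a, b, points=POINTS):
    """Sum of (y - (a*x + b))**2 over all points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)


def least_squares_line(points=POINTS):
    """Return (a, b) minimising the sum of squared errors.

    Assumes the x-values are not all equal; otherwise sxx is 0.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


print(least_squares_line())       # (1.0, 0.0), i.e. the line y = x
print(squared_error_sum(1, 0))    # 0
print(squared_error_sum(2, 0))    # 8
```

Unlike the plain error sum, the squared error sum is 0 only for y = x here, and it grows as the line moves away from it.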
A geometric interpretation
Open the Least Square Areas applet below, which shows a geometric
interpretation of the principle of least squares. The applet shows five
movable red points and a blue line. The errors, i.e. the vertical
segments y - y', and the errors squared, i.e. squares with area
(y - y')², are also shown.
Find the line of best fit by dragging the red Y-intercept and Slope
points. Then move some of the points and find the new line of best fit.
Try to explain how this visual representation illustrates the least
squares criterion for the line of best fit.
