
2. What is the best fit?

What criterion can we use to decide if one line is a better fit than another?

In approximate models, the line of best fit does not necessarily pass through any of the points in a scatterplot. Nevertheless, we would like the line to pass as close as possible to all the points! That means that we would want the sum of the errors of all the points to be as small as possible.

The error (also called the residual) is defined as the difference between the observed value y of a point at x and the value y' = ax + b that the model predicts at that x. In the applet below, for the point (2, 2), d3 = y - y' = 2 - (2a + b). So the size of the error depends on the values of a and b.
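The definition of the error can also be tried out numerically. The short Python sketch below is only an illustration and not part of the applet; the function name error is an assumption made for this example.

```python
def error(x, y, a, b):
    """Error d = y - y', where y' = a*x + b is the value the line predicts at x."""
    return y - (a * x + b)

# For the point (2, 2) the error is d3 = 2 - (2a + b),
# so it changes whenever a or b changes.
for a, b in [(1, 0), (0.5, 0), (1, 1)]:
    print(f"a = {a}, b = {b}: d3 = {error(2, 2, a, b)}")
```

A positive error means the point lies above the line, and a negative error means it lies below the line.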


Let’s test this criterion in a simple example. The applet shows three points (-2, -2), (0,0) and (2, 2), and a line y = ax + b. Vary a and b to find the values that give the smallest value for the sum of the errors, i.e. find a and b that minimise d1 + d2 + d3. What is the line of best fit according to this criterion?

The problem with this criterion of best fit is that positive and negative values cancel each other out. For example, for these 3 points we have an exact model y = x, where d1 = d2 = d3 = 0. But with b = 0, all values of a also give an error sum of 0, because d1 = -d3. Check this by varying a in the applet!
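The cancellation can also be checked without the applet. The following Python sketch uses the three points of the example; the function name error_sum and the chosen values of a are assumptions made only for this illustration.

```python
# The three points from the example.
points = [(-2, -2), (0, 0), (2, 2)]

def error_sum(a, b, pts=points):
    """Sum of the errors d1 + d2 + d3 for the line y = a*x + b."""
    return sum(y - (a * x + b) for x, y in pts)

# With b = 0 the positive and negative errors cancel for every slope a.
for a in [-3, 0, 0.5, 1, 10]:
    print(f"a = {a}: error sum = {error_sum(a, 0)}")
```

For these points the sum works out to -3b, so it does not depend on a at all. That is why every line through the origin gives an error sum of 0.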

So the criterion of minimum error sum cannot distinguish between the lines above: every line y = ax fits equally well! One way to solve the problem is to take the sum of the absolute values of the errors. Why will this be a better criterion – how does this distinguish between different lines?
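One way to explore this question is to compute the sum of the absolute errors for a few lines through the origin. The Python sketch below is a minimal illustration; the name abs_error_sum and the sample slopes are assumptions for this example only.

```python
# The three points from the example.
points = [(-2, -2), (0, 0), (2, 2)]

def abs_error_sum(a, b, pts=points):
    """Sum of the absolute errors |d1| + |d2| + |d3| for the line y = a*x + b."""
    return sum(abs(y - (a * x + b)) for x, y in pts)

# With absolute values the errors can no longer cancel,
# so different slopes now give different totals.
for a in [0, 0.5, 1, 2]:
    print(f"a = {a}: sum of absolute errors = {abs_error_sum(a, 0)}")
```

With b = 0 the total equals 4|1 - a| for these points, so it is 0 only for the exact model y = x and grows as the slope moves away from 1.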

However, the method usually used by mathematicians is to square the errors, so that none of them is negative. One reason for preferring squares is that the sum of the squared errors is a quadratic function of a and b that can never be negative, and such a function has a minimum. So we are guaranteed that there is some line for which the sum of the squared errors is a minimum, and this line is therefore the best fit according to this criterion.
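The least squares criterion can be sketched in Python as follows. The formulas for a and b used here are the standard closed-form least squares formulas; they are not derived in this text, and the function names are assumptions made for this illustration.

```python
def squared_error_sum(a, b, pts):
    """Sum of the squared errors (y - y')^2 for the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in pts)

def least_squares_line(pts):
    """Return (a, b) that minimise the sum of squared errors (standard formulas)."""
    n = len(pts)
    sx = sum(x for x, _ in pts)
    sy = sum(y for _, y in pts)
    sxy = sum(x * y for x, y in pts)
    sxx = sum(x * x for x, _ in pts)
    denom = n * sxx - sx * sx
    if denom == 0:
        # All points share the same x, so no unique line y = a*x + b exists.
        raise ValueError("all x-values are equal")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b

# The three points from the example.
points = [(-2, -2), (0, 0), (2, 2)]
a, b = least_squares_line(points)
print(f"best fit: a = {a}, b = {b}")

# Unlike the plain error sum, the squared sum differs from slope to slope.
for slope in [0.5, 1, 2]:
    print(f"a = {slope}, b = 0: squared error sum = {squared_error_sum(slope, 0, points)}")
```

For these three points with b = 0, the sum of the squared errors is 8(1 - a)^2, a parabola in a whose minimum is at a = 1, i.e. at the exact model y = x.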

A geometric interpretation
Open the Least Square Areas applet below, showing a geometric interpretation of the principle of least squares. The applet shows five variable red points and a blue line. The errors, i.e. the vertical segments y - y', and the errors squared, i.e. squares with area (y - y')², are also shown.

Find the line of best fit by dragging the red Y-intercept and Slope points.
Then move some of the points and find the new line of best fit.
Try to explain how this visual representation illustrates the least squares criterion for the line of best fit.