Frames:

3. Least square error regression

The table shows known values relating variables x and y.
What will be the best prediction of y when x = 10?

We will find the line of best fit by finding the least square errors in three ways: numerically, algebraically and using technology software.

 x  y
0
2
2
8
4
14
6
16

So y' = 2x + 3 better fits the data than y' = 3x + 2. But we have to check if another line does not give an even better fit!
Check which of these lines gives a smaller square error and is therefore an even better fit, and therefore gives a better estimate of y'(10):
y' = 2x + 3; y' = 2x + 4; y' = 2x + 5;
y' = 3x; y' = 3x + 1; y' = 3x + 2;
y' = 2,5x + 1; y' = 2,5x + 2; y' = 2,5x + 3

To find the least square error numerically is very computation-intensive! So let’s use technology tools to do the calculations for us! Open and use the Best line Excel worksheet:



Is there always a minimum square error? Why? This is an important theoretical and practical question – you do not want to search for a minimum if a unique minimum does not exist!

Let’s investigate a special case to begin with. Let’s fix b at 3, and ask: How does S = å(y y')2 change as a changes? For which a is y' = ax + 3 the best fit for the above data? Investigate the problem numerically, by using this Excel worksheet:

Now investigate the problem algebraically: Show that S = å(y y')2 can be written as a quadratic function of a, and deduce the minimum value of S and the corresponding value of a.

Check your answer here:  

When both a and b can vary, we will find that the sum of square errors is a quadratic function in a and b, and therefore S always has a minimum value. We can find the minimum value of S and the corresponding values of a and b, and therefore the line of best fit in several ways.

Let’s find the line of best fit through numerical trial-and improvement. The computational power of Excel can help us to easily find the line of best fit:

Now investigate the problem algebraically. Show that S = å(y' y)2 can be written as a quadratic function in a and b and deduce the minimum value of S and the corresponding values of a and b.

Check your answer here:  



In the introduction we asked the question about the algebraic relationship between x and y in the table (Hooke’s Law) and the value of y(30).

Now use the computational power of Excel to find the least square error and hence the line of best fit:

Mass (x)
Length (y)
0
5,05
3
6,72
6
8,40
9
9,15
12
10,50
15
12,85
18
13,65
30
?