| 2. 
        What is the best fit? What criterion
        can we use to decide whether one line is a better fit than another? In approximate models,
        the line of best fit does not necessarily pass through any of the
        points in a scatterplot. Nevertheless, we would like the line to
        pass as close as possible to all the points. This suggests that the
        sum of the errors of all the points should be as small as possible.
         
          | 
               
                | Error (also called the residual) is defined
                    as the difference between the actual observed value y
                    of a point at x and the value predicted by the model
                    at that point, i.e. y' = ax + b. In the applet below,
                    for the point (2, 2), d3 = y – y' = 2 – (2a + b).
                    So the size of the error depends on the values of a and b.
 |  |  
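As a minimal sketch of this definition, the errors for the three points of the applet example below, (-2, -2), (0, 0) and (2, 2), can be computed for any choice of a and b. The function names `predict` and `errors` and the sample values of a and b are illustrative only and are not part of the applet.

```python
# Points shown in the applet example: (-2, -2), (0, 0) and (2, 2).
POINTS = [(-2, -2), (0, 0), (2, 2)]

def predict(a, b, x):
    """Value y' predicted by the line y = ax + b at x."""
    return a * x + b

def errors(a, b, points=POINTS):
    """Errors d = y - y' for each point, in the order d1, d2, d3."""
    return [y - predict(a, b, x) for x, y in points]

# Example: the line y = 0.5x + 1 (a and b chosen only for illustration).
d1, d2, d3 = errors(0.5, 1)
print(d1, d2, d3)  # d3 equals 2 - (2*0.5 + 1) = 0
```

Changing a or b changes every error at once, which is why the choice of line has to be judged on all the points together.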
 
        
          | Let’s test this criterion in a simple example. The applet shows three
              points (-2, -2), (0, 0) and (2, 2), and a line y = ax + b.
              Vary a and b to find the values that give the smallest value
              for the sum of the errors, i.e. find a and b that minimise
              d1 + d2 + d3. What is the line of best fit according to this criterion?

              The problem with this criterion of best fit is that positive and
              negative errors cancel each other out. For example, for these 3 points
              we have an exact model y = x, where d1 = d2 = d3 = 0. But with
              b = 0, all values of a also give an error sum of 0, because d2 = 0
              and d1 = -d3. Check this by varying a in the applet!
 |  |
              So the criterion of minimum error sum cannot distinguish between the
              lines above, and every line y = ax fits equally well!

              One way to solve the problem is to take the sum of the absolute values
              of the errors. Why would this be a better criterion? How does it
              distinguish between different lines?

        However, the method used by mathematicians is to square the errors, so that
        none of them is negative. One reason for choosing squares instead is that the sum
        of the squared errors is a quadratic function of a and b that can never be negative,
        so it has a minimum: we are guaranteed that there is some line where
        the sum of the squared errors is a minimum, and therefore that this line
        is a better fit than all the others.

        A geometric interpretation

        Open the Least Square Areas applet below, showing a geometric
        interpretation of the principle of least squares. The applet shows five
        variable red points and a blue line. The errors, i.e. the vertical segments
        y – y', and the errors squared, i.e. squares with area (y – y')²,
        are also shown.
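The total area of the squares is simply the sum of the squared errors, so it can be computed for any position of the line. The sketch below assumes five example points invented for illustration (the applet's actual starting points are not given here) and compares two candidate lines; the helper name `total_square_area` is also hypothetical.

```python
# Five example points, made up for illustration; in the applet they can be dragged.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]

def total_square_area(slope, intercept, points):
    """Sum of the areas (y - y')**2 of the squares drawn on each vertical error."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

# Two candidate positions of the blue line.
area_first = total_square_area(1.0, 1.0, points)   # y = x + 1
area_second = total_square_area(0.5, 2.0, points)  # y = 0.5x + 2

# The line with the smaller total area is the better fit of the two
# by the least squares criterion.
print(area_first, area_second)
```

Dragging the Slope and Y-intercept points in the applet amounts to trying different values of `slope` and `intercept` and watching this total grow or shrink.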
 Find the line of best fit by dragging the red Y-intercept and Slope points.
 Then move some of the points and find the new line of best fit.
 Try to explain how this visual representation illustrates the least squares
        error criterion for the line of best fit.
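To see numerically why the plain error sum fails while the other two criteria work, the three criteria can be compared on the points (-2, -2), (0, 0) and (2, 2) for several lines y = ax with b = 0. This is only a sketch; the function name `criteria` and the chosen values of a are illustrative.

```python
POINTS = [(-2, -2), (0, 0), (2, 2)]

def criteria(a, b, points=POINTS):
    """Return (sum of errors, sum of absolute errors, sum of squared errors)."""
    errs = [y - (a * x + b) for x, y in points]
    return (sum(errs),
            sum(abs(d) for d in errs),
            sum(d * d for d in errs))

# Lines y = ax through the origin (b = 0) for a few illustrative slopes.
for a in [0, 0.5, 1, 2]:
    plain, absolute, squared = criteria(a, 0)
    print(f"a = {a}: sum = {plain}, sum of |d| = {absolute}, sum of d^2 = {squared}")
```

For every slope the plain sum is 0, so it cannot separate the lines, whereas the absolute and squared sums are 0 only for a = 1. For these three points the sum of the squared errors simplifies to 8(1 – a)² + 3b², which takes its smallest value, zero, exactly when a = 1 and b = 0, i.e. for the line y = x.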
  
 |