| 1. Introduction

The Revised National Curriculum Statement calls for the application of mathematics in real-world contexts, and for the use of mathematical modelling to analyse and describe our world. For example, the Data Handling and Probability Learning Outcome requires learners:

  to collect and use data to establish statistical and probability models to solve related problems;
  to represent bivariate data as a scatter plot and suggest intuitively whether a linear, quadratic or exponential function would best fit the data.

However, this is not so simple, because real-life data are often inaccurate, “messy” or “noisy”. This may be due to measurement errors in scientific experiments, or because the data are statistical and no model describes them exactly (e.g. the relationship between people’s height and weight, or between a person’s years of education and income).

Let’s illustrate. Here is a table and scatterplot of an abstract relationship between two variables x and y.
        
|  | 
               
                |  x |  y |   
                | 0 | 5 |   
                | 3 | 6,5 |   
                | 6 | 8 |   
                | 9 | 9,5 |   
                | 12 | 11 |   
                | 15 | 12,5 |   
                | 18 | 14 |   
                | 30 | ? |
| Can you find the algebraic relationship between x and y from the table? Can you predict y(30)? In the applet, click the sliders, or type values, to change the parameters a and b so that the line y = ax + b goes through all the points (to reset, click “init”). This process of fitting a graph to given data is called curve fitting or regression. |  | 
         
          | You should find that the function y = 0,5x + 5 exactly “fits” all the data pairs, that the line passes exactly through all seven points, and that y(30) can be confidently predicted as 0,5 × 30 + 5 = 20.
 | 
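The exact fit can also be checked numerically. The following is a minimal sketch in Python (our choice of language; the names `line`, `residuals` and `exact_fit` are illustrative, and decimal points replace the decimal commas used in the table). It computes the difference between each tabulated y and the value given by y = 0,5x + 5.

```python
# Data from the first table (decimal points instead of decimal commas).
xs = [0, 3, 6, 9, 12, 15, 18]
ys = [5, 6.5, 8, 9.5, 11, 12.5, 14]


def line(x, a=0.5, b=5):
    """Candidate model y = ax + b with the parameters found above."""
    return a * x + b


# The fit is exact if every residual (observed y minus model y) is zero.
residuals = [y - line(x) for x, y in zip(xs, ys)]
exact_fit = all(r == 0 for r in residuals)

# Use the same model to predict the missing value y(30).
prediction = line(30)

print(residuals)
print(exact_fit, prediction)
```

Every residual is zero for these data, so the model reproduces the table exactly, and the prediction for x = 30 is 20.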
| | Now look at a real-world context. The data and scatterplot below were obtained in a science experiment measuring the length of a spring with different masses hanging from it (Hooke’s Law): | 
 
         
|  | 
               
                | Mass (x) | Length (y) |   
                | 0 | 5,05 |   
                | 3 | 6,72 |   
                | 6 | 8,40 |   
                | 9 | 9,15 |   
                | 12 | 10,50 |   
                | 15 | 12,85 |   
                | 18 | 13,65 |   
                | 30 | ? |
| Can you find the algebraic relationship between x and y from the table? Can you predict y(30)? Can you fit the line y = ax + b through all seven points?
Here it is impossible to fit a line exactly through all the points. Our problem is to find the best approximate model for the data; we call this the line of best fit or the regression line.
 |  | 
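The claim that no straight line passes exactly through all the spring measurements can be checked numerically. The following is a minimal sketch in Python (our choice of language; the variable names are illustrative, and decimal points replace the decimal commas of the table). If the points lay on one line, the slope between every pair of neighbouring points would be the same.

```python
# Spring data from the table (mass x, length y), written with decimal points.
xs = [0, 3, 6, 9, 12, 15, 18]
ys = [5.05, 6.72, 8.40, 9.15, 10.50, 12.85, 13.65]

points = list(zip(xs, ys))

# Slope of the segment joining each pair of neighbouring points.
slopes = [
    (y2 - y1) / (x2 - x1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]

# Points on one straight line would give the same slope every time.
collinear = max(slopes) - min(slopes) < 1e-9

print([round(s, 2) for s in slopes])
print(collinear)
```

For these data the slopes range from about 0,25 to about 0,78, so no single line y = ax + b can pass through every point.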
| You would agree that it is impossible to decide visually whether one line is a better fit than another! In this unit we investigate criteria and methods for finding the line of best fit. We will investigate it numerically, algebraically and graphically, and use the computational power of technology tools like Excel spreadsheets to help us. We will reflect on the “goodness of fit” (the strength of the relationship, or correlation), and apply our knowledge in a wide range of applications. We will later return to this problem and show that the line of best fit is y = 0,4781x + 5,1714. From this model, y(30) can be reasonably predicted as 0,4781 × 30 + 5,1714 ≈ 19,5.
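As a preview, the quoted line can be reproduced with the standard least-squares formulas for a straight line. The following is a minimal sketch in Python. It assumes that the method developed later in the unit is ordinary least squares; the variable names are illustrative, and decimal points replace the decimal commas of the text.

```python
# Spring data from the table (mass x, length y), written with decimal points.
xs = [0, 3, 6, 9, 12, 15, 18]
ys = [5.05, 6.72, 8.40, 9.15, 10.50, 12.85, 13.65]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Standard least-squares formulas for the line y = ax + b:
#   a = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
#   b = mean_y - a * mean_x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
a = s_xy / s_xx
b = mean_y - a * mean_x

# Extrapolate to x = 30 using the fitted line.
prediction = a * 30 + b

print(round(a, 4), round(b, 4), round(prediction, 1))
```

Rounded to four decimal places, this gives a = 0,4781 and b = 5,1714, in agreement with the line quoted above, and a predicted length of about 19,5 for x = 30.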
 
         
          | 
               
                | Outcomes |   
                | After working through this unit you should be able to:
                  Use scatterplots to visualise the relationship between two variables.
                  Explain and apply the least squares method numerically and algebraically to find the curve of best fit.
                  Use technology tools like Excel Trendline to generate regression models and data.
                  Analyse regression data to choose the most appropriate approximate model for a situation.
                  Interpret the correlation coefficient for a dataset.
                  Use linearisation to model a real-life situation.
                  Use approximate models to predict unknown values (extrapolate and interpolate). |  |  |