1. Introduction
The Revised
National Curriculum Statement calls for the application of mathematics
in real-world contexts, and the use of mathematical modelling to analyse
and describe our world. For example, the Data Handling and Probability
Learning Outcome requires learners:
- to collect and use data to establish statistical and probability models to solve related problems.
- to represent bivariate data as a scatter plot and suggest intuitively whether a linear, quadratic or exponential function would best fit the data.
However, this is not so simple, because data from real-life contexts are often
inaccurate, “messy” or “noisy”. This may be due either to measurement errors
in scientific experiments or to statistical data for which the model is not
exact (e.g. the relationship between people’s height and weight, or between
a person’s years of education and income).
Let’s illustrate.
Here is a table and scatterplot of an abstract relationship between two variables x
and y.
| x | y |
|---|---|
| 0 | 5 |
| 3 | 6,5 |
| 6 | 8 |
| 9 | 9,5 |
| 12 | 11 |
| 15 | 12,5 |
| 18 | 14 |
| 30 | ? |
Can you find the algebraic relationship between x and y from the table? Can you predict y(30)?
In the applet, click the sliders, or type values, to change the parameters a and b so that the line y = ax + b goes through all the points (to reset, click “init”).
This process of fitting a graph to given data is called curve fitting or regression.
You should find that the function y = 0,5x + 5 exactly “fits” all the data pairs,
that is, the line passes exactly through all seven points, and that y(30)
can be confidently predicted as 0,5 × 30 + 5 = 20.
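The exact fit can also be checked numerically. The sketch below is only an illustration, not part of the original applet: it assumes the data pairs from the table above (with decimal commas written as decimal points) and uses a hypothetical helper named `model` for the line y = ax + b.

```python
# Data pairs from the table above (decimal commas written as decimal points).
xs = [0, 3, 6, 9, 12, 15, 18]
ys = [5, 6.5, 8, 9.5, 11, 12.5, 14]


def model(x, a=0.5, b=5):
    """Candidate linear model y = ax + b (hypothetical helper for this sketch)."""
    return a * x + b


# Residual = observed y minus the y value the model gives for the same x.
residuals = [y - model(x) for x, y in zip(xs, ys)]

# The line fits exactly only if every residual is zero.
exact_fit = all(r == 0 for r in residuals)

# Extrapolate to x = 30 using the same model.
prediction = model(30)

print(exact_fit, prediction)
```

Every residual is zero here, which is what "the line passes exactly through all the points" means in numerical terms.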
Now look at a real-world context: the data and scatterplot below were obtained
in a science experiment measuring the length of a spring with different masses
hanging on it (Hooke’s Law):
| Mass (x) | Length (y) |
|---|---|
| 0 | 5,05 |
| 3 | 6,72 |
| 6 | 8,40 |
| 9 | 9,15 |
| 12 | 10,50 |
| 15 | 12,85 |
| 18 | 13,65 |
| 30 | ? |
Can you find the algebraic relationship between x and y from the table? Can you predict y(30)?
Can you fit the line y = ax + b through all seven points?
Here it is impossible to fit a line exactly through all the points.
Our problem is to find the best approximate model for the data; we call this the line of best fit or the regression line.
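One way to see why no single line passes through all the spring data is to compare the slopes between consecutive points: on a straight line they would all be equal. The following sketch assumes the mass and length values from the table above (decimal commas written as decimal points); the variable names are illustrative only.

```python
# Spring data from the table above (decimal commas written as decimal points).
xs = [0, 3, 6, 9, 12, 15, 18]
ys = [5.05, 6.72, 8.40, 9.15, 10.50, 12.85, 13.65]

# Slope of the segment joining each pair of consecutive points.
slopes = [
    (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    for i in range(len(xs) - 1)
]

# All points lie on one line only if every segment has the same slope.
# Rounding guards against tiny floating-point differences.
collinear = len({round(s, 6) for s in slopes}) == 1

print([round(s, 3) for s in slopes])
print("All points on one line:", collinear)
```

The slopes differ noticeably from segment to segment, so any line y = ax + b can only approximate these data.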
You would agree that it is practically impossible to decide by eye whether one
line is a better fit than another! In this unit we investigate criteria and
methods to find the line of best fit: we will investigate it numerically,
algebraically and graphically, and use the computational power of technology
tools like Excel spreadsheets to help us. We will reflect on the “goodness of
fit” (the strength of the relationship, or correlation), and apply our
knowledge in a wide range of applications.
We will return to this problem later and show that the line of best fit is
y = 0,4781x + 5,1714.
From this model, y(30) can be reasonably predicted as 0,4781 × 30 + 5,1714 ≈ 19,5.
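As a preview, these coefficients can be reproduced with a few lines of code. The sketch below assumes the least squares criterion that this unit's outcomes name, and uses the standard formulas slope = Sxy / Sxx and intercept = mean(y) − slope × mean(x); the unit itself develops the method later, so treat this as one possible illustration rather than the unit's own derivation. Decimal commas are written as decimal points.

```python
# Spring data from the table above (decimal commas written as decimal points).
xs = [0, 3, 6, 9, 12, 15, 18]
ys = [5.05, 6.72, 8.40, 9.15, 10.50, 12.85, 13.65]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of products of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)

# Least squares estimates for the line y = ax + b.
a = s_xy / s_xx
b = mean_y - a * mean_x

# Extrapolate to a mass of 30.
prediction = a * 30 + b

print(round(a, 4), round(b, 4), round(prediction, 1))
```

Rounded to four decimal places, the slope and intercept agree with the values quoted above, and the prediction at x = 30 rounds to 19,5.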
Outcomes
After working through this unit you should be able to:
- Use scatterplots to visualise the relationship between two variables.
- Explain and apply the least squares (least squared errors) method numerically and algebraically to find the curve of best fit.
- Use technology tools like Excel Trendline to generate regression models and data.
- Analyse regression data to choose the most appropriate approximate model for a situation.
- Interpret the correlation coefficient for a dataset.
- Use linearisation to model a real-life situation.
- Use approximate models to predict unknown values (extrapolate and interpolate).