Prof. Rodney Ehrlich
School of Public Health
UCT Faculty of Health


By the end of the session and reading you should have an understanding of:
  1. the classification of variables according to their scale of measurement (revision from a previous session);
  2. the task of converting conceptual variables into operational variables;
  3. the concepts of precision (reliability) and accuracy (and its variant validity) of a research measurement (not of the study finding) as applied to a questionnaire or clinical or laboratory instrument;
  4. how to maximise the precision and validity of your instrument from the beginning of your study;
  5. ways to test the precision and/or accuracy/validity of your test instrument in a pilot or pre-study and to adapt your procedures or protocol on the basis of this;
  6. how a lack of precision or accuracy in your measurement is likely to affect your study results.

Note: You may also refer to the PowerPoint presentation used in the lecture.

Measurement scales:

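This is nicely set out in Hulley pp. 37-39, and is categorised as:
  1. categorical variables: dichotomous (binary), nominal (unordered categories) and ordinal (ordered categories);
  2. numerical variables: continuous or discrete (e.g. counts).
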
Measurement scales are relevant because the type of scale determines:

  1. How the data capture file needs to be set up.
  2. The method of statistical summary and analysis.
  3. The statistical power of the study, i.e. the ability of a study to rule out or find a difference between two groups or an association between two variables.

Note that continuous variables can be converted into categorical variables. For example, haemoglobin concentration (continuous) can be converted into anaemia present/absent (binary) using a cut-off value. This categorisation, although frequently done for convenience in research, involves loss of information (and statistical power).
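As a minimal sketch of this conversion (Python is used here purely for illustration and is not part of the course material), the snippet below dichotomises haemoglobin values. The cut-off of 120 g/L and the name `anaemia_present` are hypothetical; real anaemia thresholds vary with age, sex and pregnancy status.

```python
# Hypothetical cut-off for illustration only: real anaemia thresholds
# depend on age, sex and pregnancy status.
ANAEMIA_CUTOFF_G_PER_L = 120.0


def anaemia_present(haemoglobin_g_per_l: float) -> bool:
    """Convert a continuous haemoglobin value (g/L) into a binary category."""
    return haemoglobin_g_per_l < ANAEMIA_CUTOFF_G_PER_L


haemoglobin = [70.0, 118.0, 122.0, 160.0]            # continuous scale
anaemia = [anaemia_present(h) for h in haemoglobin]  # binary scale
print(anaemia)  # [True, True, False, False]
```

Note how 70 g/L and 118 g/L end up in the same category, while 118 g/L and 122 g/L, which are almost identical, end up in different ones: this is the loss of information referred to above.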

Conceptual and operational variables:

Conceptual variables are expressed in theoretical, general, qualitative, or subjective terms. Our research hypotheses usually start off at this level, for example, “compliance with medication is poorer among patients who lack family support”.

To measure variables, an objective definition is required – this may be a matter of having a readily available validated instrument, establishing consensus or inferring an operational variable from theory (or all three). In this example you would need to have a definition of “compliance with medication” and “family support”.

As part of this process, you will decide on the measurement scale. You may decide to make compliance “yes/no” (nominal), or “none/low/moderate/high” (ordinal), based on definitions of the number of doses taken. For family support, you may do the same: present/absent, or, more likely, use some ordinal scale based on a questionnaire or outsider evaluation.
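The operational definition of compliance can be written down as an explicit rule. The sketch below assumes compliance is based on the proportion of prescribed doses taken; the function names and the cut-offs (50% and 80%) are hypothetical and would need to be fixed in advance, for example by consensus or from published definitions.

```python
def compliance_category(doses_taken: int, doses_prescribed: int) -> str:
    """Map doses taken to an ordinal compliance category.

    The cut-offs are hypothetical; in a real study they would be fixed
    in advance, e.g. by consensus or from published definitions.
    """
    if doses_prescribed <= 0:
        raise ValueError("doses_prescribed must be positive")
    if not 0 <= doses_taken <= doses_prescribed:
        raise ValueError("doses_taken must be between 0 and doses_prescribed")
    proportion = doses_taken / doses_prescribed
    if proportion == 0:
        return "none"
    if proportion < 0.5:
        return "low"
    if proportion < 0.8:
        return "moderate"
    return "high"


def is_compliant(doses_taken: int, doses_prescribed: int) -> bool:
    """Nominal yes/no version: collapse the ordinal scale at 'high'."""
    return compliance_category(doses_taken, doses_prescribed) == "high"


for taken in (0, 10, 20, 27):
    print(taken, compliance_category(taken, 30), is_compliant(taken, 30))
```

Deriving the yes/no variable from the ordinal one keeps the two definitions consistent, and shows that the nominal scale is simply a coarser version of the ordinal scale.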

Another example: “Recovery was faster among those with less pulmonary inflammation at baseline”. Recovery has to be converted into some measurable variable, e.g. “increase in lung function over one year” (continuous scale), as does inflammation, e.g. “neutrophil concentration in broncho-alveolar lavage fluid” (continuous scale).

Reliability and accuracy/validity of measurements:

Precision of a measurement (as per Hulley) is often called reliability, and has the sense of reproducibility or repeatability of a measurement. (Statistically, a measurement which lacks precision or reliability is subject to a lot of random error.)

Hulley distinguishes between accuracy and validity. Accuracy in general refers to approximation to the “truth” as determined by an instrument or test accepted as the “gold standard”.

For a physical or physiological measure, accuracy has the intuitive sense of the instrument being able to get to the real value as it exists in nature.

Hulley reserves the term validity for measures that are based on memory or self-report, or that are subjective, complex or abstract, and for which there is no easily obtainable gold standard even if an objective “truth” exists. Information on medical history and questionnaire responses fall into this category (“Have you ever had surgery?”, “How many sexual partners have you had?”, “Score your pain on this scale”, etc.). In this sense, validity is a subset of accuracy.

(Note that there is some parallel with the above discussion between a conceptual and operational variable, although the emphasis there was reducing a more abstract concept to a measurable one, rather than comparing one measurement against another).

Both accuracy and validity carry the sense of being unbiased. (Statistically speaking, if a measurement lacks accuracy or validity, it suffers from systematic error.)

Maximising your reliability and accuracy before you start:

Hulley gives very useful tables setting this out.

For reliability:
  1. choose a standardised instrument if one is available;
  2. refine the instrument (e.g. adapt to local conditions);
  3. train your observers;
  4. automate measurement;
  5. for continuous variables, repeat the measurement during the study, and take the mean (for example, blood pressure).
For validity:
  1. items 1 to 4 above, plus,
  2. make measurements not dependent on self-report;
  3. blinding;
  4. calibration.
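Item 5 of the reliability list (repeat the measurement and take the mean) can be illustrated with a small simulation. It assumes a hypothetical person whose true systolic blood pressure is 130 mmHg, readings with random error of SD 8 mmHg, and a poorly calibrated device that reads 5 mmHg too high; all values are invented. Averaging three readings is expected to reduce the random spread by roughly a factor of √3, but it leaves the 5 mmHg systematic error untouched, which is why calibration appears under validity rather than reliability.

```python
import random
import statistics

TRUE_SBP = 130.0     # hypothetical true systolic BP of one person, mmHg
RANDOM_SD = 8.0      # assumed random measurement error (SD), mmHg
DEVICE_BIAS = 5.0    # assumed systematic error of an uncalibrated device, mmHg


def session_mean(rng: random.Random, k: int, bias: float = 0.0) -> float:
    """Mean of k readings in one session: truth + bias + random error."""
    return statistics.fmean(
        TRUE_SBP + bias + rng.gauss(0.0, RANDOM_SD) for _ in range(k)
    )


rng = random.Random(2024)
n_sessions = 10_000
single = [session_mean(rng, 1) for _ in range(n_sessions)]
averaged = [session_mean(rng, 3) for _ in range(n_sessions)]
biased = [session_mean(rng, 3, DEVICE_BIAS) for _ in range(n_sessions)]

# Spread of one reading vs the mean of three (expected about 8 and about 4.6)
print(round(statistics.stdev(single), 1), round(statistics.stdev(averaged), 1))
# Averaging does not remove bias: expected to stay near 5 mmHg
print(round(statistics.fmean(biased) - TRUE_SBP, 1))
```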

Testing your reliability or validity:

This can be set up in a formal pre-study (which you might want to report as a study) or a pilot study (where you generally don’t use the data).

The tests you use depend on the scale of measurement. The tests of precision and accuracy/validity are a little more complex for continuous than for categorical variables.

For simplicity, in this session we will stick to binary (yes/no) categorical variables.

Measures of reliability: percentage concordance, kappa statistic.
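For two observers who classify the same subjects as yes/no, both measures can be computed from a 2×2 table of their ratings. The sketch below uses the usual Cohen's kappa formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement (the percentage concordance) and p_e is the agreement expected by chance from each observer's marginal totals. The function name and the counts are invented for illustration.

```python
def agreement_and_kappa(both_yes: int, only_a_yes: int, only_b_yes: int, both_no: int):
    """Percentage concordance and Cohen's kappa for two raters, binary ratings.

    Table layout (rater A in rows, rater B in columns):
                 B yes        B no
        A yes    both_yes     only_a_yes
        A no     only_b_yes   both_no
    """
    n = both_yes + only_a_yes + only_b_yes + both_no
    if n == 0:
        raise ValueError("table is empty")
    p_observed = (both_yes + both_no) / n
    a_yes = (both_yes + only_a_yes) / n   # proportion rated "yes" by A
    b_yes = (both_yes + only_b_yes) / n   # proportion rated "yes" by B
    # Chance agreement: both say yes by chance, or both say no by chance
    p_chance = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_observed - p_chance) / (1 - p_chance)
    return p_observed, kappa


concordance, kappa = agreement_and_kappa(both_yes=20, only_a_yes=5, only_b_yes=10, both_no=65)
print(f"concordance = {concordance:.0%}, kappa = {kappa:.3f}")  # concordance = 85%, kappa = 0.625
```

Here the observers agree on 85% of subjects, but 60% agreement would be expected by chance alone given how often each of them says “yes”, so kappa is noticeably lower than the raw concordance.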

Measures of validity (compared against a gold standard): sensitivity, specificity.
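For a binary measurement checked against a gold standard, validity is commonly summarised by sensitivity (the proportion of gold-standard positives that the instrument classifies as positive) and specificity (the proportion of gold-standard negatives that it classifies as negative). A minimal sketch with invented counts and a hypothetical function name:

```python
def sensitivity_specificity(true_pos: int, false_neg: int, false_pos: int, true_neg: int):
    """Validity of a binary test against a gold standard.

    true_pos / false_neg: gold-standard positives classified yes / no by the test.
    false_pos / true_neg: gold-standard negatives classified yes / no by the test.
    """
    if true_pos + false_neg == 0 or false_pos + true_neg == 0:
        raise ValueError("need both gold-standard positives and negatives")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


se, sp = sensitivity_specificity(true_pos=45, false_neg=5, false_pos=10, true_neg=140)
print(f"sensitivity = {se:.2f}, specificity = {sp:.2f}")  # sensitivity = 0.90, specificity = 0.93
```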

Implications for the study as a whole of a lack of reliability or validity in the measurement instrument:

Descriptive studies:
  1. Lack of precision of a measurement introduces random error. This leads to a fall in statistical power, and the confidence interval around the estimate widens. This is equivalent to greater uncertainty about the true estimate.

  2. Lack of accuracy or validity introduces systematic error. This leads to a biased estimate, which cannot be corrected by repeating measurements or increasing the sample size.
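Both effects can be checked with standard formulas. The sketch below assumes (i) independent random measurement error, whose variance adds to the true between-person variance of a continuous measure, and (ii) a binary test with imperfect sensitivity (Se) and specificity (Sp), whose expected apparent prevalence is Se × p + (1 - Sp) × (1 - p). All numbers are hypothetical, and the simple normal-approximation (Wald) 95% interval is used for convenience.

```python
import math

Z_95 = 1.96  # normal-approximation multiplier for a 95% confidence interval


def ci_half_width_mean(sd_true: float, sd_error: float, n: int) -> float:
    """95% CI half-width for a mean when each measurement has independent
    random error: observed variance = true variance + error variance."""
    return Z_95 * math.sqrt(sd_true**2 + sd_error**2) / math.sqrt(n)


def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Expected prevalence recorded by an imperfect binary test."""
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)


# 1. Random error widens the confidence interval (n = 100).
print(round(ci_half_width_mean(10, 0, 100), 2))   # 1.96 with a perfectly precise instrument
print(round(ci_half_width_mean(10, 10, 100), 2))  # 2.77 with error SD equal to the true SD

# 2. Systematic error: the estimate converges on the wrong value.
p_hat = apparent_prevalence(true_prev=0.10, sensitivity=0.80, specificity=0.95)
for n in (100, 1_000, 10_000):
    half = Z_95 * math.sqrt(p_hat * (1 - p_hat) / n)
    print(n, round(p_hat, 3), round(half, 3))  # centred on 0.125, never on 0.10
```

As n grows the interval narrows, but always around 0.125 rather than the true prevalence of 0.10.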

Comparative studies (for example, cohort studies or experiments):

  1. Lack of precision introduces random error, and reduces the likelihood of finding a true difference or association.

  2. Lack of accuracy or validity introduces systematic error:

    2.1 If it affects both groups equally (non-differential error), then in general this reduces the likelihood of finding a true difference.

    2.2 If it affects the two groups differently (differential error), it may mask or exaggerate a true difference, or create a spurious one.
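The effect of misclassifying a binary outcome on a comparison can be worked through with the apparent-prevalence formula Se × p + (1 - Sp) × (1 - p). The sketch below is a hypothetical illustration with invented prevalences, sensitivities and specificities. In the first case the error is identical in both groups (often called non-differential misclassification); in the second the instrument is less specific in one group (differential misclassification).

```python
def apparent_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Expected prevalence recorded by an imperfect binary test."""
    return sensitivity * true_prev + (1 - specificity) * (1 - true_prev)


# Case 1: same sensitivity/specificity in both groups (non-differential).
true_a, true_b = 0.30, 0.10
obs_a = apparent_prevalence(true_a, sensitivity=0.80, specificity=0.90)
obs_b = apparent_prevalence(true_b, sensitivity=0.80, specificity=0.90)
print(round(true_a - true_b, 2), round(obs_a - obs_b, 2))  # 0.2 0.14

# Case 2: no true difference, but specificity differs by group (differential).
obs_a2 = apparent_prevalence(0.10, sensitivity=0.80, specificity=0.90)
obs_b2 = apparent_prevalence(0.10, sensitivity=0.80, specificity=0.99)
print(round(obs_a2 - obs_b2, 3))  # 0.081
```

In case 1 the observed risk difference equals (Se + Sp - 1) times the true difference, so the true difference of 0.20 shrinks to 0.14 whenever Se + Sp is below 2 but above 1. In case 2 a difference of about 0.08 appears where none exists.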