Exploring the Goodness of Fit in Linear Models - Introduction

Scott A. Sinex

Figure 1. Columbia River velocity vs. depth
Source: QELP Data Set 011
How well does the linear mathematical model fit the data shown in Figure 1? How will a novice learner approach this question?

Modeling data is an important mathematical process for analyzing scientific information. Scientists investigate systems by collecting data and producing a mathematical model to try to understand the system.

Measurements in many natural systems can produce data with considerable noise or scatter. Introducing students to the modeling process involves several steps, which many mathematics textbooks from algebra through calculus now incorporate, as recommended by AMATYC (2004) and NCTM (2000). At the introductory level, mathematics textbooks have done a much better job of introducing modeling than science textbooks have.

Scott A. Sinex is Professor and Chair of Physical Sciences and Engineering at Prince George's Community College.

Assessing goodness of fit comes into play as students begin to model data. The basic measures of goodness of fit are the coefficient of determination and the residuals. The coefficient of determination, r², is the fraction of the variation in the y-variable that is explained by the variation in the x-variable; it ranges from 0 to 1. A residual is the difference between an actual y-value and the y-value calculated from the regression equation.
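For reference, here is one standard way to write these two measures for a least-squares line. The symbols m (slope), b (intercept), ŷᵢ (predicted value), eᵢ (residual) and ȳ (mean of the y-data) are notation introduced here for illustration; the 0-to-1 range of r² holds for a least-squares line fitted with an intercept.

```latex
\hat{y}_i = m x_i + b, \qquad e_i = y_i - \hat{y}_i

r^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The numerator is the scatter left over after the line is fitted (the sum of squared residuals); the denominator is the total scatter of the y-data about its mean. An r² near 1 means the line accounts for almost all of that scatter.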

How can we address these measures of goodness of fit using technology available to almost everyone? Graphing calculators and spreadsheet applications such as Excel calculate r² and easily produce residual plots. In this article, I show how to use an interactive Excel spreadsheet to help students discover the concept of goodness of fit and develop their analysis and interpretation skills. This discovery-learning approach follows the recommendations of NSTA (2001).
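To make explicit what a calculator or spreadsheet trendline computes behind the scenes, the following is a minimal Python sketch, not the author's spreadsheet. The function name `linear_fit` and the five data points are hypothetical, chosen only for illustration. It fits a least-squares line, lists the residuals (actual y minus predicted y), and computes r² from the residuals.

```python
def linear_fit(xs, ys):
    """Fit y = slope*x + intercept by least squares; return slope, intercept,
    residuals, and r^2. (Hypothetical helper for illustration.)"""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x-values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual: actual y minus the y calculated from the regression equation
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    ss_res = sum(e ** 2 for e in residuals)    # scatter left about the line
    ss_tot = sum((y - y_mean) ** 2 for y in ys)  # total scatter about the mean
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, residuals, r_squared


# Hypothetical example data (not from any data set in this article)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept, residuals, r2 = linear_fit(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}, r^2 = {r2:.3f}")  # y = 1.99x + 0.05, r^2 = 0.997

# A crude text "residual plot": each point's residual, with its sign
for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.2f}")
```

Printing the residuals alongside x plays the role of a residual plot: for a good linear fit they are small and alternate in sign with no obvious pattern, and for a least-squares line they sum to zero.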

For the graph in Figure 2, ask your students, "Does this show a trend in the ozone concentration over time?" Too often, students do not know how to deal with scatter in data and assume that scatter rules out the possibility of a trend; hence, they may respond that there is no trend. Figure 3 shows the same data with the regression line and its equation.

Figure 2. Ozone at Halley Bay (1956-2000)
Source: British Antarctic Survey


Figure 3. Ozone at Halley Bay (1956-2000), with regression line
Source: British Antarctic Survey

Published March, 2005
© 2005, Scott A. Sinex