Foundations of Linear and Generalized Linear Models

Alan Agresti
Publisher: John Wiley
Publication Date: 2015
Number of Pages: 472
Format: Hardcover
Series: Wiley Series in Probability and Statistics
Price: 125.00
ISBN: 9781118730034
Category: Textbook
BLL Rating: The Basic Library List Committee suggests that undergraduate mathematics libraries consider this book for acquisition.

[Reviewed by Robert W. Hayden, on 10/26/2016]

Alan Agresti is a prominent American statistician who is widely regarded as having “written the book” on categorical data analysis. Here we have a book that is mainly about Generalized Linear Models (GLMs). We will get to why that is not surprising after discussing what GLMs actually are.

Most readers will be familiar with simple linear regression as taught in an introductory statistics class. That simple model is ambiguously named. The word “linear” is often taken to refer to the fitting of a linear equation, but in mathematical statistics it refers to the fact that the model is linear in the parameters to be estimated. That distinction is probably easiest to see in a familiar high school algebra exercise: using simultaneous equations to find a parabola through three given points. Coordinates \(x\) and \(y\) are given numerically for the points, and the object is to find the coefficients (often named \(a\), \(b\), and \(c\)) of the quadratic. The quadratic is not linear, but the equations for finding its coefficients are. Similarly, we can use regression to fit a quadratic to many points. The quadratic is not linear, but a statistician would consider this linear regression because the coefficients are found by solving linear equations.
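A small sketch (with made-up data, not from the book) makes the point concrete: the model \(y = ax^2 + bx + c\) is quadratic in \(x\) but linear in \(a\), \(b\), \(c\), so the fit reduces to a linear least-squares problem.

```python
import numpy as np

# Simulated points lying roughly on a parabola (illustrative data only).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 25)
y = 1.5 * x**2 - 0.5 * x + 2.0 + rng.normal(scale=0.1, size=x.size)

# The design matrix has columns x^2, x, 1; the quadratic's coefficients
# come from solving a *linear* least-squares problem in (a, b, c).
X = np.column_stack([x**2, x, np.ones_like(x)])
a, b, c = np.linalg.lstsq(X, y, rcond=None)[0]
print(a, b, c)  # close to the true values 1.5, -0.5, 2.0
```

The fitted curve is nonlinear in \(x\), yet nothing nonlinear was solved: only the linear system that `lstsq` handles.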

Fitting quadratics is one generalization of simple linear regression, as is multiple regression, where we have more than one independent variable. GLMs include such possibilities but are mainly distinguished by two factors. First, inference may be based on some distribution other than the normal. Second, the dependent variable is usually transformed in some way, typically in accord with the assumed error distribution. That characterization is abstract, though; statistics is an applied field, and in practice interest centers not on all models satisfying those conditions but on the few with real-world applications.

Perhaps the simplest example of a GLM is logistic regression, which predicts a binary (Success/Failure) variable from one or more other variables. For a set of fixed values of those variables, the binary response is clearly not normally distributed — it takes only two values. We can further generalize this to a categorical response variable with more than two possible outcomes, and this is indeed an important application, as well as the reason it is not surprising to see a book on GLMs from Agresti.
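A minimal sketch of that simplest case, again with simulated data: logistic regression is a GLM with Bernoulli response and logit link, and it can be fitted by iteratively reweighted least squares.

```python
import numpy as np

# Simulated binary (0/1) responses with success probability tied to x
# through the logit link (illustrative data only).
rng = np.random.default_rng(1)
x = rng.normal(size=200)
p_true = 1 / (1 + np.exp(-(0.5 + 2.0 * x)))  # true success probabilities
y = rng.binomial(1, p_true)

X = np.column_stack([np.ones_like(x), x])    # intercept + predictor
beta = np.zeros(2)
for _ in range(25):                          # IRLS iterations
    eta = X @ beta                           # linear predictor
    mu = 1 / (1 + np.exp(-eta))              # inverse logit
    W = mu * (1 - mu)                        # Bernoulli variance weights
    z = eta + (y - mu) / W                   # working response
    beta = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
print(beta)  # roughly (0.5, 2.0)
```

Each iteration is an ordinary weighted least-squares solve, which is why the machinery of linear models carries over so directly to GLMs.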

The book itself seems most similar to the many books published over the last thirty years or so that try to include applications in a mathematical statistics course. We receive mostly theory with a couple of applications thrown in at the ends of chapters. The intent is not to give broad coverage but instead to cover just the core topics every statistician should know about GLMs. Accomplishing that task requires some strong prerequisites. These include the usual mathematical statistics course(s), a course or two in multiple regression and ANOVA, and a course in linear algebra — preferably one that includes a geometric approach, as geometric language is commonly used here.

Agresti writes very clearly and simply for a book with such strong prerequisites, but the reader should expect a vigorous workout as soon as English is replaced by equations. Mathematicians may find the book lacks flow, as topics are often chosen not because they are the obvious next step, but because they are very useful. At times, the writing adds to this lack of continuity. For your reviewer it conjured up an image of a large stack of 3×5 cards, possibly accumulated by the author over the years, each containing an important point he wanted to make. These points are not always fitted together seamlessly in the book. On the other hand, many of those points are pearls of wisdom you won’t find elsewhere. So while this may or may not be the best way to first learn about GLMs, anyone who does learn about them needs this reference on their bookshelf.

Speaking of references, this book contains 17 pages of references to other works, which will be useful to anyone working in this area. (The disappearance of such references from undergraduate textbooks seems a sad admission that students will never use the material they learn.) We also get a large number of interesting and thought-provoking exercises. There are 14 pages of hints or skeletal solutions to an unsystematic sample of these exercises.

This book is an essential reference for anyone working with or teaching GLMs. It is less outstanding as a teaching aid. Whether to choose it as a text may depend on issues such as how well it matches the prospective students’ needs and background. It would certainly be preferable to a book that primarily tells students how to get the computer to produce output for GLMs; computing is an important topic, and one barely touched on here, but it plays only a supporting role.


After a few years in industry, Robert W. Hayden (bob@statland.org) taught mathematics at colleges and universities for 32 years and statistics for 20 years. In 2005 he retired from full-time classroom work. He now teaches statistics online at statistics.com and does summer workshops for high school teachers of Advanced Placement Statistics. He contributed the chapter on evaluating introductory statistics textbooks to the MAA's Teaching Statistics.

Preface xi

1 Introduction to Linear and Generalized Linear Models 1

1.1 Components of a Generalized Linear Model 2

1.2 Quantitative/Qualitative Explanatory Variables and Interpreting Effects 6

1.3 Model Matrices and Model Vector Spaces 10

1.4 Identifiability and Estimability 13

1.5 Example: Using Software to Fit a GLM 15

Chapter Notes 20

Exercises 21

2 Linear Models: Least Squares Theory 26

2.1 Least Squares Model Fitting 27

2.2 Projections of Data Onto Model Spaces 33

2.3 Linear Model Examples: Projections and SS Decompositions 41

2.4 Summarizing Variability in a Linear Model 49

2.5 Residuals, Leverage, and Influence 56

2.6 Example: Summarizing the Fit of a Linear Model 62

2.7 Optimality of Least Squares and Generalized Least Squares 67

Chapter Notes 71

Exercises 71

3 Normal Linear Models: Statistical Inference 80

3.1 Distribution Theory for Normal Variates 81

3.2 Significance Tests for Normal Linear Models 86

3.3 Confidence Intervals and Prediction Intervals for Normal Linear Models 95

3.4 Example: Normal Linear Model Inference 99

3.5 Multiple Comparisons: Bonferroni, Tukey, and FDR Methods 107

Chapter Notes 111

Exercises 112

4 Generalized Linear Models: Model Fitting and Inference 120

4.1 Exponential Dispersion Family Distributions for a GLM 120

4.2 Likelihood and Asymptotic Distributions for GLMs 123

4.3 Likelihood-Ratio/Wald/Score Methods of Inference for GLM Parameters 128

4.4 Deviance of a GLM, Model Comparison, and Model Checking 132

4.5 Fitting Generalized Linear Models 138

4.6 Selecting Explanatory Variables for a GLM 143

4.7 Example: Building a GLM 149

Appendix: GLM Analogs of Orthogonality Results for Linear Models 156

Chapter Notes 158

Exercises 159

5 Models for Binary Data 165

5.1 Link Functions for Binary Data 165

5.2 Logistic Regression: Properties and Interpretations 168

5.3 Inference About Parameters of Logistic Regression Models 172

5.4 Logistic Regression Model Fitting 176

5.5 Deviance and Goodness of Fit for Binary GLMs 179

5.6 Probit and Complementary Log–Log Models 183

5.7 Examples: Binary Data Modeling 186

Chapter Notes 193

Exercises 194

6 Multinomial Response Models 202

6.1 Nominal Responses: Baseline-Category Logit Models 203

6.2 Ordinal Responses: Cumulative Logit and Probit Models 209

6.3 Examples: Nominal and Ordinal Responses 216

Chapter Notes 223

Exercises 223

7 Models for Count Data 228

7.1 Poisson GLMs for Counts and Rates 229

7.2 Poisson/Multinomial Models for Contingency Tables 235

7.3 Negative Binomial GLMs 247

7.4 Models for Zero-Inflated Data 250

7.5 Example: Modeling Count Data 254

Chapter Notes 259

Exercises 260

8 Quasi-Likelihood Methods 268

8.1 Variance Inflation for Overdispersed Poisson and Binomial GLMs 269

8.2 Beta-Binomial Models and Quasi-Likelihood Alternatives 272

8.3 Quasi-Likelihood and Model Misspecification 278

Chapter Notes 282

Exercises 282

9 Modeling Correlated Responses 286

9.1 Marginal Models and Models with Random Effects 287

9.2 Normal Linear Mixed Models 294

9.3 Fitting and Prediction for Normal Linear Mixed Models 302

9.4 Binomial and Poisson GLMMs 307

9.5 GLMM Fitting, Inference, and Prediction 311

9.6 Marginal Modeling and Generalized Estimating Equations 314

9.7 Example: Modeling Correlated Survey Responses 319

Chapter Notes 322

Exercises 324

10 Bayesian Linear and Generalized Linear Modeling 333

10.1 The Bayesian Approach to Statistical Inference 333

10.2 Bayesian Linear Models 340

10.3 Bayesian Generalized Linear Models 347

10.4 Empirical Bayes and Hierarchical Bayes Modeling 351

Chapter Notes 357

Exercises 359

11 Extensions of Generalized Linear Models 364

11.1 Robust Regression and Regularization Methods for Fitting Models 365

11.2 Modeling With Large p 375

11.3 Smoothing, Generalized Additive Models, and Other GLM Extensions 378

Chapter Notes 386

Exercises 388

Appendix A Supplemental Data Analysis Exercises 391

Appendix B Solution Outlines for Selected Exercises 396

References 410

Author Index 427

Example Index 433

Subject Index 435