
Computer Age Statistical Inference

Bradley Efron and Trevor Hastie
Publisher: Cambridge University Press
Publication Date: 2016
Number of Pages: 475
Format: Hardcover
Price: 74.99
ISBN: 9781107149892
Category: Monograph
BLL Rating: The Basic Library List Committee recommends this book for acquisition by undergraduate mathematics libraries.

[Reviewed by William J. Satzer, on 10/2/2016]

The science of statistics has changed a great deal over the last sixty years. New algorithms, the availability of much greater computational power, and exciting new applications have driven a huge amount of new work. The current book examines how and why those changes have occurred. The authors are well-known statisticians who have been right in the middle of things: Efron may be best known for his work on the bootstrap technique, and Hastie’s name is perhaps most frequently associated with machine learning. But they are both statisticians of broad scope.

The book has three parts that are organized historically. The first part reviews the themes of classical inference (frequency-based, Bayesian, and Fisherian). All these were in place before electronic computation was available, but they now use it to considerable advantage. The second part considers the period of early computer use, from the 1950s through the 1990s. Then the third, which moves into the twenty-first century, explodes with an amazing variety of extremely ambitious algorithms that once again focus on statistical inference.

This is not a textbook. Among other things, it is an attempt to characterize the current state of statistics by identifying important tools in the context of their historical development. It also offers an enlightening series of illustrations of the interplay between computation and inference. The authors describe the tools, techniques and algorithms in sufficient detail for a mathematically trained reader with some statistics background to develop a general understanding of their purpose and value. However, the level of detail is often insufficient to give the reader more than a glimpse of how and where they’re used, and why they work.

There are a lot of tools, techniques and algorithms! These are usually described individually or in related groups in a single chapter, and most of these chapters are less than thirty pages long. So perhaps the book is best viewed as a sampler. Rather than attempting to discuss each chapter here, it makes more sense to point to a few interesting ideas and themes to give a sense of what the book is like. Here are three examples:

Efron’s bootstrap is a powerful tool that the authors introduce in the first chapter when they consider various ways of ascribing standard errors to a set of kidney function data. Bootstrap methods are then taken up in greater detail in two later chapters. Bootstrap techniques are particularly relevant in the context of this book because they magnify computational requirements a hundred- or a thousand-fold over conventional methods, and hence were unthinkable before the late twentieth century.
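To make the resampling idea concrete, here is a minimal Python sketch of a nonparametric bootstrap estimate of a standard error. The data are synthetic, not the book's kidney-function measurements, and the function name and parameters are illustrative choices, not the authors' code.

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data standing in for a sample of interest
# (not the book's kidney-function data).
x = rng.normal(loc=2.0, scale=1.5, size=50)

def bootstrap_se(data, statistic, B=1000, rng=rng):
    """Estimate the standard error of `statistic` by resampling with replacement."""
    n = len(data)
    replicates = np.empty(B)
    for b in range(B):
        resample = rng.choice(data, size=n, replace=True)
        replicates[b] = statistic(resample)
    return replicates.std(ddof=1)

print("plug-in estimate:", x.mean())
print("bootstrap standard error:", bootstrap_se(x, np.mean))

Each of the B resamples requires recomputing the statistic, which is exactly the hundred- or thousand-fold multiplication of effort mentioned above.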

The study of gene function using microarrays generates enormous amounts of data. Large-scale hypothesis testing and the associated question of false-discovery rates have become very important here. Thousands of hypotheses are tested simultaneously, and the expectation is generally that only a handful of genes will be of interest. The false-discovery rate and the breakthrough in statistical inference that resulted are described in the first chapter of the third part of the book. Classical procedures aim to avoid even one false rejection, a standard that becomes untenable across so many simultaneous tests; the false-discovery-rate approach instead controls the expected proportion of false rejections among the hypotheses that are rejected.
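As one concrete illustration of this kind of procedure, here is a minimal Python sketch of the Benjamini-Hochberg method for false-discovery-rate control. The p-values are synthetic and the level q = 0.1 is an arbitrary choice; this is not the book's microarray example.

import numpy as np

rng = np.random.default_rng(1)

# Synthetic p-values: most hypotheses are null (uniform p-values),
# a handful are non-null (very small p-values).
p_null = rng.uniform(size=995)
p_signal = rng.uniform(high=1e-4, size=5)
p = np.concatenate([p_null, p_signal])

def benjamini_hochberg(pvals, q=0.1):
    """Return indices of hypotheses rejected at false-discovery rate q."""
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    # Largest k with p_(k) <= (k/m) * q; reject the k smallest p-values.
    below = sorted_p <= (np.arange(1, m + 1) / m) * q
    if not below.any():
        return np.array([], dtype=int)
    k = np.max(np.nonzero(below)[0]) + 1
    return order[:k]

rejected = benjamini_hochberg(p, q=0.1)
print("number of discoveries:", len(rejected))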

Neural networks as a tool for predictive modeling were introduced in the mid-1980s and caused quite a stir. They faded as other techniques were introduced, and then, with larger and faster computing systems, were reincarnated around 2012 as “deep learning”. They have proven particularly successful at the difficult task of classifying natural images.

This is an attractive book that invites browsing by anyone interested in statistics and its future directions. The book’s web site provides data files for all the examples. Some R code for bootstrap functions is also provided, and it appears that more material of this kind may be added over time.


Bill Satzer (bsatzer@gmail.com) was a senior intellectual property scientist at 3M Company. His training is in dynamical systems and particularly celestial mechanics; his current interests are broadly in applied mathematics and the teaching of mathematics.

Part I. Classic Statistical Inference:
1. Algorithms and inference
2. Frequentist inference
3. Bayesian inference
4. Fisherian inference and maximum likelihood estimation
5. Parametric models and exponential families
Part II. Early Computer-Age Methods:
6. Empirical Bayes
7. James–Stein estimation and ridge regression
8. Generalized linear models and regression trees
9. Survival analysis and the EM algorithm
10. The jackknife and the bootstrap
11. Bootstrap confidence intervals
12. Cross-validation and Cp estimates of prediction error
13. Objective Bayes inference and Markov chain Monte Carlo
14. Statistical inference and methodology in the postwar era
Part III. Twenty-First Century Topics:
15. Large-scale hypothesis testing and false discovery rates
16. Sparse modeling and the lasso
17. Random forests and boosting
18. Neural networks and deep learning
19. Support-vector machines and kernel methods
20. Inference after model selection
21. Empirical Bayes estimation strategies
Epilogue
References
Index.