Introductory Statistics: A Conceptual Approach Using R

William B. Ware, John M. Ferron, and Barbara M. Miller

The Basic Library List Committee suggests that undergraduate mathematics libraries consider this book for acquisition.

Reviewed by Robert W. Hayden

Reviewing this book was quite a workout! Not that it was painful, but there is much to address in three important areas: noteworthy innovations, welcome virtues, and significant faults. Before getting to those, though, let’s make it clear that this text is not aimed at the typical introductory statistics course taught in mathematics departments, with no calculus prerequisite, and typically serving a general education program and/or majors outside the department. Instead, it is written for beginning graduate students in the social sciences. However, the writing is clear and simple, and the topical coverage is close to that undergraduate course (plus a little Analysis of Variance), so it could be used there. Two things that would distinguish this text from the usual candidates are a near total lack of decoration or amusement, and the fact that it is clearly addressing future researchers (or at least readers of research). Whether these are bugs or features will depend on local circumstances.

The book claims three innovations. It uses the statistical programming language R as its main technology, it introduces modern resampling methods, and it provides what the authors call a “conceptual approach.” We will address each of these in turn.

R is fast becoming the language of choice among young statisticians. As a programming language, it offers far more power than the usual menu-driven statistics packages (usually SPSS in the social sciences), at the expense of being considerably harder to learn and use for the methods covered in this text. Outside the community of current R users, R attracts attention primarily because people hear that it is free. It is, but that may not matter much in the short term to college students, since most colleges provide statistical software at no additional cost to their students. In the long term, being free means students can take it with them when they graduate and use it forever, on any current computing platform. Being free is also of great interest to those college administrators responsible for paying for site licenses for commercial software.

For those students who will actually do research, many new methods first appear as R packages that carry them out. More than 4000 such packages are currently available. It usually takes years or decades before a select few of these make their way into commercial software. This is not the place to decide whether and where R should be used in teaching, but it is important to understand that R is very different from the software usually used in introductory courses, and that adopting it has the serious implications described above, along with possibly significant retraining and support costs. The book at hand provides a sketchy introduction to R along with many R programs on the book’s website. An instructor could just use these to create overheads or handouts, or ask students to run the programs themselves, or perhaps to make minor changes in the programs and run them. This book probably provides enough information on R for those purposes, but probably not enough for students to become proficient at writing their own R code from scratch.

Another innovation claimed by this book is the incorporation of traditional non-parametric methods as well as newer resampling methods along with traditional methods. The non-parametric techniques require fewer assumptions but otherwise look like traditional methods in that there are formulae accompanied by tables in the back of the book. Resampling methods are quite different and look more like Monte Carlo approaches. They too are non-parametric in requiring fewer assumptions than the traditional methods. The authors do a good job of presenting these and even contribute some original research to evaluating their strengths and weaknesses.
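For readers who have not met resampling, the Monte Carlo flavor mentioned above can be seen in a minimal sketch of a two-sample permutation test. This is not taken from the book or its website: it is written in Python rather than R, the function name `permutation_test` is hypothetical, and the scores are made up purely for illustration.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are arbitrary,
        # so shuffle the pooled scores and split them again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Made-up scores, for illustration only.
treatment = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
control = [11.0, 12.2, 11.8, 12.9, 11.5, 12.4]

observed_diff, p_value = permutation_test(treatment, control)
print(f"observed difference: {observed_diff:.2f}, estimated p-value: {p_value:.4f}")
```

No formula or table is consulted: the reference distribution is built by reshuffling the data themselves, which is why these methods need fewer assumptions than the traditional ones.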

The claim for a “conceptual approach” is a more common one, and is not clearly defined by the authors. It is a claim more often made than delivered upon. The book does have some strengths that could be given this label, though the authors do not stress that. The general presentation is verbal, with clearly written and carefully thought-out explanations that are in at least the top 10% of current textbooks. In addition to using R to crunch numbers, the authors also use it to carry out many simulations that shed light on abstract concepts. These demonstrations, and their code on the book’s website, are probably worth the cost of the book. Unfortunately, this “conceptual approach” does not extend to the exercises. Those come in two flavors. First are some multiple-choice questions at the “did you read the chapter” level. These are quite good. After those come a very small number (say 1–4) of other exercises which are pretty much at the undergraduate cookbook level stressing computation and not asking for thought, interpretation or evaluation of assumptions. As a whole, the exercises are far below the level of those in any introductory college textbook by David Moore, or any textbook commonly used in the high schools for AP Statistics.

The one thing the authors do claim as part of their “conceptual approach” is that students are asked to carry out many computations by hand using formulae in the book. While doing one computation might be enlightening for someone fluent in mathematics, for someone who is not fluent, attention will be diverted away from the concepts to the mechanics of the calculations. Unfortunately, the authors carry this arithmetic approach to extremes. They repeatedly use “computational formulae” that hide the underlying concepts and are unstable with respect to round-off errors. They even provide code so students can use R as a calculator to grind out such calculations step-by-step. These formulae were useful for early desktop and pocket calculators but have long since disappeared from statistical software and better statistics textbooks. It is hard to see what they contribute to conceptual understanding.
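The round-off problem is easy to demonstrate. The sketch below is not the book's code: it is written in Python rather than R, both function names are hypothetical, and the data are made up. It compares the definitional formula for the sample variance (average squared deviation from the mean) with the familiar "computational" shortcut (sum of squares minus the squared sum over n).

```python
def variance_definitional(xs):
    """Sample variance from squared deviations about the mean."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def variance_computational(xs):
    """Sample variance via the 'computational formula' shortcut."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


# Made-up data: the second set is the first shifted by a large constant,
# so the true variance of both sets is identical (30).
small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]

for label, data in (("small", small), ("shifted", shifted)):
    print(label, variance_definitional(data), variance_computational(data))
```

On the small values the two formulae agree. On the shifted values the definitional formula still returns 30, while the shortcut subtracts two numbers near 4e18 whose difference is smaller than their rounding error, so its answer is no longer trustworthy. The definitional form also makes visible what a variance is, which the shortcut hides.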

Antiquated computations are part of a larger issue with this textbook. In reading it from cover to cover, this reviewer was constantly reminded of an old Peanuts cartoon in which one of the kids asked how she could do “new math” with an old math mind. There was a strong sense here of a really great 1970s textbook updated to include modern topics, but placing those in a 1970s conceptual framework. A minor example of this is bits of terminology that have long since disappeared from the vocabulary of most statisticians. More important is the handling of assumptions. The authors do a pretty good job of discussing these verbally, but after a few initial examples, they rarely if ever plot the data they are analyzing. They continue to preach, but students (at best) do as we do rather than as we preach. Instead of looking at the data, the authors often use a battery of summary statistics and hypothesis tests to assess things like normality. Most statisticians (and most students) would prefer intuitive graphical methods that both display the data and reinforce the ideas of what we are looking for.

The most serious consequence of an “old stats” viewpoint is that resampling methods are tacked on to the end of discussions of traditional and non-parametric methods. Most statisticians advocating the teaching of resampling methods in a first course do so in part because they find these methods much more understandable to students. Having learned the concepts in the simpler resampling context, students can see the traditional methods as alternatives to (or approximations of) resampling methods, and learn them much more quickly. (See George Cobb, “The Introductory Statistics Course: A Ptolemaic Curriculum?” Technology Innovations in Statistics Education, 1(1), for details.)

The way resampling is incorporated into this text completely loses that pedagogical advantage. The result of this (and other choices already discussed) is that this book involves perhaps four times as many inference procedures as most texts covering the same basic topics. Each of the traditional methods is in danger of being accompanied by a traditional non-parametric test, one or more resampling methods, along with additional tests to check assumptions. The authors themselves admit that reviewers felt they covered too much, and it seems likely that students using this book will agree.

All in all, it is hard to recommend this book as a textbook unless you really need one that incorporates R. In that case, you would probably have to provide additional support for learning R, and essentially all the homework problems for the course. Given the current arrangement of topics, one might cover just one of the multiple methods given for each situation, perhaps rotating among traditional, non-parametric and resampling, asking students to skim the others. All formal tests concerning assumptions should be replaced with looking at the data. This would allow one to benefit from the strengths of this text while not suffering too much from its weaknesses.

Whatever its limitations as a textbook, there are two audiences for whom this book is highly recommended. Anyone teaching introductory statistics who is not already familiar with non-parametrics, resampling, or using simulations as a teaching tool will find all of those topics well presented here in the context of more familiar approaches. In addition, they may find many teaching ideas in the exposition, and they will have access to a useful library of R code. (Even if they do not want to learn R, they may get ideas here, and then find implementations in applets on the internet.) A second and similar audience would be social scientists with traditional backgrounds who want to learn more about the newer methods. Just be sure, when you pass colleagues a copy, that you also include George Cobb’s paper to read when they finish this book!

After a few years in industry, Robert W. Hayden taught mathematics at colleges and universities for 32 years and statistics for 20 years. In 2005 he retired from full-time classroom work. He now teaches statistics online and does summer workshops for high school teachers of Advanced Placement Statistics. He contributed the chapter on evaluating introductory statistics textbooks to the MAA's Teaching Statistics.

1. Introduction and Background

II. Descriptive Statistics
2. Describing Quantitative Data with Frequency Distributions
3. Describing Quantitative Data: Summary Statistics
4. Describing Categorical Data: Frequency Distributions, Graphics, and Summary Statistics
5. Describing the Position of a Case within a Set of Scores
6. Describing the Relationship between Two Quantitative Variables: Correlation
7. Describing the Relationship between Two Quantitative Variables: Regression

III. The Fundamentals of Statistical Inference
8. The Essentials of Probability
9. Probability and Sampling Distributions
10. The Normal Distribution

IV. Statistical Inference
11. The Basics of Statistical Inference: Tests of Location
12. Other One-Sample Tests for Location
13. More One-Sample Tests
14. Two-Sample Tests of Location
15. Other Two-Sample Tests: Variability and Relationships

V. K-Sample Tests
16. Tests on Location: Analysis of Variance and Other Selected Procedures
17. Multiple Comparison Procedures
18. Looking Back… and Beyond

Appendix A: Statistical Tables
Appendix B: Getting Started with R
Index