Data Analysis and Graphics Using R: An Example-Based Approach

John Maindonald and W. John Braun
Publisher: Cambridge University Press
Publication Date: 2010
Number of Pages: 525
Format: Hardcover
Edition: 3
Series: Cambridge Series in Statistical and Probabilistic Mathematics 10
Price: 80.00
ISBN: 9780521762939
Category: Textbook
[Reviewed by Robert W. Hayden, on 09/14/2010]

R is a cost-free programming language for statistics. Many people hear the word "free" and hope that R might be an inexpensive alternative to a commercial package like Minitab for use in a first course in statistics. It would probably be a better choice than other programming languages such as Fortran, but it would be a rare introductory course in which any programming language was the most appropriate software. So, having lost a few hopeful but now disappointed readers, let us move on to some uses that an MAA member might make of R.

If you yourself need to do sophisticated statistical analyses, especially ones that require custom programming on a tight budget, then R can be a good choice. It would be an obvious choice for a statistical computing course. One might also learn R if it is the default choice of departments or programs your department services: say, a mathematical statistics course for statistics majors, or a methods course for graduate students in an applied area that has embraced R. It is also a natural choice for projects and independent studies that need sophisticated statistical analyses. A student seeking employment in an area where R is used might simply wish to learn the language. If you are wondering whether anything free can be any good, note that R is developed and maintained by statisticians and researchers for their own use, and the quality of the software is generally at least on a par with that of software developed for profit.

The same cannot always be said for the documentation, at least for beginners. Given its background and intended audience, R documentation tends to start at a pretty high level, and this volume is no exception. A realistic minimum might be to have (or be in the process of getting) a B.S. in statistics, with at least some prior programming experience. For the latter, all that is needed is familiarity with basic ideas like loops, arrays, and variables, so even experience programming a calculator might suffice. The applications cover most of the topics in a statistics major, so much of the book will be wasted on those with narrower backgrounds.

Undergraduate statistics majors might be asked to buy this book for use as a reference in all their major courses, much as students at MIT were once asked to buy the CRC mathematical tables. All users will probably find the book tough and uneven going. You will need to try the examples on a computer yourself, experiment with options, and ponder the results. The authors' stated prerequisites are optimistically minimal, and in practice they do not seem to have sorted out who their audience is. They give a formula for a simple standard deviation, accompanied by a verbal explanation about subtracting the mean from each observation, squaring the results, et cetera. They also explain what stem-and-leaf plots and box-and-whisker plots are. Readers would have seen all three topics in an introductory course, or even in high school. At the same time, the authors provide code and output for graphics far more sophisticated than anything in a first course, without any explanation of how to read the output or what it means. It's a pity, because the statistical advice is generally excellent and state of the art, and the analyses are perceptive.
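For readers who have never seen R, the three introductory topics mentioned above each take a line or two of base R. The following is a minimal sketch, not code from the book: the vector `x` is made-up illustrative data, and the name `sd_by_hand` is hypothetical. It checks that the verbal recipe for the standard deviation agrees with R's built-in `sd()`, which divides by n - 1, and then calls base R's `stem()` and `boxplot()`.

```r
# Made-up illustrative data (not taken from the book)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# The verbal recipe: subtract the mean from each observation,
# square the results, sum them, divide by n - 1, take the square root
deviations <- x - mean(x)
sd_by_hand <- sqrt(sum(deviations^2) / (length(x) - 1))

# R's built-in sd() uses the same n - 1 divisor, so the two agree
all.equal(sd_by_hand, sd(x))   # TRUE

# The two introductory displays mentioned in the review
stem(x)      # prints a stem-and-leaf display to the console
boxplot(x)   # draws a box-and-whisker plot on the current graphics device
```

Typing lines like these at the console, and then varying them, is the kind of hands-on experimentation the book demands throughout.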

Things are more even on the coding side. The norm is to briefly describe some (real) data, then present the code for analysis. There is no teaching of syntax or programming apart from the actual applications. Most of the code is explained, though often tersely or by references to later chapters. Although the examples involve real data and real research questions, there is often no conclusion drawn. Too often an example ends with the chart or graph the code produced, as if the authors felt any conclusions would be too obvious to need stating. Similarly, the many exercises tend to emphasize getting the desired output rather than interpreting it. No solutions are provided in the text, but these and many other resources can be found on the book's website. The solutions are almost wholly confined to annotated code with little statistical interpretation.

Is this the R book for you? Most general books on R are at this level and have similar strengths and weaknesses. A choice might be made on topical coverage, though that is fairly standard, or on personal, subjective criteria. With a lot of work you can certainly learn a lot from this text. One alternative distinctive enough to mention, and a good fit for some readers, is Verzani (2005). It covers only about a year-long introductory statistics course, but explains things in more detail.


References:

John Verzani. Using R for Introductory Statistics. Chapman & Hall/CRC, 2005.

See also our review of the second edition.

After a few years in industry, Robert W. Hayden (bob@statland.org) taught mathematics at colleges and universities for 32 years and statistics for 20 years. In 2005 he retired from full-time classroom work. He now teaches statistics online at statistics.com and does summer workshops for high school teachers of Advanced Placement Statistics. He contributed the chapter on evaluating introductory statistics textbooks to the MAA's Teaching Statistics.

Preface; Content - how the chapters fit together; 1. A brief introduction to R; 2. Styles of data analysis; 3. Statistical models; 4. A review of inference concepts; 5. Regression with a single predictor; 6. Multiple linear regression; 7. Exploiting the linear model framework; 8. Generalized linear models and survival analysis; 9. Time series models; 10. Multi-level models, and repeated measures; 11. Tree-based classification and regression; 12. Multivariate data exploration and discrimination; 13. Regression on principal component or discriminant scores; 14. The R system - additional topics; 15. Graphs in R; Epilogue; Index of R symbols and functions; Index of authors.