
A First Course in Statistical Programming with R

W. John Braun and Duncan J. Murdoch
Cambridge University Press
Reviewed by Robert W. Hayden

Older readers may remember a genre of courses and textbooks that combined instruction in the rudiments of a programming language with some basic ideas of numerical methods, often taught as a service course to STEM majors. In the beginning, the language was always Fortran, for that was all we had. Later, courses using BASIC arose to introduce a wider audience to computers, and often that course also served the STEM folks. But soon the users of computers wanted to point and click at applications written by others rather than write code of their own, and BASIC largely vanished. Around the same time computer science declared its independence, saying that computer science is not about computing, and walking out of the mathematics department. That rebellion was expressed in the language Pascal, which for a while served the introductory computer science course and was a reasonable option for number crunching. But as computer science drifted ever farther from computing, Pascal was replaced by C or Java, which have their own charms, but those do not include a focus on number crunching. Today there does not seem to be a clear successor to the old applied Fortran course. Some like computer algebra systems such as Mathematica (or the free Maxima). Some like tools based on matrix calculations, such as MATLAB (or the free Octave). Python is an option if speed is not a consideration. Others learn C because that is the only language in the course catalog. But few courses in C or Python emphasize number crunching.

The book at hand offers an interesting option in the statistical programming language R. As a language, R is more cryptic than Fortran, BASIC, or Python, but less so than C or Java. It has built-in functions for pretty much everything in undergraduate statistics courses, and can be used in such courses instead of a statistics package such as Minitab. In that role, R has what most students will see as a disadvantage: a command line interface much less friendly than even the old Minitab command line. For some, though, the fact that you can program anything that is not built in is a big advantage. However, for most undergraduate courses you just need to learn the statistics commands; no programming would be required there.

The book at hand is very like the Fortran-plus-numerical-methods books of long ago. The title does not really suggest that, though the words “in statistical” in that title are printed in smaller type on the cover. Inside we find many numerical topics dear to STEM people, such as summing series, the Fibonacci sequence, scientific graphics, fixed point equations, the sieve of Eratosthenes, the Newton and bisection algorithms for finding roots of equations, matrix calculations, simulations (including Markov chains), optimization (including linear programming), and Monte Carlo integration. Detailed treatment of program flow control suggests we are really programming, and not just using a command line interface to a statistics package. The main topic from the old Fortran books not included here is differential equations. On the computer science side, we see bubble and merge sorts, a bit on recursion, a lot on debugging, and issues in generating random numbers. Object-oriented programming is mentioned only in passing, and OOP in R is usually rated somewhere between weird and useless. It is interesting to note that some high school teachers are considering using R for both Advanced Placement Statistics and AP Computer Science Principles.

Prerequisites for the text are listed as the calculus sequence plus a probability course, though not, surprisingly, a statistics course. Your reviewer thinks the latter might be better preparation than the probability course. In addition, the chapter (one of seven) on matrix calculations will not make much sense to someone who has not taken a post-Strang linear algebra course.

There are now a great many books available on R, and a few on programming with R. The book at hand is unusual in addressing beginners, and in treating R as a general number crunching tool. It assumes some familiarity with statistics, and some experience with a command line interface would be helpful, but most of the examples and tools are not inherently statistical. Nor is this book an introduction to computer science or an attempt to explain R to C programmers. Indeed, it is much closer to the old Fortran books described in the first paragraph of this review than to most other books on R.

So how does the current text stack up against those old Fortran books, assuming one wants to achieve their goals, and is happy with the choice of R as the language? On the positive side, it is one of the few current attempts to fill that void, and probably the only one using R. It is also one of very few books on R really written for non-statistician non-programmers. On the negative side, the pedagogical approach of the current book is not as successful as that of the older ones. Much of this might be explained by the opening paragraph of the preface to the first edition. There we read that the book had its roots in the authors’ frustration with their students’ lack of coding skills. The book itself is consistent with this, in that it seems to be driven more by what the professor wants the students to know than by what the students might want to learn. A key fact is that the book is only 215 pages long, yet covers more material than the older books. To accomplish that, the style is very terse. Explanations are often sketchy or missing. Although the text is well-written, little effort is made to convince the reader that the material is useful or interesting. “Applications” tend to consist of applications to other coursework rather than the real world, or made-up artificial problems of the widget manufacturing sort. The overall feel is of a crash course for graduate students.

R seems a viable programming language for STEM students to learn, and learning a programming language seems a good idea for such students. This book appears to be the best option for accomplishing that, but the instructor will have to supply much in the way of details, explanations and motivation. One might hope that the authors will follow this book with one directed at a broader class of readers, and that better takes the needs of those readers into account.

After a few years in industry, Robert W. Hayden taught mathematics at colleges and universities for 32 years and statistics for 20 years. In 2005 he retired from full-time classroom work. He contributed the chapter on evaluating introductory statistics textbooks to the MAA’s Teaching Statistics.

1. Getting started
2. Introduction to the R language
3. Programming statistical graphics
4. Programming with R
5. Simulation
6. Computational linear algebra
7. Numerical optimization
Appendix. Review of random variables and distributions