
A Primer of Permutation Statistical Methods

Kenneth J. Berry, Janis E. Johnston, and Paul W. Mielke Jr.
Publication Date: 
Number of Pages: 
Reviewed by Robert Hayden
The book at hand is unusually difficult to review. To deal with the difficulty, the review will have three parts. First, I will explain what permutation tests are. Then I will try to give an objective account of the contents of the work. Finally, I will express some misgivings about the book. That last part may be regarded as controversial, and the reader may wish to seek other opinions.
Historically, permutation tests arose in the context of scientific experiments. There we ideally wish to eliminate all variables other than those we wish to study. Then we divide the experimental units into two or more groups and apply one or more treatments. We wish to know whether the treatments had any effect. The answer may not be obvious, because the groups could differ simply because the experimental units differed. The usual way to deal with that is to randomly assign the treatments to the units. One advantage is that this tends to make the units in each group about the same on average. The second advantage is that random assignment provides a probability model for dealing with any differences that exist in the case at hand.
The classic example is comparing the means of two groups. The null hypothesis is that the observed difference is just due to which units were assigned to which group. Assuming this null, we can measure how much variability assignment creates by reassigning the values to the groups over and over. We then compute the difference in group means for each reassignment. The distribution of these differences is called the "permutation distribution" and it is used as a reference distribution for assessing the null. If the observed difference is in the tails of that distribution, we conclude that it is not likely to have arisen by chance. The permutation distribution plays the same role for data from an experiment that the sampling distribution does when we take samples. However, the permutation distribution can be computed exactly from the experimental data, whereas the sampling distribution is not accessible and must be approximated, usually with a table found at the back of a textbook. For this reason, the p-values computed from a permutation test are called "exact." (Tests based on sampling may also be called exact, but are so in a different sense. There the p-value is a random variable and the definition must be in terms of that variable's distribution.) It may be important to mention here that statisticians going back at least to Fisher generally consider permutation tests to be the gold standard for data from experiments. They are not widely known today because we need powerful computers to perform them.
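To make the two-group example concrete, here is a minimal Python sketch of an exact permutation test for a difference in means. It is not taken from the book under review. The function name `exact_permutation_test` and the small data set are invented for illustration, and the two-sided p-value (counting reassignments at least as extreme in absolute value) is one common convention among several.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test for a difference in group means.

    Enumerates every way of assigning the pooled values to a group of
    size len(group_a); feasible only for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = mean(group_a) - mean(group_b)

    # Permutation distribution: one difference in means per reassignment.
    diffs = []
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diffs.append(sum_a / n_a - (total - sum_a) / n_b)

    # Small tolerance guards against floating-point ties being missed.
    tol = 1e-12
    extreme = sum(1 for d in diffs if abs(d) >= abs(observed) - tol)
    return observed, extreme / len(diffs)


# Hypothetical data, invented for this illustration.
treated = [12, 14, 15, 17]
control = [8, 9, 11, 13]
observed_diff, p_value = exact_permutation_test(treated, control)
print(observed_diff, p_value)
```

With these eight values there are 70 possible assignments, and 4 of them give a difference at least as large in absolute value as the observed 4.25, so the exact two-sided p-value is 4/70, about 0.057. The number of assignments grows very quickly with sample size, which is why complete enumeration is practical only for small experiments.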
This book covers the inference topics in a very full introductory course, but uses permutation tests rather than the more familiar tests based on sampling theory. Thus it is "advanced" in the sense that the methods are unfamiliar, but the applications are not. The authors typically compare the results of a permutation test to a traditional sampling-based test on the same data. There are two innovations in comparison with classical treatments of permutation tests. Classical treatments would either test the same statistic as the sampling tests (say, the difference in the group means) or take advantage of the ability of permutation tests to work with a wide range of possible statistics, including (but not limited to) a difference in medians. The authors instead propose a rather complex test statistic that has the advantage of fitting a wide range of different situations. It includes a parameter that lets the user choose between the L1 (median) and L2 (mean) norm for measuring effects.
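The pairing of L1 with the median and L2 with the mean rests on a standard fact: the median minimizes the sum of absolute deviations, and the mean minimizes the sum of squared deviations. The short Python sketch below checks this numerically on a grid of candidate centers. It does not reproduce the authors' test statistic. The helper `total_deviation` and the data are invented for illustration.

```python
from statistics import mean, median


def total_deviation(values, center, power):
    """Sum of |value - center| ** power (power=1 is L1, power=2 is L2)."""
    return sum(abs(v - center) ** power for v in values)


# Hypothetical data with one outlier, invented for this illustration.
values = [1, 2, 3, 4, 100]

# Candidate centers 0.0, 0.1, ..., 100.0.
candidates = [c / 10 for c in range(0, 1001)]

best_l1 = min(candidates, key=lambda c: total_deviation(values, c, 1))
best_l2 = min(candidates, key=lambda c: total_deviation(values, c, 2))

print(best_l1, median(values))  # the L1 minimizer matches the median
print(best_l2, mean(values))    # the L2 minimizer matches the mean
```

The outlier pulls the L2 center far from the bulk of the data while leaving the L1 center untouched, which is the usual motivation for offering a choice between the two norms.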
In recent times there has been an increased emphasis on practical as well as statistical significance, and researchers have been encouraged to report some measure of the size of the effect they observed. When available, it is considered good practice to offer a direct measure. For our example of a two-group comparison, we could simply state what the difference was, or provide a confidence interval for that difference. These have the advantage of being easily understood and readily compared to a researcher's idea of how large an effect would be practically significant. However, many tests, such as chi-squared for contingency tables, do not provide such a measure, or a relevant confidence interval. For those, we have indirect measures of effect size, which often compare the effect to the random variation present rather than to what is practically significant. There are a great many different relative measures of effect size in use, often a different one for every application. The authors of this book offer a one-size-fits-all measure based on permutation tests and applicable to a wide range of situations.
The chapters of this book each cover some traditional test.  For each sampling-based test, the authors give both the traditional approach and a permutation approach.  They also compute their measure of effect size.  In addition, they compare a permutation test based on medians with one based on means.  They also look at an example where the permutation distribution is approximated with a simulation. Lastly, they compare the permutation approach with some classical nonparametric approaches based on ranks.  While only the content of a first course is assumed as far as statistics is concerned, the mathematical level is comparable to an upper-division course for mathematics majors.
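When complete enumeration is too expensive, the permutation distribution can be approximated by drawing random reassignments. The Python sketch below illustrates that idea and also shows how the same procedure accepts either means or medians as the statistic. It is an illustration under stated assumptions, not the authors' procedure: the function name `monte_carlo_permutation_test`, its parameters, the data, and the add-one adjustment to the p-value are all choices made here.

```python
import random
from statistics import mean, median


def monte_carlo_permutation_test(group_a, group_b, statistic=mean,
                                 n_resamples=10_000, seed=0):
    """Approximate two-sided permutation test using random reassignments."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = statistic(group_a) - statistic(group_b)

    tol = 1e-12
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # one random reassignment of values to groups
        diff = statistic(pooled[:n_a]) - statistic(pooled[n_a:])
        if abs(diff) >= abs(observed) - tol:
            extreme += 1

    # Assumed convention: add one to numerator and denominator so the
    # estimated p-value can never be exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical data, invented for this illustration.
treated = [12, 14, 15, 17]
control = [8, 9, 11, 13]

obs_mean, p_mean = monte_carlo_permutation_test(treated, control, mean)
obs_median, p_median = monte_carlo_permutation_test(treated, control, median)
print(obs_mean, p_mean)
print(obs_median, p_median)
```

A p-value obtained this way is an estimate that varies with the seed and the number of resamples, so it is no longer "exact" in the sense used for full enumeration.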
The reader of this book will get thorough coverage of the authors' approach to exact permutation tests and how such tests compare to other alternatives. Unfortunately, I have multiple misgivings about that approach. While the one-size-fits-all test statistic and measure of effect size may be useful for proving theorems about a wide range of situations, these measures are highly unintuitive and computationally complex. I do not think their method could reasonably be taught to beginners, and I wonder about the computer time required in real applications. And on the subject of computers, while nearly all permutation tests are computationally intensive, the authors have very little to say about computers. These days such a book would commonly be accompanied by an R package that implements the methods and includes the data and code for all the examples.
More serious is the fact that the book reads like advertising copy for permutation tests.  The authors use boilerplate text comparing those over and over to sampling-based methods.  The text seems to contrast the advantages of permutation tests to the disadvantages of sampling-based approaches.  The advantages of the latter and the disadvantages of the former get little attention.  More basic than that is the assumption that both are candidates for any given application.  My take is that a permutation test models variability associated with random assignment and should be preferred whenever random assignment is used. The methods in introductory statistics model random sampling and should be used when we take random samples.  If we did not use one of these random processes, special justification would be needed to apply either approach. The authors do describe the permutation tests as "exact" when they are, but they are exact answers to a question about random assignment, not random sampling. The authors state repeatedly that permutation tests make no assumptions about a population or its parameters.  That is true, but they do not mention the price we pay for that: permutation tests tell us nothing about whether our results generalize to some wider population.  
Despite its flaws, this book contains a lot of useful information. The difficulty lies in identifying its ideal audience. The mathematical level and lack of software support make it unsuitable for beginners or for people who use rather than develop statistics. (For those audiences the classic Randomization Tests by Edgington and Onghena is recommended.) The authors' work on providing a general framework involving single measures of effects, and emphasizing choice of the norm, may be useful or appealing to those in mathematical statistics.


After a few years in industry, Robert W. Hayden taught mathematics at colleges and universities for 32 years and statistics for 20 years. In 2005 he retired from full-time classroom work.