Beyond Basic Statistics: Tips, Tricks, and Techniques Every Data Analyst Should Know

Kristin H. Jarman
John Wiley

The Basic Library List Committee suggests that undergraduate mathematics libraries consider this book for acquisition.

Reviewed by William J. Satzer

This book is a sequel of sorts to the author’s The Art of Data Analysis: How to Answer Almost Any Question Using Basic Statistics. That book aims to supplement the teaching of basic statistics by focusing on examples. In so doing, the author provides students the additional context they need to integrate and apply the basic tools and methods of statistics.

The current book explores more advanced topics in data analysis and assumes a modest background in statistics at about the level of the previous book. A fundamental principle in data analysis is that it’s very easy to get it wrong. The author approaches this artfully; she notes at least a couple of occasions when she — a veteran data analyst with several years of experience — went seriously wrong. What can happen? You can ask the wrong question. You can ask the right question, but inadvertently answer the wrong one. You can gather data that is wrong for the question you care about. You can use an inappropriate statistical technique. You can do everything else right but misinterpret the results.

After the introductory material, each chapter concentrates on one data analysis question or tool by looking at a single application. For example, one chapter uses nutrition and diet to discuss sampling strategies for gathering relevant data. In so doing, it gently but quite effectively introduces ideas about the design of experiments and research methods. Other chapters consider: political polling, with an emphasis on sample size calculations and statistical power; normality testing on the distribution of the lengths of Hollywood marriages; robust estimation of attendance at Sumo wrestling events in the US; chi-squared techniques for detecting cheating in a dice game; and nonparametric testing of the hypothesis that Godzilla is more popular than King Kong, using fifty-six top-ten classic movie monster lists.

One of my favorites was the chapter on outlier detection that used the News of the Weird website data to identify states with reports of weirdness in the high outlier range. (Florida is prominent on the basis of both total population and per capita weird reports, but North Dakota, New Hampshire and Montana win outlier status on a per capita basis. Just for completeness it should be said that Alabama and Wyoming were lowest on the per-capita weirdness scale, but not outliers.)

The last chapter provides a very instructive warning to aspiring data analysts based on one of the author’s first experiences on the job. She was assigned to find a predictive relationship between three measured biomedical variables and the associated level of toxin in the blood. After examining the data, she found a complicated quadratic relationship that gave a very good fit. Too good, in fact: it was a classic case of overfitting. But her boss, a good mentor, was more cautious, encouraged a follow-up test, and saved her from a major embarrassment.

This is a consistently entertaining and instructive book. The stories are interspersed with serious background material, and the combination works very well. This would be entirely suitable as supplementary reading for a statistics course or for an independent reading project.

Bill Satzer is a senior intellectual property scientist at 3M Company, having previously been a lab manager at 3M for composites and electromagnetic materials. His training is in dynamical systems and particularly celestial mechanics; his current interests are broadly in applied mathematics and the teaching of mathematics.

Preface ix

1 Introduction: It Seemed Like the Right Thing to Do at the Time 1

2 The Type A Diet: Sampling Strategies to Eliminate Confounding and Reduce Your Waistline 9

3 Conservatives, Liberals, and Other Political Pawns: How to Gain Power and Influence with Sample Size Calculations 31

4 Bunco, Bricks, and Marked Cards: Chi-Squared Tests and How to Beat a Cheater 47

5 Why It Pays To Be a Stable Master: Sumo Wrestlers and Other Robust Statistics 69

6 Five-Hour Marriages: Continuous Distributions, Tests for Normality, and Juicy Hollywood Scandals 91

7 Believe It or Don’t: Using Outlier Detection to Find the Weirdest of the Weird 109

8 The Battle of the Movie Monsters, Round Two: Ramping Up Hypothesis Tests with Nonparametric Statistics 123

9 Models, Murphy’s Law, and Public Humiliation: Regression Rules to Live By 139

Appendix A Critical Values for the Standard Normal Distribution 163

Appendix B Critical Values for the T-Distribution 165

Appendix C Critical Values for the Chi-Squared Distribution 167

Appendix D Critical Values for Grubbs’ Test 169

Appendix E Critical Values for the Wilcoxon Signed-Rank Test: Small Sample Sizes 171

Glossary 173

Index 185