Doing Bayesian Data Analysis

John K. Kruschke
Academic Press

The Basic Library List Committee suggests that undergraduate mathematics libraries consider this book for acquisition.

Reviewed by Tom Schulte

Both textbook and practical guide, this work is an accessible account of Bayesian data analysis starting from the basics. Intended for first-year graduate students or advanced undergraduates, this book offers thorough training on modern Bayesian methods for data analysis. Chapter-length explorations of various implementations also make this an effective reference for non-expert practitioners who seek to apply Bayesian analysis to problems in their field.

Algebra and basic calculus, with nothing really beyond simple integration, are the only mathematical prerequisites for fully understanding the theory presented. The book features implementations throughout in the programming language R and the software packages JAGS and Stan. Comfort with basic computer programming will bring out the greatest value here. Someone looking to create a first R application featuring Bayesian analysis will be hard pressed to find a better text.

As a textbook, it covers the basics of probability and random sampling effectively. Each chapter has a few exercises, generally fewer than ten. Many are new or revised, and they are thought-provoking, multi-step applications. They have explicit purposes (“Transformed parameters in Stan, and comparison with JAGS”) and guidelines for accomplishment. In a classroom setting, I think further tailoring of the exercises would be required.

Nearly universally, probability is introduced through the flipped coin. For much of its length, as long as there is value and relevance, the book applies the basic concepts of Bayesian analysis to the simple likelihood function for a coin’s fairness, the Bernoulli distribution. When another example is necessary, it is no more complex than needed, such as predicting weight from height for a metric predicted variable with one metric predictor. This is elaborated to predicting blood pressure from height and weight, a metric predicted variable with multiple metric predictors, to bring in interaction effects.
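To make the coin example concrete, here is a minimal Python sketch of the Bernoulli likelihood and the standard conjugate update of a beta prior. It is not taken from the book, which works in R, JAGS, and Stan; the function names and the eight flips are hypothetical, chosen only for illustration.

```python
def bernoulli_likelihood(theta, flips):
    """p(flips | theta) for independent flips coded 1 = heads, 0 = tails."""
    like = 1.0
    for y in flips:
        like *= theta if y == 1 else (1.0 - theta)
    return like


def beta_posterior(a, b, flips):
    """Conjugate update: a Beta(a, b) prior becomes Beta(a + heads, b + tails)."""
    heads = sum(flips)
    tails = len(flips) - heads
    return a + heads, b + tails


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data: 6 heads in 8 flips, with a uniform Beta(1, 1) prior.
flips = [1, 0, 1, 1, 0, 1, 1, 1]
post_a, post_b = beta_posterior(1, 1, flips)
posterior_mean = beta_mean(post_a, post_b)
```

With these assumed data the posterior is Beta(7, 3), so the posterior mean of the coin's bias sits between the prior mean of 0.5 and the observed proportion of 0.75.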

The focus on simple and easily understood likelihood functions allows the complex concepts of Bayesian analysis, such as Markov chain Monte Carlo (MCMC) methods and hierarchical priors, to be developed before introducing additional complications of more elaborate likelihood functions with multiple parameters. This fine-tuned approach to lucid exposition is surely an outgrowth of the author’s teaching gifts and experience. It comes as no surprise that Kruschke is an eight-time winner of Teaching Excellence Recognition Awards from Indiana University, where he is Professor of Psychological and Brain Sciences and Adjunct Professor of Statistics.
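For readers unfamiliar with MCMC, the following is a rough Python sketch of a random-walk Metropolis sampler for the bias of a single coin. It is an illustration under stated assumptions rather than the author's code: the uniform prior, the Gaussian proposal width, the burn-in length, and the data (6 heads, 2 tails) are all hypothetical choices.

```python
import math
import random


def log_posterior(theta, heads, tails):
    """Unnormalized log posterior: Bernoulli likelihood times a uniform prior on (0, 1)."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf
    return heads * math.log(theta) + tails * math.log(1.0 - theta)


def metropolis(heads, tails, n_steps=20000, step_sd=0.1, start=0.5, seed=1):
    """Random-walk Metropolis: propose a nearby value, accept it with
    probability min(1, posterior ratio), otherwise stay where we are."""
    rng = random.Random(seed)
    theta = start
    current = log_posterior(theta, heads, tails)
    chain = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0.0, step_sd)
        proposed = log_posterior(proposal, heads, tails)
        accept_prob = math.exp(min(0.0, proposed - current))
        if rng.random() < accept_prob:
            theta, current = proposal, proposed
        chain.append(theta)
    return chain


chain = metropolis(heads=6, tails=2)
kept = chain[2000:]                 # discard an assumed burn-in period
mcmc_mean = sum(kept) / len(kept)   # should approximate the exact Beta(7, 3) mean of 0.7
```

Because the exact posterior is available for this simple likelihood, the sampled mean can be checked against it, which is the kind of sanity check that a one-parameter example permits before moving to models with no closed form.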

This edition is truly an expanded work and includes all new programs in JAGS and Stan designed to be easier to use than the scripts of the first edition, including when running the programs on your own data sets. This new programming was a major undertaking by itself. Chapters have been rewritten and added, including new chapters on R, JAGS, and Stan. The lengthy new chapter on R explains data files and syntax thoroughly enough to serve as a good R primer by itself.

The Analysis of Variance (ANOVA) collection of statistical models is used to analyze the differences among group means and such characteristics as variation among and between groups. Developed by statistician and evolutionary biologist Ronald Fisher, it is widely used in business and industry. I routinely come across ANOVA in manufacturing, where variance in a particular variable is partitioned into components attributable to different sources of variation such as in ANOVA gauge repeatability and reproducibility. I have long felt this traditional application of statistics is ripe for a Bayesian reboot; this text takes a detailed look at a Bayesian approach that is a hierarchical generalization of the traditional ANOVA model.
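The partitioning of variation mentioned here can be sketched in a few lines of Python. This shows only the classical one-way sum-of-squares decomposition (total = between groups + within groups), not the hierarchical Bayesian model the book develops; the gauge names and measurements are made up for illustration.

```python
def partition_variance(groups):
    """Split total sum of squares into between-group and within-group parts."""
    all_values = [y for values in groups.values() for y in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Variation of this group's mean around the grand mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Variation of individual measurements around their group mean.
        ss_within += sum((y - group_mean) ** 2 for y in values)
    return ss_total, ss_between, ss_within


# Hypothetical repeated measurements of the same part on three gauges.
groups = {
    "gauge_A": [10.1, 10.3, 9.9],
    "gauge_B": [10.8, 11.0, 10.9],
    "gauge_C": [9.5, 9.7, 9.6],
}
ss_total, ss_between, ss_within = partition_variance(groups)
```

In this made-up data set most of the variation lies between gauges rather than within them, which is the pattern a gauge study is designed to detect.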

Also considered is the situation where a metric predictor, sometimes called a covariate, accompanies the primary nominal predictor as in analysis of covariance (ANCOVA). I highly recommend sections 19.1 and 19.2, along with Chapter 11 on null hypothesis significance testing and the improvement of prior distribution in Bayesian analysis, to anyone open to considering a Bayesian framework for the approach to analysis of variance that has changed little since Fisher’s work in 1925. At the heart of the matter, one must weigh the entrenched precedent of p-values computed by imaginary sampling from a null hypothesis against a modern, hierarchical Bayesian approach. The discussion of these ideas is much expanded in this edition. Chapters on one-factor and multi-factor ANOVA and ANOVA-like analysis feature all-new examples. This is all part of the comprehensive coverage and comparison of approaches on scenarios addressed by non-Bayesian textbooks, including t-tests, multiple regression, and more.
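One way to see what "imaginary sampling from a null hypothesis" means is that a p-value depends on how the experimenter intended to stop collecting data. The Python sketch below compares the one-tailed p-value for a fair-coin null when the number of flips N was fixed in advance with the one obtained when the number of heads z was fixed in advance. The counts (7 heads in 24 flips) and function names are illustrative assumptions, and the code is not drawn from the book.

```python
from math import comb


def p_fixed_n(z, n, theta=0.5):
    """One-tailed p-value when the plan was to stop after n flips:
    probability of z or fewer heads in exactly n flips."""
    return sum(comb(n, k) * theta**k * (1 - theta) ** (n - k) for k in range(z + 1))


def p_fixed_z(z, n, theta=0.5):
    """One-tailed p-value when the plan was to stop at the z-th head:
    probability of needing n or more flips, i.e. at most z - 1 heads
    in the first n - 1 flips."""
    return sum(
        comb(n - 1, k) * theta**k * (1 - theta) ** (n - 1 - k) for k in range(z)
    )


z, n = 7, 24            # illustrative data: 7 heads in 24 flips
p_n = p_fixed_n(z, n)   # about 0.032
p_z = p_fixed_z(z, n)   # about 0.017
```

The same observed flips yield different p-values under the two stopping intentions, whereas a Bayesian posterior for the coin's bias depends only on the likelihood of the observed data and the prior.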

There is also a completely new chapter on multinomial logistic regression, filling in a case of the generalized linear model (namely, a nominal predicted variable) that was missing from the first edition. A chapter covering ordinal data has been greatly expanded with new examples to illustrate single- and two-group analyses, and demonstrate how interpretations differ from treating ordinal data as if they were metric.

The author’s knowledge of and love for the subject come across throughout, achieving a quirky synthesis in the delightful doggerel that opens each chapter. Here is an example, introducing models analogous to traditional ANOVA:

Put umpteen people in two groups at random,
Social dynamics make changes in tandem:
Members within groups will quickly conform;
Differences between groups will soon be the norm.

Tom Schulte is a software architect at ERP provider Plex Systems and for years was a technical lead in its department providing statistical process control (SPC) solutions to manufacturers.

1.) This Book’s Organization: Read Me First!

1.1 Real People Can Read This Book

1.2 Prerequisites

1.3 The Organization of This Book

1.3.1 What Are the Essential Chapters?

1.3.2 Where’s the Equivalent of Traditional Test X in This Book?

1.4 Gimme Feedback (Be Polite)

1.5 Acknowledgments

Part 1.) The Basics: Parameters, Probability, Bayes’ Rule, and R

2.) Introduction: Models We Believe In

2.1 Models of Observations and Models of Beliefs

2.1.1 Prior and Posterior Beliefs

2.2 Three Goals for Inference from Data

2.2.1 Estimation of Parameter Values

2.2.2 Prediction of Data Values

2.2.3 Model Comparison

2.3 The R Programming Language

2.3.1 Getting and Installing R

2.3.2 Invoking R and Using the Command Line

2.3.3 A Simple Example of R in Action

2.3.4 Getting Help in R

2.3.5 Programming in R

2.4 Exercises

3.) What Is This Stuff Called Probability?

3.1 The Set of All Possible Events

3.1.1 Coin Flips: Why You Should Care

3.2 Probability: Outside or Inside the Head

3.2.1 Outside the Head: Long-Run Relative Frequency

3.2.2 Inside the Head: Subjective Belief

3.2.3 Probabilities Assign Numbers to Possibilities

3.3 Probability Distributions

3.3.1 Discrete Distributions: Probability Mass

3.3.2 Continuous Distributions: Rendezvous with Density

3.3.3 Mean and Variance of a Distribution

3.3.4 Variance as Uncertainty in Beliefs

3.3.5 Highest Density Interval (HDI)

3.4 Two-Way Distributions

3.4.1 Marginal Probability

3.4.2 Conditional Probability

3.4.3 Independence of Attributes

3.5 R Code

3.5.1 R Code for Figure 3.1

3.5.2 R Code for Figure 3.3

3.6 Exercises

4.) Bayes’ Rule

4.1 Bayes’ Rule

4.1.1 Derived from Definitions of Conditional Probability

4.1.2 Intuited from a Two-Way Discrete Table

4.1.3 The Denominator as an Integral over Continuous Values

4.2 Applied to Models and Data

4.2.1 Data Order Invariance

4.2.2 An Example with Coin Flipping

4.3 The Three Goals of Inference

4.3.1 Estimation of Parameter Values

4.3.2 Prediction of Data Values

4.3.3 Model Comparison

4.3.4 Why Bayesian Inference Can Be Difficult

4.3.5 Bayesian Reasoning in Everyday Life

4.4 R Code

4.4.1 R Code for Figure 4.1

4.5 Exercises

Part 2.) All the Fundamentals Applied to Inferring a Binomial Proportion

5.) Inferring a Binomial Proportion via Exact Mathematical Analysis

5.1 The Likelihood Function: Bernoulli Distribution

5.2 A Description of Beliefs: The Beta Distribution

5.2.1 Specifying a Beta Prior

5.2.2 The Posterior Beta

5.3 Three Inferential Goals

5.3.1 Estimating the Binomial Proportion

5.3.2 Predicting Data

5.3.3 Model Comparison

5.4 Summary: How to Do Bayesian Inference

5.5 R Code

5.5.1 R Code for Figure 5.2

5.6 Exercises

6.) Inferring a Binomial Proportion via Grid Approximation

6.1 Bayes’ Rule for Discrete Values of θ

6.2 Discretizing a Continuous Prior Density

6.2.1 Examples Using Discretized Priors

6.3 Estimation

6.4 Prediction of Subsequent Data

6.5 Model Comparison

6.6 Summary

6.7 R Code

6.7.1 R Code for Figure 6.2 and the Like

6.8 Exercises

7.) Inferring a Binomial Proportion via the Metropolis Algorithm

7.1 A Simple Case of the Metropolis Algorithm

7.1.1 A Politician Stumbles on the Metropolis Algorithm

7.1.2 A Random Walk

7.1.3 General Properties of a Random Walk

7.1.4 Why We Care

7.1.5 Why It Works

7.2 The Metropolis Algorithm More Generally

7.2.1 "Burn-in," Efficiency, and Convergence

7.2.2 Terminology: Markov Chain Monte Carlo

7.3 From the Sampled Posterior to the Three Goals

7.3.1 Estimation

7.3.2 Prediction

7.3.3 Model Comparison: Estimation of p(D)

7.4 MCMC in BUGS

7.4.1 Parameter Estimation with BUGS

7.4.2 BUGS for Prediction

7.4.3 BUGS for Model Comparison

7.5 Conclusion

7.6 R Code

7.6.1 R Code for a Home-Grown Metropolis Algorithm

7.7 Exercises

8.) Inferring Two Binomial Proportions via Gibbs Sampling

8.1 Prior, Likelihood, and Posterior for Two Proportions

8.2 The Posterior via Exact Formal Analysis

8.3 The Posterior via Grid Approximation

8.4 The Posterior via Markov Chain Monte Carlo

8.4.1 Metropolis Algorithm

8.4.2 Gibbs Sampling

8.5 Doing It with BUGS

8.5.1 Sampling the Prior in BUGS

8.6 How Different Are the Underlying Biases?

8.7 Summary

8.8 R Code

8.8.1 R Code for Grid Approximation (Figures 8.1 and 8.2)

8.8.2 R Code for Metropolis Sampler (Figure 8.3)

8.8.3 R Code for BUGS Sampler (Figure 8.6)

8.8.4 R Code for Plotting a Posterior Histogram

8.9 Exercises

9.) Bernoulli Likelihood with Hierarchical Prior

9.1 A Single Coin from a Single Mint

9.2 Multiple Coins from a Single Mint

9.2.1 Posterior via Grid Approximation

9.2.2 Posterior via Monte Carlo Sampling

9.2.3 Outliers and Shrinkage of Individual Estimates

9.2.4 Case Study: Therapeutic Touch

9.2.5 Number of Coins and Flips per Coin

9.3 Multiple Coins from Multiple Mints

9.3.1 Independent Mints

9.3.2 Dependent Mints

9.3.3 Individual Differences and Meta-Analysis

9.4 Summary

9.5 R Code

9.5.1 Code for Analysis of Therapeutic-Touch Experiment

9.5.2 Code for Analysis of Filtration-Condensation Experiment

9.6 Exercises

10.) Hierarchical Modeling and Model Comparison

10.1 Model Comparison as Hierarchical Modeling

10.2 Model Comparison in BUGS

10.2.1 A Simple Example

10.2.2 A Realistic Example with "Pseudopriors"

10.2.3 Some Practical Advice When Using Transdimensional MCMC with Pseudopriors

10.3 Model Comparison and Nested Models

10.4 Review of Hierarchical Framework for Model Comparison

10.4.1 Comparing Methods for MCMC Model Comparison

10.4.2 Summary and Caveats

10.5 Exercises

11.) Null Hypothesis Significance Testing

11.1 NHST for the Bias of a Coin

11.1.1 When the Experimenter Intends to Fix N

11.1.2 When the Experimenter Intends to Fix z

11.1.3 Soul Searching

11.1.4 Bayesian Analysis

11.2 Prior Knowledge about the Coin

11.2.1 NHST Analysis

11.2.2 Bayesian Analysis

11.3 Confidence Interval and Highest Density Interval

11.3.1 NHST Confidence Interval

11.3.2 Bayesian HDI

11.4 Multiple Comparisons

11.4.1 NHST Correction for Experimentwise Error

11.4.2 Just One Bayesian Posterior No Matter How You Look at It

11.4.3 How Bayesian Analysis Mitigates False Alarms

11.5 What a Sampling Distribution Is Good For

11.5.1 Planning an Experiment

11.5.2 Exploring Model Predictions (Posterior Predictive Check)

11.6 Exercises

12.) Bayesian Approaches to Testing a Point ("Null") Hypothesis

12.1 The Estimation (Single Prior) Approach

12.1.1 Is a Null Value of a Parameter among the Credible Values?

12.1.2 Is a Null Value of a Difference among the Credible Values?

12.1.3 Region of Practical Equivalence (ROPE)

12.2 The Model-Comparison (Two-Prior) Approach

12.2.1 Are the Biases of Two Coins Equal?

12.2.2 Are Different Groups Equal?

12.3 Estimation or Model Comparison?

12.3.1 What Is the Probability That the Null Value Is True?

12.3.2 Recommendations

12.4 R Code

12.4.1 R Code for Figure 12.5

12.5 Exercises

13.) Goals, Power, and Sample Size

13.1 The Will to Power

13.1.1 Goals and Obstacles

13.1.2 Power

13.1.3 Sample Size

13.1.4 Other Expressions of Goals

13.2 Sample Size for a Single Coin

13.2.1 When the Goal Is to Exclude a Null Value

13.2.2 When the Goal Is Precision

13.3 Sample Size for Multiple Mints

13.4 Power: Prospective, Retrospective, and Replication

13.4.1 Power Analysis Requires Verisimilitude of Simulated Data

13.5 The Importance of Planning

13.6 R Code

13.6.1 Sample Size for a Single Coin

13.6.2 Power and Sample Size for Multiple Mints

13.7 Exercises

Part 3.) Applied to the Generalized Linear Model

14.) Overview of the Generalized Linear Model

14.1 The Generalized Linear Model (GLM)

14.1.2 Scale Types: Metric, Ordinal, Nominal

14.1.3 Linear Function of a Single Metric Predictor

14.1.4 Additive Combination of Metric Predictors

14.1.5 Nonadditive Interaction of Metric Predictors

14.1.6 Nominal Predictors

14.1.7 Linking Combined Predictors to the Predicted

14.1.8 Probabilistic Prediction

14.1.9 Formal Expression of the GLM

14.1.10 Two or More Nominal Variables Predicting Frequency

14.2 Cases of the GLM

14.3 Exercises

15.) Metric Predicted Variable on a Single Group

15.1 Estimating the Mean and Precision of a Normal Likelihood

15.1.1 Solution by Mathematical Analysis

15.1.2 Approximation by MCMC in BUGS

15.1.3 Outliers and Robust Estimation: The t Distribution

15.1.4 When the Data Are Non-normal: Transformations

15.2 Repeated Measures and Individual Differences

15.2.1 Hierarchical Model

15.2.2 Implementation in BUGS

15.3 Summary

15.4 R Code

15.4.1 Estimating the Mean and Precision of a Normal Likelihood

15.4.2 Repeated Measures: Normal Across and Normal Within

15.5 Exercises

16.) Metric Predicted Variable with One Metric Predictor

16.1 Simple Linear Regression

16.1.1 The Hierarchical Model and BUGS Code

16.1.2 The Posterior: How Big Is the Slope?

16.1.3 Posterior Prediction

16.2 Outliers and Robust Regression

16.3 Simple Linear Regression with Repeated Measures

16.4 Summary

16.5 R Code

16.5.1 Data Generator for Height and Weight

16.5.2 BRugs: Robust Linear Regression

16.5.3 BRugs: Simple Linear Regression with Repeated Measures

16.6 Exercises

17.) Metric Predicted Variable with Multiple Metric Predictors

17.1 Multiple Linear Regression

17.1.1 The Perils of Correlated Predictors

17.1.2 The Model and BUGS Program

17.1.3 The Posterior: How Big Are the Slopes?

17.1.4 Posterior Prediction

17.2 Hyperpriors and Shrinkage of Regression Coefficients

17.2.1 Informative Priors, Sparse Data, and Correlated Predictors

17.3 Multiplicative Interaction of Metric Predictors

17.3.1 The Hierarchical Model and BUGS Code

17.3.2 Interpreting the Posterior

17.4 Which Predictors Should Be Included?

17.5 R Code

17.5.1 Multiple Linear Regression

17.5.2 Multiple Linear Regression with Hyperprior on Coefficients

17.6 Exercises

18.) Metric Predicted Variable with One Nominal Predictor

18.1 Bayesian Oneway ANOVA

18.1.1 The Hierarchical Prior

18.1.2 Doing It with R and BUGS

18.1.3 A Worked Example

18.2 Multiple Comparisons

18.3 Two-Group Bayesian ANOVA and the NHST t Test

18.4 R Code

18.4.1 Bayesian Oneway ANOVA

18.5 Exercises

19.) Metric Predicted Variable with Multiple Nominal Predictors

19.1 Bayesian Multifactor ANOVA

19.1.2 The Hierarchical Prior

19.1.3 An Example in R and BUGS

19.1.4 Interpreting the Posterior

19.1.5 Noncrossover Interactions, Rescaling, and Homogeneous Variances

19.2 Repeated Measures, a.k.a. Within-Subject Designs

19.2.1 Why Use a Within-Subject Design? And Why Not?

19.3 R Code

19.3.1 Bayesian Two-Factor ANOVA

19.4 Exercises

20.) Dichotomous Predicted Variable

20.1 Logistic Regression

20.1.1 The Model

20.1.2 Doing It in R and BUGS

20.1.3 Interpreting the Posterior

20.1.4 Perils of Correlated Predictors

20.1.5 When There Are Few 1’s in the Data

20.1.6 Hyperprior Across Regression Coefficients

20.2 Interaction of Predictors in Logistic Regression

20.3 Logistic ANOVA

20.3.1 Within-Subject Designs

20.4 Summary

20.5 R Code

20.5.1 Logistic Regression Code

20.5.2 Logistic ANOVA Code

20.6 Exercises

21.) Ordinal Predicted Variable

21.1 Ordinal Probit Regression

21.1.1 What the Data Look Like

21.1.2 The Mapping from Metric x to Ordinal y

21.1.3 The Parameters and Their Priors

21.1.4 Standardizing for MCMC Efficiency

21.1.5 Posterior Prediction

21.2 Some Examples

21.2.1 Why Are Some Thresholds Outside the Data?

21.3 Interaction

21.4 Relation to Linear and Logistic Regression

21.5 R Code

21.6 Exercises

22.) Contingency Table Analysis

22.1 Poisson Exponential ANOVA

22.1.1 What the Data Look Like

22.1.2 The Exponential Link Function

22.1.3 The Poisson Likelihood

22.1.4 The Parameters and the Hierarchical Prior

22.2 Examples

22.2.1 Credible Intervals on Cell Probabilities

22.3 Log Linear Models for Contingency Tables

22.4 R Code for the Poisson Exponential Model

22.5 Exercises

23.) Tools in the Trunk

23.1 Reporting a Bayesian Analysis

23.1.1 Essential Points

23.1.2 Optional Points

23.1.3 Helpful Points

23.2 MCMC Burn-in and Thinning

23.3 Functions for Approximating Highest Density Intervals

23.3.1 R Code for Computing HDI of a Grid Approximation

23.3.2 R Code for Computing HDI of an MCMC Sample

23.3.3 R Code for Computing HDI of a Function

23.4 Reparameterization of Probability Distributions

23.4.1 Examples

23.4.2 Reparameterization of Two Parameters