
Classic Topics on The History of Modern Mathematical Statistics: From Laplace to More Recent Times

Prakash Gorroochurn
John Wiley

The Basic Library List Committee suggests that undergraduate mathematics libraries consider this book for acquisition.

Reviewed by Joel Haack

Prakash Gorroochurn’s Classic Topics on The History of Modern Mathematical Statistics: From Laplace to More Recent Times is a valuable resource for researchers and instructors with an interest in the history of statistics. It will not replace Stigler’s The History of Statistics: The Measurement of Uncertainty before 1900, nor is it meant to. Instead, Gorroochurn has focused on details of the mathematical results that are important for statistics. He has continued the story through Fisher and Neyman-Pearson to the revival and re-formation of Bayesian statistics up to 1960. This book also covers topics that are frequently discussed in core liberal arts courses to introduce statistical methods, such as confidence intervals and hypothesis testing, which were outside the purview of Stigler’s book. It is not easy reading, but having Gorroochurn at hand when reading the original articles he discusses will make the originals far more accessible.

The first part of the book is devoted to the work of Pierre-Simon de Laplace in mathematical statistics. Topics include Laplace’s definition of probability and his considerations of inverse probability, characteristic functions, the method of least squares, and his proofs of the Central Limit Theorem. The detailed calculations often include asymptotic approximations. Gorroochurn also provides quotations that exhibit Laplace’s philosophy of universal determinism, which underlies his interpretations of the problems he poses and solves. Other topics here include Bayes’ Theorem, a definition of conditional probability, generating functions, the principle of indifference, the integration of \(e^{-x^2}\), Stirling’s formula, and the need for a table of normal probabilities. In line with Gorroochurn’s decision to focus on mathematical statistics, extensive discussions of Laplace’s papers on the application of statistics to geodesy and astronomy are omitted. Of course, a discussion of Laplace’s results would be incomplete without some consideration of other mathematicians of the time, so Gorroochurn also mentions the contributions of De Moivre, Gauss, Fourier, Lyapunov, Legendre, and Adrain.
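Two of the Laplacean results mentioned above lend themselves to a quick numerical check: the Gaussian integral \(\int_0^\infty e^{-t^2}\,dt = \sqrt{\pi}/2\), and the Central Limit Theorem for sums of errors. The following Python sketch (the grid size, sample sizes, and seed are illustrative choices, not from the book) confirms both:

```python
import math
import random

# Laplace needed the value of the integral of exp(-t^2) from 0 to infinity,
# which equals sqrt(pi)/2. A crude midpoint-rule approximation:
def gauss_integral(upper=10.0, n=100_000):
    h = upper / n
    return sum(math.exp(-((i + 0.5) * h) ** 2) for i in range(n)) * h

# Central Limit Theorem: standardized means of uniform draws are
# approximately standard normal. Uniform(0,1) has mean 1/2, variance 1/12.
def standardized_mean(n=1000):
    m = sum(random.random() for _ in range(n)) / n
    return (m - 0.5) / math.sqrt(1 / (12 * n))

random.seed(0)
z = [standardized_mean() for _ in range(2000)]
inside = sum(-1.96 <= v <= 1.96 for v in z) / len(z)  # near 0.95 if normal
print(round(gauss_integral(), 6), round(inside, 3))
```

The first number should agree with \(\sqrt{\pi}/2 \approx 0.886227\), and the second should sit near 0.95, the mass a standard normal places inside \(\pm 1.96\).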

The second part of the book carries the story from Galton to Fisher, tracking the beginnings of regression and correlation, Karl Pearson’s development of the chi-squared test for goodness of fit, and Student’s t-distribution before introducing R. A. Fisher. Among Fisher’s contributions was the development of estimation theory, including such now-common notions as sample statistics and population parameters, estimators, and maximum likelihood. He also introduced properties of statistics such as consistency, efficiency, and sufficiency, and both the term and the notion of “variance.” In significance testing, we see the common procedure of testing a null hypothesis, finding the value of a test statistic, computing its p-value, and comparing this with a predetermined level of significance; Fisher says that “we shall not often be astray if we draw a conventional line at .05” (p. 431). We also see his development of ANOVA.
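The significance-testing recipe described above can be made concrete in a few lines. A minimal Python sketch follows, with made-up data; a large-sample z-test stands in for Student’s t-test here only because the standard library supplies the normal distribution but not the t-distribution:

```python
import math
from statistics import NormalDist, mean, stdev

# Fisher's recipe: state a null hypothesis, compute a test statistic,
# find its p-value, and compare with a conventional level such as .05.
def z_test(sample, mu0):
    n = len(sample)
    z = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p

# Hypothetical measurements, tested against the null value mu0 = 5.0:
sample = [5.1, 4.9, 5.6, 5.3, 5.8, 5.2, 5.4, 5.7, 5.0, 5.5] * 4  # n = 40
z, p = z_test(sample, mu0=5.0)
print(f"z = {z:.2f}, p = {p:.4g}, reject at .05: {p < 0.05}")
```

With these numbers the sample mean of 5.35 lies many standard errors above 5.0, so the p-value falls far below Fisher’s conventional line and the null hypothesis is rejected.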

The second part continues with the contributions of Jerzy Neyman and Egon Pearson. Here, for example, we find the development of the topic of interval estimation (confidence intervals). In hypothesis testing, Neyman-Pearson also introduced the concept of an alternative hypothesis, defined Type I and Type II errors, and discussed the power of a test. The second part concludes by circling back to look at forerunners of Fisher in the topics of maximum likelihood, including the contributions of Lambert, Lagrange, Daniel Bernoulli, Adrain, and Edgeworth, and of significance testing, including Arbuthnot, Nicholas Bernoulli, Daniel Bernoulli, ‘s Gravesande, d’Alembert, Todhunter, Michell, Herschel, Forbes, Laplace, Edgeworth, Karl Pearson, and Student (Gosset).
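The two Neyman-Pearson ideas named above, confidence intervals and the power of a test, are easy to illustrate. A minimal sketch, assuming a normal mean with known standard deviation and illustrative numbers of my own choosing:

```python
import math
from statistics import NormalDist

nd = NormalDist()

# Neyman's 95% confidence interval for a mean, sigma known:
def ci_95(xbar, sigma, n):
    half = 1.96 * sigma / math.sqrt(n)
    return xbar - half, xbar + half

# Neyman-Pearson power: probability of rejecting H0: mu = mu0 (one-sided
# test at level alpha) when the true mean is mu1. The Type II error
# probability is 1 minus this power.
def power(mu0, mu1, sigma, n, alpha=0.05):
    crit = mu0 + nd.inv_cdf(1 - alpha) * sigma / math.sqrt(n)
    return 1 - nd.cdf((crit - mu1) / (sigma / math.sqrt(n)))

lo, hi = ci_95(5.2, 1.0, 25)
print(f"95% CI: ({lo:.2f}, {hi:.2f}); power: {power(5.0, 5.5, 1.0, 25):.2f}")
```

With \(n = 25\) and \(\sigma = 1\), the test of \(\mu = 5.0\) against the alternative \(\mu = 5.5\) has power near 0.80, so the Type II error rate is about 0.20.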

The third part of Gorroochurn’s book concerns developments in mathematical statistics after Fisher and Neyman-Pearson. The table of contents provides a good guide toward the results in this part, each of which is traced through its development in the works of a number of statisticians. I will mention just two of the intriguing topics in this part. I found Wald’s development of the idea that a statistical decision can be viewed as a two-person zero-sum game between nature and the experimenter to be provocative. Within the section on the Bayesian revival, I was glad to be introduced to Ramsey’s work laying the foundations of a subjective theory of probability.
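Wald’s game-theoretic viewpoint can be shown in miniature: nature picks a state, the experimenter picks a decision, and a loss table scores the outcome. The minimax rule, which guards against the worst nature can do, is a one-liner. The states, decisions, and losses below are an invented toy, not an example from the book:

```python
# Wald viewed a statistical decision as a zero-sum game between nature
# (choosing a state) and the experimenter (choosing a decision rule).
# Toy loss table: outer keys = decisions, inner keys = states of nature.
losses = {
    "act_conservative": {"state_A": 2, "state_B": 2},
    "act_aggressive":   {"state_A": 0, "state_B": 5},
}

# Minimax: pick the decision whose worst-case loss over states is smallest.
def minimax(loss_table):
    return min(loss_table, key=lambda d: max(loss_table[d].values()))

print(minimax(losses))
```

Here the conservative act loses 2 whatever nature does, while the aggressive act risks a loss of 5, so minimax chooses the conservative act even though the aggressive one could lose nothing.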

Many of the statisticians involved in the history were a bit prickly, and some far more than a bit! Gorroochurn includes discussions of the feuds between Gauss and Legendre, K. Pearson and Yule, Fisher and K. Pearson, Fisher and Neyman and E. Pearson, Fisher and Jeffreys, and Fisher and many others.

Throughout the book, Gorroochurn provides brief biographical details and portraits or photographs of the statisticians discussed; his references to more complete biographical information make it convenient to dig further into their lives if one wishes to do so.

There are a number of minor errors, but it wasn’t difficult to correct the ones I found.

This book is likely to be of most interest to those with an exposure to statistics at the level of a masters or doctoral degree. Certainly those teaching introductory statistics courses will be pleased to see how the topics they teach were developed. My having read this book will certainly enrich my own teaching.

Joel Haack is Professor of Mathematics at the University of Northern Iowa.

Preface xvi

Acknowledgments xix



1 The Laplacean Revolution 3

1.1 Pierre-Simon de Laplace (1749–1827), 3

1.2 Laplace’s Work in Probability and Statistics, 7

1.2.1 “Mémoire sur les suites récurro-récurrentes” (1774): Definition of Probability, 7

1.2.2 “Mémoire sur la probabilité des causes par les événements” (1774), 9 Bayes’ Theorem, 9 Rule of Succession, 13 Proof of Inverse Bernoulli Law. Method of Asymptotic Approximation. Central Limit Theorem for Posterior Distribution. Indirect Evaluation of \(\int_0^\infty e^{-t^2}\,dt\), 14 Problem of Points, 18 First Law of Error, 19 Principle of Insufficient Reason (Indifference), 24 Conclusion, 25

1.2.3 “Recherches sur l’intégration des équations différentielles aux différences finies” (1776), 25 Integration of Difference Equations. Problem of Points, 25 Moral Expectation. On d’Alembert, 26

1.2.4 “Mémoire sur l’inclinaison moyenne des orbites” (1776): Distribution of Finite Sums, Test of Significance, 28

1.2.5 “Recherches sur le milieu qu’il faut choisir entre les résultats de plusieurs observations” (1777): Derivation of Double Logarithmic Law of Error, 35

1.2.6 “Mémoire sur les probabilités” (1781), 42 Introduction, 42 Double Logarithmic Law of Error, 44 Definition of Conditional Probability. Proof of Bayes’ Theorem, 46 Proof of Inverse Bernoulli Law Refined, 50 Method of Asymptotic Approximation Refined, 53 Stirling’s Formula, 58 Direct Evaluation of \(\int_0^\infty e^{-t^2}\,dt\), 59 Theory of Errors, 60

1.2.7 “Mémoire sur les suites” (1782), 62 De Moivre and Generating Functions, 62 Lagrange’s Calculus of Operations as an Impetus for Laplace’s Generating Functions, 65

1.2.8 “Mémoire sur les approximations des formules qui sont fonctions de très grands nombres” (1785), 70 Method of Asymptotic Approximation Revisited, 70 Stirling’s Formula Revisited, 73 Genesis of Characteristic Functions, 74

1.2.9 “Mémoire sur les approximations des formules qui sont fonctions de très grands nombres (suite)” (1786): Philosophy of Probability and Universal Determinism, Recognition of Need for Normal Probability Tables, 78

1.2.10 “Sur les naissances” (1786): Solution of the Problem of Births by Using Inverse Probability, 79

1.2.11 “Mémoire sur les approximations des formules qui sont fonctions de très grands nombres et sur leur application aux probabilités” (1810): Second Phase of Laplace’s Statistical Career, Laplace’s First Proof of the Central Limit Theorem, 83

1.2.12 “Supplément au Mémoire sur les approximations des formules qui sont fonctions de très grands nombres et sur leur application aux probabilités” (1810): Justification of Least Squares Based on Inverse Probability, The Gauss–Laplace Synthesis, 90

1.2.13 “Mémoire sur les intégrales définies et leur applications aux probabilités, et spécialement à la recherche du milieu qu’il faut choisir entre les résultats des observations” (1811): Laplace’s Justification of Least Squares Based on Direct Probability, 90

1.2.14 Théorie Analytique des Probabilités (1812): The de Moivre–Laplace Theorem, 90

1.2.15 Laplace’s Probability Books, 92 Théorie Analytique des Probabilités (1812), 92 Essai Philosophique sur les Probabilités (1814), 95

1.3 The Principle of Indifference, 98

1.3.1 Introduction, 98

1.3.2 Bayes’ Postulate, 99

1.3.3 Laplace’s Rule of Succession. Hume’s Problem of Induction, 102

1.3.4 Bertrand’s and Other Paradoxes, 106

1.3.5 Invariance, 108

1.4 Fourier Transforms, Characteristic Functions, and Central Limit Theorems, 113

1.4.1 The Fourier Transform: From Taylor to Fourier, 114

1.4.2 Laplace’s Fourier Transforms of 1809, 120

1.4.3 Laplace’s Use of the Fourier Transform to Solve a Differential Equation (1810), 122

1.4.4 Lagrange’s 1776 Paper: A Precursor to the Characteristic Function, 123

1.4.5 The Concept of Characteristic Function Introduced: Laplace in 1785, 127

1.4.6 Laplace’s Use of the Characteristic Function in his First Proof of the Central Limit Theorem (1810), 128

1.4.7 Characteristic Function of the Cauchy Distribution: Laplace in 1811, 128

1.4.8 Characteristic Function of the Cauchy Distribution: Poisson in 1811, 131

1.4.9 Poisson’s Use of the Characteristic Function in his First Proof of the Central Limit Theorem (1824), 134

1.4.10 Poisson’s Identification of the Cauchy Distribution (1824), 138

1.4.11 First Modern Rigorous Proof of the Central Limit Theorem: Lyapunov in 1901, 139

1.4.12 Further Extensions: Lindeberg (1922), Lévy (1925), and Feller (1935), 148

1.5 Least Squares and the Normal Distribution, 149

1.5.1 First Publication of the Method of Least Squares: Legendre in 1805, 149

1.5.2 Adrain’s Research Concerning the Probabilities of Errors (1808): Two Proofs of the Normal Law, 152

1.5.3 Gauss’ First Justification of the Principle of Least Squares (1809), 159 Gauss’ Life, 159 Derivation of the Normal Law. Postulate of the Arithmetic Mean, 159 Priority Dispute with Legendre, 163

1.5.4 Laplace in 1810: Justification of Least Squares Based on Inverse Probability, the Gauss–Laplace Synthesis, 166

1.5.5 Laplace’s Justification of Least Squares Based on Direct Probability (1811), 169

1.5.6 Gauss’ Second Justification of the Principle of Least Squares in 1823: The Gauss–Markov Theorem, 177

1.5.7 Hagen’s Hypothesis of Elementary Errors (1837), 182


2 Galton, Regression, and Correlation 187

2.1 Francis Galton (1822–1911), 187

2.2 Genesis of Regression and Correlation, 190

2.2.1 Galton’s 1877 Paper, “Typical Laws of Heredity”: Reversion, 190

2.2.2 Galton’s Quincunx (1873), 195

2.2.3 Galton’s 1885 Presidential Lecture and Subsequent Related Papers: Regression, Discovery of the Bivariate Normal Surface, 197

2.2.4 First Appearance of Correlation (1888), 206

*2.2.5 Some Results on Regression Based on the Bivariate Normal Distribution: Regression to the Mean Mathematically Explained, 209 Basic Results Based on the Bivariate Normal Distribution, 209 Regression to the Mean Mathematically Explained, 211

2.3 Further Developments after Galton, 211

2.3.1 Weldon (1890; 1892; 1893), 211

2.3.2 Edgeworth in 1892: First Systematic Study of the Multivariate Normal Distribution, 213

2.3.3 Origin of Pearson’s r (Pearson et al., 1896), 220

2.3.4 Standard Error of r (Pearson et al., 1896; Pearson and Filon, 1898; Student, 1908; Soper, 1913), 224

2.3.5 Development of Multiple Regression, Galton’s Law of Ancestral Heredity, First Explicit Derivation of the Multivariate Normal Distribution (Pearson et al., 1896), 230 Development of Multiple Regression. Galton’s Law of Ancestral Heredity, 230 First Explicit Derivation of the Multivariate Normal Distribution, 233

2.3.6 Marriage of Regression with Least Squares (Yule, 1897), 237

2.3.7 Correlation Coefficient for a 2 × 2 Table (Yule, 1900). Feud Between Pearson and Yule, 244

2.3.8 Intraclass Correlation (Pearson, 1901; Harris, 1913; Fisher, 1921; 1925), 253

2.3.9 First Derivation of the Exact Distribution of r (Fisher, 1915), 258

2.3.10 Controversy between Pearson and Fisher on the Latter’s Alleged Use of Inverse Probability (Soper et al., 1917; Fisher, 1921), 264

2.3.11 The Logarithmic (or Z-) Transformation (Fisher, 1915; 1921), 267

*2.3.12 Derivation of the Logarithmic Transformation, 270

2.4 Work on Correlation and the Bivariate (and Multivariate) Normal Distribution Before Galton, 270

2.4.1 Lagrange’s Derivation of the Multivariate Normal Distribution from the Multinomial Distribution (1776), 271

2.4.2 Adrain’s Use of the Multivariate Normal Distribution (1808), 275

2.4.3 Gauss’ Use of the Multivariate Normal Distribution in the Theoria Motus (1809), 275

2.4.4 Laplace’s Derivation of the Joint Distribution of Linear Combinations of Two Errors (1811), 276

2.4.5 Plana on the Joint Distribution of Two Linear Combinations of Random Variables (1813), 276

2.4.6 Bravais’ Determination of Errors in Coordinates (1846), 281

2.4.7 Bullet Shots on a Target: Bertrand’s Derivation of the Bivariate Normal Distribution (1888), 288

3 Karl Pearson’s Chi-Squared Goodness-of-Fit Test 293

3.1 Karl Pearson (1857–1936), 293

3.2 Origin of Pearson’s Chi-Squared, 297

3.2.1 Pearson’s Work on Goodness of Fit Before 1900, 297

3.2.2 Pearson’s 1900 Paper, 299

3.3 Pearson’s Error and Clash with Fisher, 306

3.3.1 Error by Pearson on the Chi-Squared When Parameters Are Estimated (1900), 306

3.3.2 Greenwood and Yule’s Observation (1915), 308

3.3.3 Fisher’s 1922 Proof of the Chi-Squared Distribution: Origin of Degrees of Freedom, 311

*3.3.4 Further Details on Degrees of Freedom, 313

3.3.5 Reaction to Fisher’s 1922 Paper: Yule (1922), Bowley and Connor (1923), Brownlee (1924), and Pearson (1922), 314

3.3.6 Fisher’s 1924 Argument: “Coup de Grâce” in 1926, 315 The 1924 Argument, 315 “Coup de Grâce” in 1926, 317

3.4 The Chi-Squared Distribution Before Pearson, 318

3.4.1 Bienaymé’s Derivation of Simultaneous Confidence Regions (1852), 318

3.4.2 Abbe on the Distribution of Errors in a Series of Observations (1863), 331

3.4.3 Helmert on the Distribution of the Sum of Squares of Residuals (1876): The Helmert Transformations, 336

*3.4.4 Derivation of the Transformations Used by Helmert, 344

4 Student’s t 348

4.1 William Sealy Gosset (1876–1937), 348

4.2 Origin of Student’s Test: The 1908 Paper, 351

4.3 Further Developments, 358

4.3.1 Fisher’s Geometrical Derivation of 1923, 358

4.3.2 From Student’s z to Student’s t, 360

4.4 Student Anticipated, 363

4.4.1 Helmert on the Independence of the Sample Mean and Sample Variance in a Normal Distribution (1876), 363

4.4.2 Lüroth and the First Derivation of the t-Distribution (1876), 363

4.4.3 Edgeworth’s Derivation of the t-Distribution Based on Inverse Probability (1883), 369

5 The Fisherian Legacy 371

5.1 Ronald Aylmer Fisher (1890–1962), 371

5.2 Fisher and the Foundation of Estimation Theory, 374

5.2.1 Fisher’s 1922 Paper: Consistency, Efficiency, and Sufficiency, 374 Introduction, 374 The Criterion of Consistency, 375 The Criterion of Efficiency, 377 The Criterion of Sufficiency, 377

5.2.2 Genesis of Sufficiency in 1920, 378

5.2.3 First Appearance of “Maximum Likelihood” in the 1922 Paper, 385

5.2.4 The Method of Moments and its Criticism by Fisher (Pearson, 1894; Fisher, 1912; 1922), 390

5.2.5 Further Refinement of the 1922 Paper in 1925: Efficiency and Information, 396

5.2.6 First Appearance of “Ancillary” Statistics in the 1925 Paper: Relevant Subsets, Conditional Inference, and the Likelihood Principle, 403 First Appearance of “Ancillary” Statistics, 403 Relevant Subsets. Conditional Inference, 412 Likelihood Inference, 417

5.2.7 Further Extensions: Inconsistency of MLEs (Neyman and Scott, 1948), Inadmissibility of MLEs (Stein, 1956), Nonuniqueness of MLEs (Moore, 1971), 419

5.2.8 Further Extensions: Nonuniqueness of Ancillaries and of Relevant Subsets (Basu, 1964), 421

5.3 Fisher and Significance Testing, 423

5.3.1 Significance Testing for the Correlation Coefficient (Student, 1908; Soper, 1913; Fisher, 1915; 1921), 423

5.3.2 Significance Testing for a Regression Coefficient (Fisher, 1922), 424

5.3.3 Significance Testing Using the Two-Sample t-test Assuming a Common Population Variance (Fisher, 1922), 427

5.3.4 Significance Testing for Two Population Variances (Fisher, 1924), 428

5.3.5 Statistical Methods for Research Workers (Fisher, 1925), 429

5.4 ANOVA and the Design of Experiments, 431

5.4.1 Birth and Development of ANOVA (Fisher and Mackenzie, 1923; Fisher, 1925), 431

5.4.2 Randomization, Replication, and Blocking (Fisher, 1925; 1926), Latin Square (Fisher, 1925), Analysis of Covariance (Fisher, 1932), 441 Randomization, 441 Replication, 442 Blocking, 442 Latin Square, 444 Analysis of Covariance, 445

5.4.3 Controversy with Student on Randomization (1936–1937), 448

5.4.4 Design of Experiments (Fisher, 1935), 456

5.5 Fisher and Probability, 458

5.5.1 Formation of Probability Ideas: Likelihood, Hypothetical Infinite Populations, Rejection of Inverse Probability, 458

5.5.2 Fiducial Probability and the Behrens-Fisher Problem, 462 The Fiducial Argument (1930), 462 Neyman’s Confidence Intervals (1934), 467 The Behrens-Fisher Problem (1935), 470 Controversy with Bartlett (1936–1939), 473 Welch’s Approximations (1938, 1947), 476 Criticism of Welch’s Solution (1956), 483

5.5.3 Clash with Jeffreys on the Nature of Probability (1932–1934), 487

5.6 Fisher Versus Neyman–Pearson: Clash of the Titans, 502

5.6.1 The Neyman-Pearson Collaboration, 502 The Creation of a New Paradigm for Hypothesis Testing in 1926, 502 The ‘Big Paper’ of 1933, 514

5.6.2 Warm Relationships in 1926–1934, 520

5.6.3 1935: The Latin Square and the Start of an Ongoing Dispute, 522

5.6.4 Fisher’s Criticisms (1955, 1956, 1960), 528 Introduction, 528 Repeated Sampling, 528 Type II Errors, 532 Inductive Behavior, 534 Conclusion, 536

5.7 Maximum Likelihood before Fisher, 536

5.7.1 Lambert and the Multinomial Distribution (1760), 536

5.7.2 Lagrange on the Average of Several Measurements (1776), 541

5.7.3 Daniel Bernoulli on the Choice of the Average Among Several Observations (1778), 544

5.7.4 Adrain’s Two Derivations of the Normal Law (1808), 550

5.7.5 Edgeworth and the Genuine Inverse Method (1908, 1909), 550

5.8 Significance Testing before Fisher, 555

5.8.1 Arbuthnot on Divine Providence: The First Published Test of a Statistical Hypothesis (1710), 555

5.8.2 ‘s Gravesande on the Arbuthnot Problem (1712), 562

5.8.3 Nicholas Bernoulli on the Arbuthnot Problem: Disagreement with ‘s Gravesande and Improvement of James Bernoulli’s Theorem (1712), 565

5.8.4 Daniel Bernoulli on the Inclination of the Planes of the Planetary Orbits (1735). Criticism by d’Alembert (1767), 571

5.8.5 Michell on the Random Distribution of Stars (1767): Clash Between Herschel and Forbes (1849), 578 Michell on the Random Distribution of Stars (1767), 578 Clash Between Herschel and Forbes (1849), 582

5.8.6 Laplace on the Mean Inclination of the Orbit of Comets (1776), 588

5.8.7 Edgeworth’s “Methods of Statistics” (1884), 588

5.8.8 Karl Pearson’s Chi-Squared Goodness-of-Fit Test (1900), 590

5.8.9 Student’s Small-Sample Statistics (1908), 590


6 Beyond Fisher and Neyman–Pearson 593

6.1 Extensions to the Theory of Estimation, 593

6.1.1 Distributions Admitting a Sufficient Statistic, 594 Fisher (1934), 594 Darmois (1935), 595 Koopman (1936), 597 Pitman (1936), 599

6.1.2 The Cramér–Rao Inequality, 602 Introduction, 602 Aitken & Silverstone (1942), 603 Fréchet (1943), 607 Rao (1945), 611 Cramér (1946), 614

6.1.3 The Rao–Blackwell Theorem, 618 Rao (1945), 618 Blackwell (1947), 620

6.1.4 The Lehmann–Scheffé Theorem, 624 Introduction, 624 The Lehmann-Scheffé Theorem. Completeness (1950), 626 Minimal Sufficiency and Bounded Complete Sufficiency (1950), 629

6.1.5 The Ancillarity–Completeness–Sufficiency Connection: Basu’s Theorem (1955), 630

6.1.6 Further Extensions: Sharpening of the CR Inequality (Bhattacharyya, 1946), Variance Inequality without Regularity Assumptions (Chapman and Robbins, 1951), 632

6.2 Estimation and Hypothesis Testing Under a Single Framework: Wald’s Statistical Decision Theory (1950), 634

6.2.1 Wald’s Life, 634

6.2.2 Statistical Decision Theory: Nonrandomized and Randomized Decision Functions, Risk Functions, Admissibility, Bayes, and Minimax Decision Functions, 636

6.2.3 Hypothesis Testing as a Statistical Decision Problem, 641

6.2.4 Estimation as a Statistical Decision Problem, 642

6.2.5 Statistical Decision as a Two-Person Zero-Sum Game, 643

6.3 The Bayesian Revival, 645

6.3.1 Ramsey (1926): Degree of Belief, Ethically Neutral Propositions, Ramsey’s Representation Theorem, Calibrating the Utility Scale, Measuring Degree of Belief, and The Dutch Book, 646

6.3.2 De Finetti (1937): The Subjective Theory, Exchangeability, De Finetti’s Representation Theorem, Solution to the Problem of Induction, and Prevision, 656

6.3.3 Savage (1954): The Seven Postulates, Qualitative Probability, Quantitative Personal Probability, Savage’s Representation Theorem, and Expected Utility, 667

6.3.4 A Breakthrough in “Bayesian” Methods: Robbins’ Empirical Bayes (1956), 674

References 681

Index 714