High-Dimensional Statistics

Martin J. Wainwright
Cambridge University Press
Publication Date: 
Number of Pages: 
Cambridge Series in Statistical and Probabilistic Mathematics
[Reviewed by Fabio Mainardi]
High-dimensional statistics deals with the analysis of data in which the number of variables is comparable to, or even much larger than, the number of observations. Such datasets occur frequently in genomics, where high-throughput measurements can generate thousands of variables.
The author starts with an introductory chapter describing what can go wrong in a high-dimensional setting. For example, in covariance estimation, typical error bounds depend on the ratio d/n (dimension/sample size), so if this ratio is large, the bounds will be far from optimal (or just useless). What can help in these cases is knowing that the data is endowed with some form of low-dimensional structure: for example, sparsity (e.g., in compressed sensing, Lasso, basis pursuit) or rapid decay in eigenspectra (for instance, the eigengap, the gap between the first and second eigenvalues of a covariance matrix, is shown to be a key parameter in high-dimensional PCA). The book also emphasizes non-asymptotic results (hence the importance of concentration inequalities): error bounds for statistical models are expressed as functions of both n (sample size) and d (number of variables).
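As a rough illustration of the covariance example (a sketch of my own, not taken from the book), the following Python snippet draws n samples from a standard normal distribution in d dimensions, where the true covariance is the identity, and measures the spectral-norm error of the sample covariance as d/n grows. The function names, the sample sizes and the use of power iteration are assumptions made for this illustration only.

```python
import math
import random


def operator_norm(matrix, iterations=200):
    """Approximate the spectral norm of a symmetric matrix by power iteration."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d  # unit-norm starting vector
    norm = 0.0
    for _ in range(iterations):
        # w = M v; since ||v|| = 1, ||w|| is the current norm estimate.
        w = [sum(row[j] * v[j] for j in range(d)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return norm


def covariance_error(n, d, rng):
    """Spectral-norm error of the sample covariance of n draws from N(0, I_d)."""
    data = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    # Sample covariance (mean known to be zero) minus the true covariance I_d.
    diff = [
        [sum(x[i] * x[j] for x in data) / n - (1.0 if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]
    return operator_norm(diff)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 100
for d in (5, 25, 100):
    err = covariance_error(n, d, rng)
    print(f"d/n = {d / n:.2f}  spectral-norm error = {err:.2f}")
```

With n held fixed, the printed error should increase with d, which is the qualitative behaviour the d/n bounds describe; the exact figures depend on the seed and should not be read as sharp constants.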
The author has made a substantial effort to simplify difficult proofs and split them into more digestible pieces. Overall, the exposition is very clear, and there are plenty of examples to illustrate and motivate definitions and theorems.
Chapters are nicely separated into two categories: a) tools and techniques; b) models and estimators. Topics covered in the tools and techniques chapters include concentration inequalities, bounds on empirical processes, reproducing kernel Hilbert spaces, and minimax lower bounds. The models and estimators chapters mainly cover covariance estimation, sparse linear models, PCA, estimation of low-rank matrices in high dimensions, graphical models, and non-parametric least squares estimators.
The content of the tools and techniques chapters is of general interest and applicability, not limited to high-dimensional statistical models (for example: VC-dimension, Rademacher complexity, uniform laws of large numbers).
The main focus of the book is not on practical implementation, so there are no code samples, no datasets, and very few applications to ‘real-life’ problems are mentioned. One topic that is missing is a discussion of multiple testing. But given the size of the book, I think this omission is quite understandable, and probably necessary.
This book is ideal as a reference or textbook, for self-study or teaching. The prerequisite is a good undergraduate training in mathematics, especially linear algebra, probability theory and mathematical statistics. There are around 15-20 exercises per chapter, ranging from routine to very challenging. Unfortunately, no solutions are provided; it would be useful to have some guidance on the most challenging problems.
A book with similar content was published in the same Cambridge series: High-Dimensional Probability, by Roman Vershynin. While there is a lot of overlap between these two textbooks, Vershynin’s focus is not primarily on statistical models, so topics like graphical models are not included in his presentation; on the other hand, he does include a very interesting discussion of the Grothendieck inequality in the context of kernel maps.
In conclusion, this is a very valuable book, covering a variety of important topics, self-contained and nicely written.


Fabio Mainardi is a senior data scientist at Nestlé Research. After a PhD in number theory, he has been working as a mathematician in the R&D divisions of different companies. His mathematical interests include statistical models, probability, discrete mathematics and optimization theory.