Statistical Learning from a Regression Perspective

Richard A. Berk
Springer Texts in Statistics
Reviewed by Peter Rabinovitch

Due to the considerable hype surrounding the field, there are many, many books about statistical learning, machine learning, deep learning, etc. A large number of them are of the “how-to” variety, promising to make the untrained reader an expert in data science in a month of lunches.

This (thankfully) is not one of those books. It belongs much more to the read-slowly-and-think-carefully genre.

The book focuses on supervised learning techniques that can be viewed as a form of regression, i.e., those “for which the conditional distribution of a response variable is the defining interest and for which characterizing the relationships between predictors and the response is undertaken in a serious and accessible manner.” Note that what would normally be taught in a good course on regression (linear regression, multivariable regression, GLMs, etc.) makes up only a small subset of these techniques. As described in the book, many of the methods of machine learning can also be cast into this unifying framework.

The author is careful to distinguish between what he calls Level I, II, and III analyses. Level I is purely descriptive, Level II is inferential, and Level III is concerned with causality. Blurring the lines between these levels is one cause of many unsuccessful analyses, and readers of the above-mentioned “how-to” books would be well served by some time spent with the book under review.

To benefit the most, the reader should have a solid background in undergraduate statistics, including a good regression course. To appreciate the motivation for the material, familiarity with statistical learning at the level of James, Witten, Hastie, and Tibshirani’s An Introduction to Statistical Learning with Applications in R would also be advisable. That said, I think readers without this background who are using machine learning methods would also benefit from exposure to the issues raised in the book.

There are instructive problems at the end of all but the last two chapters, and examples with R code illustrate the material throughout.

This is a thought-provoking book worthy of serious attention from machine learning practitioners.

Peter Rabinovitch is a Senior Performance Engineer at Akamai, and has been doing data science since long before “data science” was a thing.

See the table of contents on the publisher's webpage.