
Linear Algebra and Learning from Data

Gilbert Strang
Wellesley Cambridge Press
[Reviewed by Brian Borchers]

With the recent growth in undergraduate and master’s-level programs in data science, there is a need for textbooks and courses that cover the required mathematical background, particularly in applied linear algebra. At MIT, Gilbert Strang has developed and taught one such course, 18.065, at the advanced undergraduate/graduate level. Strang’s course has a conventional introductory course in linear algebra as a prerequisite. The 18.065 course goes further into applied linear algebra, matrix theory, probability and statistics, randomized algorithms for linear algebra, optimization algorithms, and neural networks. Video lectures and other course content are available through MIT’s OpenCourseWare. Linear Algebra and Learning from Data is a textbook based on that course. Like his other textbooks, it is self-published by Strang’s Wellesley Cambridge Press.

The topics discussed in this book could be extremely useful to any student getting started in data science and machine learning. However, it isn’t clear that the best pedagogical approach is to introduce all of this mathematical machinery in a stand-alone course rather than to bring it into other data science courses as it is needed and where the applicability of the material will be more obvious. For example, in a chapter on “Special Matrices”, the author introduces the reader to discrete Fourier transforms, Toeplitz and circulant matrices, the Kronecker product, graph Laplacians, and the orthogonal Procrustes problem. These are topics that an undergraduate student might not encounter in their coursework and that are undeniably important in various areas of data science, but this book does not always connect these topics to their data science applications.

The author has tried to cram far too much material into the 432 pages of the book, so each topic gets a very quick presentation. In a section on linear programming, game theory, and duality, we find two pages on linear programming, a page on the max-flow min-cut theorem, two pages on two-person games, and a half-page on semidefinite programming. These are topics about which entire books have been written. Inevitably, some important results are presented without sufficient explanation. For example, in a section on non-negative matrix factorization (NMF), the author gives conditions for UV to be an optimal NMF approximation to a matrix A, but these conditions are stated without proof and without a reference to where they are derived.

Furthermore, the book is very oddly laid out.  A 3-page essay on neural networks appears in the front matter before the table of contents with no explanation for this unusual location.  There is no bibliography at the end of the book. Rather, the author has inserted reference lists at various points throughout the book.  

An instructor preparing to teach a course on mathematics for data science might find this book inspirational and useful as a source of topics, examples, and exercises. However, I expect that students would have a very difficult time with this book, and I would not assign it to students as a course textbook.


Brian Borchers is a professor of mathematics at New Mexico Tech and the editor of MAA Reviews.

