You are here

Matrix Methods in Data Mining and Pattern Recognition

Lars Eldén
Publication Date: 
Number of Pages: 
Fundamentals of Algorithms
[Reviewed by
David A. Huckaby
, on

This text is aimed at upper-level undergraduates or beginning graduate students who want to see how matrix methods can be used to handle problems in data mining and pattern recognition. Students of numerical linear algebra desiring to see some applications of their subject will also find here an enjoyable read.

Readers are assumed to have completed a first course in linear algebra, although a brief catalogue of some basic facts and results is provided in chapter two. Also assumed is a first course in numerical analysis/scientific computing, meaning that the reader has some programming experience and at least a bit of exposure to numerical issues.

The text is divided into three parts. Part I, the first nine chapters, discusses the numerical linear algebra needed for the applications described in Part II, chapters 10 through 14. Part III, chapter 15, provides further implementation details of some of the algorithms described in Part I. Although the book is structured well to be read cover to cover, a reader with a little past exposure to computational linear algebra could start with one of the application chapters, referring to relevant sections of Parts I and III as needed.

The main focus of Part I is on computing matrix decompositions, usually with an eye to reducing the dimension of a problem. Chapter one previews the applications. (It can be viewed online, along with the preface, at After the brief review of linear algebra basics in chapter two, chapters three through seven discuss least squares, orthogonality, the QR decomposition, the singular value decomposition (SVD), and a Krylov subspace method. Chapter eight looks at SVD decompositions for tensors and chapter nine at clustering and nonnegative matrix factorizations. Important numerical theory and concerns such as condition, perturbation theory, error bounds, and flop counts are considered and illustrated in places, but they do not dominate the discussion. Rather, the focus is on obtaining a basic grasp of the various matrix decompositions and their uses.

The presentation of preliminary material in the first two and half chapters has (perhaps by necessity) a bit of a machine-gun feel. But the exposition settles down and becomes more user-friendly midway through chapter three with a good introduction to least squares via a Hooke’s Law problem. A couple of examples that put on display the numerical inferiority of the normal equations are well done. The reader is alerted that numerical work might require approaches other than the most obvious or intuitive.

The opening paragraph of chapter four motivates orthogonality well, characterizing a “bad” basis as being nearly linearly dependent and a “good” basis as being “very linearly independent”, i.e., orthogonal. The numerical stability that orthogonality affords is more rigorously described at the end of the chapter, after the reader meets plane rotations and Householder transformations, tools used to compute the coming matrix decompositions.

The QR decomposition is introduced in chapter five and shown to be of use in solving least squares problems. Chapter six features the SVD, introducing the idea of numerical rank and its implications for matrix approximation/dimension reduction. The SVD is applied to both least squares problems and underdetermined systems, and principal component analysis is briefly described.

Chapter 7 opens with more on the SVD and principal components. The information retrieval example previewed in chapter 1 is revisited, with two queries made on its small term-document matrix. One query matches well with the vast majority of the information in the matrix, whereas the other does not. The relative residual of both queries is plotted against the truncation index, with the first query’s residual decaying much faster than the second query’s, indicating that the first query is much better approximated by the first few left singular vectors. Each query’s components along the first five left singular vectors are examined to further illustrate the point. This is all well done. The observation that the subspace provided by the reduced-rank SVD does not take into account the query vectors motivates the rest of the chapter, a presentation of LGK bidiagonalization, a Krylov subspace method. The method is applied to a least-squares problem, once with each query vector as the right-hand side. The resulting subspaces are, as expected, closer to the queries than the subspaces of equal dimension produced by the SVD, as demonstrated by the much faster decay of the residual as basis vectors are added.

Tensors are introduced in chapter 8, along with one version of a tensor SVD called higher order SVD (HOSVD). What it means to truncate the decomposition early is illustrated with a tensor of 131 handwritten digits, each digit a 16-by-16 matrix of pixels. Chapter 9 refers once again to the information retrieval example of chapter 1. In fact, this is the first time in the book that basis vectors for a subspace approximating the (column space of the) term-document matrix are explicitly displayed. First clustering is introduced via the k-means algorithm. It is seen that (as with the SVD) some entries in the resulting basis vectors can be negative, which is difficult to interpret in the context of terms and documents. As many other data mining applications also involve nonnegative data, the second half of the chapter considers nonnegative matrix factorizations.

The applications part of the book follows next, but the exposition undergoes merely a shift in emphasis. Indeed, the last chapters of Part I involved applications as examples and illustrations, while Part II will continue to discuss and even introduce matrix concepts as needed. Chapter 10 involves the classification of handwritten digits from a U.S. Postal Service database, each digit being a 16-by-16 pixel gray scale image treated as a point/vector in 256-dimensional space. A group of training digits is split into ten sets, each containing digits of one kind, and a reduced-rank SVD is computed for each set. The least-squares residual of a test digit is then computed in each of the ten bases. The smallest residual identifies the test digit. Three examples are given and the results commented upon. The second part of the chapter introduces a distance measure in n­-space called tangent distance that is invariant under small rotations, dilations, etc. — a valuable property when trying to identify handwritten digits. The tangent distance is the residual of a least-squares problem in which the matrix contains derivatives with respect to parameters, each of which governs a different type of transformation. Building-block functions effecting several types of transformations are catalogued.

Chapter 11 on text mining extends chapter 7. Discussed are the vector space model, query matching, and the precision and recall metrics. Then a performance comparison is made on a certain query from the Medline database. The precision versus recall plot is given for the full vector space model, latent semantic indexing (i.e., reduced-rank SVD), clustering, nonnegative matrix factorization, and LGK bidiagonalization. The chapter ends by applying all five methods to 30 Medline queries, plotting average precision versus recall.

Chapter 12 describes the pagerank algorithm used by Google, complete with a discussion of Markov chains and the power method for computing eigenvalues. The topic of chapter 13 is extracting key words and key sentences from text. The aim here differs from that of the text mining done in chapter 11. There a query is issued, with the desired results being the several document vectors nearest the query vector. For key sentence extraction, however, it is usually preferable to have one representative of the most important type of sentence, one representative of the second most important type of sentence, and so forth. So once a low-rank approximation to the term-sentence matrix is calculated, rather than returning several sentence vectors nearest the dominant left singular vector (or dominant cluster, etc.), an algorithm should return the closest sentence to the most dominant direction, the closest sentence to the second most dominant direction, and so forth.

Chapter 14 rounds out the applications portion of the text by looking at face recognition. The Yale Face Database contains photographs of several people’s faces, each taken in a variety of expressions (smiling, winking, with and without glasses, etc.) Each person in the database is represented by a matrix, each column of which is the photograph of that person with a certain expression. The collection of these matrices over all the people in the database yields a three-mode tensor to which the HOSVD is applied.

Part III/Chapter 15 goes a bit deeper into some of the algorithms used in the text, with the focus on computing eigenvalues and singular values. As each algorithm is examined the Matlab function implementing it is mentioned. The chapter is constructed like most of the rest of the book: enough material and examples to get the point across, but not enough to provide complete coverage.

This text could be used as a supplement for those studying numerical linear algebra or as a starter for those interested in the applications. Throughout the book are pointers to the excellent bibliography, which contains standard numerical linear algebra texts and very recent articles on the applications and on developments in numerical linear algebra. Matlab codes and code fragments are provided throughout the book. Not optimized, they are meant to illustrate the methods and to provide the reader some test codes and a starting point for programming the algorithms in Matlab or a similar environment.

The approach in most sections of the book is user-friendly, though the treatments are succinct. Despite its brevity, the exposition is easy to follow. Examples are provided, and algorithms are well illustrated. For example, series of displayed matrices having x and 0 entries illustrate such processes as the upper-triangularizing of a matrix via Householder transformations and the bulge chasing of the symmetric QR algorithm with implicit shifts. In most chapters, just enough material is presented, and presented well, to give readers a solid understanding of the applications and the linear algebra behind them. The serious reader will want to go deeper, guided by the bibliography.

The text is supported by a webpage: Here are found copies of the preface and first chapter, a list of errata, and theory and computer exercises. Most of the theory exercises are routine, mainly checking for understanding of the material in each chapter. Most of the computer exercises parallel the applications discussed in the book, and links to databases are provided for each exercise. The user is assumed to have access to Matlab. (As of this writing, the webpage was still under construction, and not all of the exercises were up.)

David A. Huckaby is an assistant professor of mathematics at Angelo State University.