You are here

Computational Topology for Data Analysis

Tamal Krishna Dey and Yusu Wang
Cambridge University Press
Publication Date: 
Number of Pages: 
[Reviewed by
Ellen Gasparovic
, on

Computational Topology for Data Analysis is a text that computational and applied topologists have been waiting for.  Authors Tamal Dey and Yusu Wang have written a comprehensive graduate-level text on computational topology and its diverse applications for data analysis, notably emphasizing the algorithms for carrying out the wide-ranging theoretical techniques. Written in an engaging style, the book begins in Chapters 1 and 2 with the basics of topological spaces, (co)homology, and simplicial complexes (topics extensively covered elsewhere, of course), but it quickly moves into the complex mathematical concepts and structures that populate the field of topological data analysis as well as algorithms for computing them. As the authors point out, no previous books exist that accomplish this task to anywhere near the same extent and the majority of the material cannot be found in any other existing text. 

Chapter 3 is a friendly introduction to persistent homology, both in the context of sublevel sets of functions defined on topological spaces as well as persistence arising from a filtration of a simplicial complex. Naturally, Dey and Wang present algorithms for computing a persistence diagram from a filtration, starting with the combinatorial algorithm, then the standard matrix reduction algorithm, followed by a more efficient implementation. Their thorough treatment of persistence for piecewise linear (PL) functions includes the relationship between the union-find data structure and zeroth persistent homology. As the authors themselves point out, "the relations between...PL-critical points and homology groups of sublevel sets for the PL-setting have not been stated explicitly elsewhere in the literature." 

Chapter 4 looks at (standard and level set) zigzag persistence and persistence of simplicial towers, as well as algorithms for their computation. One highlight of the chapter is that the authors extend the notion of stability to the tower setting. Chapter 5 nicely presents algorithms for computing optimal generators for both persistent and ordinary homology groups, an oft-cited goal. Rips and Cech filtrations are introduced in Chapter 6, along with homological inference algorithms for point cloud data and scalar fields.

I especially enjoyed the exposition on Reeb graphs in Chapter 7. Dey and Wang looked at how to compute them (i.e., algorithms) and compare them (i.e., via distances, particularly the interleaving and functional distortion distances). They also introduced horizontal and vertical homology groups and explored the connections between Reeb graphs and other familiar structures such as contour, merge, and split trees. Chapter 8 is entirely dedicated to the use of topological tools for graphs, both undirected and directed (though the material for the latter, such as path homology, is particularly compelling).

Chapter 9 spotlights nerves of (what are typically assumed to be) path-connected covers. It contains a very nice treatment of mapper and multiscale mapper, zeroing in on significant results in the one-dimensional setting and efficient algorithms (for both mapper and the multiscale version) for piecewise linear real-valued functions. The chapter concludes with a look at a combinatorial version of mapper generalizing to maps into arbitrary compact metric spaces. Then we move into discrete Morse theory in Chapter 10, including very interesting applications to the topic of graph reconstruction. After a quick introduction to discrete Morse functions and a look at stable and unstable manifolds in the discrete setting, the authors present three algorithms for computing persistence-based discrete Morse vector fields. They then use them for graph reconstruction purposes (relevant algorithms included!) applied to road networks and neuron networks.

Chapters 11 and 12 both cover multiparameter persistence, but with different focuses. Chapter 11 begins by reframing standard persistence modules as graded modules over polynomial rings, a formulation that extends more readily to the multiparameter setting. The focus in this chapter is on presentation maps and matrices of graded modules, together with algorithms for their computation. The chapter concludes with the notions of rank invariants and graded Betti numbers, as well as their implications for defining analogues of barcodes in the multiparameter setting. On the other hand, Chapter 12 centers on definitions of and algorithms for computing distances between multiparameter persistence modules, namely, the interleaving, matching, and bottleneck distances.

The final chapter in the book, Chapter 13, contains numerous references to very recent research on the connections between topological data analysis/persistent homology and machine learning. The authors explore how researchers have begun to address fundamental questions in this area, such as how to appropriately do feature vectorization, how to assign a kernel on the space of persistence diagrams, how to optimize "topological loss functions," and how to do statistics on topological summaries such as the space of persistence diagrams. This chapter left me wanting more - in a good way!

There are many things to appreciate about this book, including the abundance of excellent and helpful figures, the extensive reference list, and the variety of instructive exercises for students to work through...not to mention that algorithms are the name of the game throughout. It is perfectly suited for a graduate-level course in a mathematics OR computer science department. Indeed, CS graduate students can learn the relevant topology (and it is presented in all of its technical glory), and math graduate students can be exposed to the nuances of algorithm design and complexity. Thanks to its inclusion of so much cutting-edge recent work and state-of-the-art algorithms, this is an ideal book for mathematicians or computer scientists looking to dive into this exciting and still very young area of research.

Ellen Gasparovic is an Associate Professor in the department of mathematics at Union College in Schenectady, NY. Her research interests are in applied and computational topology and geometry, topological data analysis, image and shape analysis, and differential topology.