The Structure and Stability of Persistence Modules

Frédéric Chazal, Vin de Silva, Marc Glisse, and Steve Oudot
Springer Briefs in Mathematics
Reviewed by Michele Intermont

Given a set of data points, what shape best describes the relationship between those points? If the data set is small enough, and the data itself is only two or three dimensional, then one can hope to easily visualize the entire data set. Many data sets are quite large, however, both in number of points and in dimension. Merging ideas from computer science and statistics with mathematics, Topological Data Analysis (TDA) is a relatively new field which aims at developing techniques based on topology to address this problem.

Arguably, the main tool in TDA is persistent homology, and it was the publication of an algorithm for computing persistence in 2002 that sparked serious interest in the field. Persistent homology can be briefly explained as follows. Given a topological space X, the invariant known as homology associates a collection of groups to X, where the n-dimensional group measures the number of n-dimensional holes in the space. Starting with a data set, which is thought of as a finite metric space, one constructs a simplicial complex using the data points as the zero-simplices and using the metric to determine which higher dimensional simplices are included. One can now compute the homology of this complex. The construction, however, involves a choice (beyond that of the metric): how close do points need to be before a simplex is added? It is this choice that persistence ameliorates. For each non-negative value, one constructs a complex using that value along with the metric to decide which higher dimensional simplices are included. For each complex, one computes the homology. Since there are only finitely many parameter values at which the complex changes, each element of homology persists over an interval. The resulting collection of intervals is known as the persistent homology of the data set. The relatively long intervals are generally regarded as important. Persistence is known to be stable (that is, small perturbations of the data lead to small perturbations of the persistent homology), and it is this quality which makes persistence particularly informative.
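To make the construction above concrete, here is a minimal sketch in Python of the simplest case: the 0-dimensional intervals (connected components) of a finite point set. It is an illustration only, not the 2002 algorithm mentioned above, and it rests on stated assumptions: the metric is Euclidean, an edge between two points is taken to appear once the parameter reaches their distance (some authors use half that value), and the function name `h0_barcode` is hypothetical.

```python
from itertools import combinations
from math import dist, inf


def h0_barcode(points):
    """Return the 0-dimensional persistence intervals (birth, death).

    Assumed convention: every point is born at parameter 0, and an edge
    joins two points as soon as the parameter reaches their distance.
    """
    n = len(points)
    if n == 0:
        return []

    parent = list(range(n))  # union-find forest over the points

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise distances, processed in increasing order.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge here, so one of them stops persisting.
            parent[root_i] = root_j
            bars.append((0.0, length))

    # One component survives for every parameter value.
    bars.append((0.0, inf))
    return bars


# Two tight pairs of points that sit far apart from each other.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
bars = h0_barcode(points)
```

For this sample the intervals are two short ones ending at 1, one longer one ending at 4, and one infinite one. The longer interval reflects the two well-separated clusters, which is the sense in which relatively long intervals are regarded as important.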

Since 2002, interest in TDA has sparked much investigation and many generalizations. This monograph, The Structure and Stability of Persistence Modules, is aimed at describing the situation where the parameter space is the real line, and is motivated by mathematical considerations. It does not address the algorithmic side of the subject, nor does it aim to address generalizations of persistence such as multi-dimensional persistence and zig-zag persistence.

The view given above of the persistence of a finite data set comes from a decomposable persistence module: each interval corresponds to one piece of the decomposition. In this work, the authors present persistence in great generality. Thus, even when the situation is not known to be decomposable, they construct a persistence diagram. To do so, they introduce rectangle measures. The idea is to take a persistence module, define a so-called persistence measure from it, and use rectangles in the plane to define the diagram. This generalization is the main mathematical contribution of this monograph.
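As a rough guide to the rectangle idea, the measure can be written in terms of ranks of the maps in the module. The notation below (r, μ_V, and the rectangle endpoints a, b, c, d) is assumed here for illustration and is not quoted from the review; the monograph should be consulted for the precise statement and its endpoint conventions.

```latex
% Assumed notation: V = (V_t)_{t \in \mathbb{R}} is a persistence module, and
% r_s^t = \operatorname{rank}(V_s \to V_t) for s \le t, assumed finite for s < t.
\[
  \mu_V\bigl([a,b] \times [c,d]\bigr)
  \;=\; r_b^c \;-\; r_a^c \;-\; r_b^d \;+\; r_a^d ,
  \qquad a < b \le c < d .
\]
```

When the module does decompose into intervals, this inclusion-exclusion count gives the number of intervals that begin between a and b and end between c and d. Letting the rectangle shrink around a point then assigns a multiplicity to that point, which is how a diagram can be defined without assuming a decomposition.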

Keeping their focus narrowly defined helps the authors give a clean presentation and develop the framework clearly. Those wishing to know about persistence only to compute barcodes for data sets will find too much information here, but should be able to extract the necessary understanding without difficulty. In fact, the first chapter provides a nice overview of the subject, particularly as it relates to data sets. A lengthy collection of references is also provided in the text, with pages in the first chapter devoted to culling out “recommended reading” for various subtopics, including multi-dimensional persistence.

The text is arranged in five main chapters. The first provides an introduction and overview, with special consideration given to the application of the subject to concrete data. In chapter two, persistence modules are defined, along with the interval decomposition of modules. Chapter three introduces persistence defined via rectangle measures. In chapters four and five, the definitions of interleaving and bottleneck distances are given and used to establish stability theorems. It is the stability theorems which make persistence so useful, and the treatment here is very nice.
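In symbols, a stability theorem of the kind described here compares the two distances. The following is a sketch of the usual form of the statement, with notation assumed for illustration (d_b for the bottleneck distance, d_i for the interleaving distance, dgm for the persistence diagram); the exact hypotheses are those given in the monograph.

```latex
% Assumed notation: U and V are persistence modules over the real line
% whose persistence diagrams dgm(U) and dgm(V) are defined.
\[
  d_b\bigl(\operatorname{dgm}(U), \operatorname{dgm}(V)\bigr)
  \;\le\; d_i(U, V).
\]
```

Read this way, two modules that are close in the interleaving distance have diagrams that are at least as close in the bottleneck distance, which is the precise sense in which small perturbations of the input produce small perturbations of the output.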

Overall, this book is a very nice contribution to the subject of Topological Data Analysis. In this slim volume, the novice will find a collection of main results with their proofs and many references; additionally, experts will see persistence developed more generally than usual using measure theory. There are many subsections, which make it easy to read bits and pieces of the text as well as to find specific content. There are many synthesizing comments throughout the text to help the reader put the material in context, and the writing itself is lucid.

Michele Intermont is Associate Professor of Mathematics at Kalamazoo College.

See the table of contents on the publisher's webpage.