Optimal transport has been a rapidly growing research area over the past three decades, driven by a sequence of groundbreaking results in pure and applied mathematics. Questions from the subject arise throughout mathematics, and its history dates back to Monge's problem of 1781. This book is a perfect choice for anyone interested in learning this subject. The text familiarizes the reader with the basic terminology and notation of measure theory and Riemannian geometry, and then proceeds to answer the following questions:

**What is optimal transport?** Monge asked, in modern language: “Given two probability distributions, among all possible transport maps that rearrange one into the other, which map realizes the infimum of a given transportation cost?” The problem is simple to state but far from easy to solve. Kantorovich’s relaxation in the 1940s, from a transport “map” to a transport “plan”, was the first significant breakthrough: it turns optimal transport into a linear problem with an equally elegant dual formulation (see Chapter 2). Brenier’s theorem from 1987, which the book calls a cornerstone, formally connects optimal transport with the Monge–Ampère equation and provides another pathway to the optimal map in the case of the quadratic cost.
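To make the relaxation concrete: for finitely supported measures with weights `a` (on points x_i) and `b` (on points y_j), Kantorovich's problem is a finite linear program over nonnegative matrices P whose row sums are `a` and column sums are `b`. The following is a minimal sketch, assuming NumPy and SciPy (`scipy.optimize.linprog` with the HiGHS solver); the name `kantorovich_lp` is ours and not taken from the book. The example also shows why the relaxation matters: a single Dirac mass cannot be pushed forward by any map onto two distinct points, yet a plan can split it.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich_lp(a, b, C):
    """Discrete Kantorovich problem: minimize sum_ij C[i, j] * P[i, j]
    subject to P @ 1 = a, P.T @ 1 = b and P >= 0."""
    a, b, C = (np.asarray(z, dtype=float) for z in (a, b, C))
    n, m = C.shape
    # P is flattened row-major, so entry (i, j) sits at index i * m + j.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))  # sum_j P[i, j] = a[i]
    col_sums = np.kron(np.ones((1, n)), np.eye(m))  # sum_i P[i, j] = b[j]
    res = linprog(
        C.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x.reshape(n, m), res.fun

# One unit of mass at 0, to be sent to -1 and 1 in equal halves.
x, y = np.array([0.0]), np.array([-1.0, 1.0])
C = (x[:, None] - y[None, :]) ** 2  # quadratic cost |x - y|^2
plan, cost = kantorovich_lp([1.0], [0.5, 0.5], C)
# The unique optimal plan splits the mass in half; its cost is 1.
print(plan, cost)
```

Building the constraint matrices with `np.kron` keeps the sketch short; for large problems a dedicated network-flow or entropic solver is the usual choice.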

**What is the Wasserstein distance?** The optimal transport cost gives rise to the so-called Wasserstein distance, which provides a new notion of distance for comparing probability distributions. Endowing the space of probability measures with the Wasserstein distance yields the Wasserstein space, whose rich geometric structure makes it possible to study geodesics and interpolation between distributions (see Chapter 3). It provides a natural mathematical formalism for datasets that are best modeled as distributions on Euclidean space. The remarkable Benamou–Brenier formulation, introduced in 2000 (see Section 4.1), reveals the link between optimal transport and the continuity equation: it recasts the problem of finding the optimal transport cost, i.e., computing the Wasserstein distance, as a PDE-constrained optimization problem. This dynamic viewpoint offers an equivalent and powerful reformulation of the optimal transport problem. The interplay between the Eulerian and Lagrangian perspectives plays a central role in endowing the Wasserstein space with a differential structure, as one discovers in Chapters 3 and 4.
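For reference, the Benamou–Brenier formula for the quadratic cost is usually stated as

$$
W_2^2(\mu_0,\mu_1)=\min_{(\rho_t,\,v_t)}\left\{\int_0^1\!\!\int_{\mathbb{R}^d}|v_t(x)|^2\,d\rho_t(x)\,dt \;:\; \partial_t\rho_t+\nabla\cdot(\rho_t v_t)=0,\ \rho_0=\mu_0,\ \rho_1=\mu_1\right\}.
$$

In one dimension, the optimal coupling for the cost |x − y|^p with p ≥ 1 simply matches sorted samples, so both the distance and the geodesic are easy to compute. The following is a minimal sketch, assuming two empirical measures with the same number of equally weighted atoms; all function names are ours. Along the displacement interpolation each atom moves at constant velocity, so the action in the formula above reduces to the mean squared velocity and equals W_2^2.

```python
import numpy as np

def _sorted_pair(x, y):
    xs = np.sort(np.asarray(x, dtype=float))
    ys = np.sort(np.asarray(y, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes the same number of atoms in x and y")
    return xs, ys

def wasserstein_1d(x, y, p=2.0):
    """W_p between two equally weighted empirical measures on the real line."""
    xs, ys = _sorted_pair(x, y)
    # In 1D, matching sorted samples (the monotone coupling) is optimal for |x - y|^p, p >= 1.
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))

def displacement_interpolation_1d(x, y, t):
    """Atoms of the W_2 geodesic (McCann's displacement interpolation) at time t in [0, 1]."""
    xs, ys = _sorted_pair(x, y)
    return (1.0 - t) * xs + t * ys

def kinetic_energy_1d(x, y):
    """Benamou–Brenier action of that geodesic: atom i moves at constant velocity ys[i] - xs[i],
    so the time integral over [0, 1] reduces to the mean squared velocity."""
    xs, ys = _sorted_pair(x, y)
    return float(np.mean((ys - xs) ** 2))

x = np.array([0.0, 1.0])
y = np.array([3.0, 0.0])
mid = displacement_interpolation_1d(x, y, 0.5)
print(wasserstein_1d(x, y))    # sqrt(2), since sorted y is [0, 3]
print(mid)                     # halfway atoms: 0 and 2
print(wasserstein_1d(x, mid))  # half of sqrt(2), as on a constant-speed geodesic
print(kinetic_energy_1d(x, y)) # equals W_2(x, y)^2 = 2, up to rounding
```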

**What is the Wasserstein gradient flow?** As revealed by Otto’s seminal work two decades ago, discussed in Chapters 3 and 4, the quadratic Wasserstein space carries a natural (formal) Riemannian structure. As a technical tool, this structure is useful for asymptotic analysis thanks to the special topology it induces; as a methodological tool, it provides a new dissipation mechanism for modeling evolutionary systems, namely Wasserstein gradient flows. Wasserstein gradient flows have had a significant impact on numerous areas of mathematics, as they provide versatile tools in PDE analysis for investigating stability and long-term behavior. They also offer a new way to solve important kinetic equations based on their variational formulations. Recently, this mathematical richness has also become influential in data science. For example, by studying the displacement convexity of functionals on the Wasserstein space, a notion introduced by McCann in 1997 (see Section 4.3), one can understand the training of overparameterized neural networks and design efficient algorithms based on Wasserstein gradient flows for statistical sampling and optimization.
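One concrete instance of the sampling connection, given here as standard background rather than as the book's method: by the Jordan–Kinderlehrer–Otto theorem, the Fokker–Planck equation ∂_t ρ = ∇·(ρ∇V) + Δρ is the Wasserstein gradient flow of ∫ V dρ + ∫ ρ log ρ, which equals KL(ρ ‖ π) with π ∝ e^{−V} up to an additive constant. Its particle counterpart is the Langevin diffusion dX_t = −∇V(X_t) dt + √2 dW_t, and one Euler–Maruyama step per iteration gives the unadjusted Langevin algorithm. The sketch below assumes a hypothetical standard Gaussian target, V(x) = x²/2; the step size, particle count, and function names are illustrative choices.

```python
import numpy as np

def grad_V(x):
    """Gradient of the hypothetical potential V(x) = x**2 / 2 (standard Gaussian target)."""
    return x

def unadjusted_langevin(x0, grad_potential, step, n_steps, rng):
    """Euler–Maruyama steps for dX = -grad V(X) dt + sqrt(2) dW, applied to every particle.

    The empirical law of the particles approximately follows the Fokker–Planck equation,
    i.e. the Wasserstein gradient flow of KL(rho || pi) with pi proportional to exp(-V).
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_potential(x) + np.sqrt(2.0 * step) * noise
    return x

rng = np.random.default_rng(0)
x0 = 5.0 + rng.standard_normal(20_000)  # particles start far from the target, around N(5, 1)
step = 0.01
samples = unadjusted_langevin(x0, grad_V, step=step, n_steps=2_000, rng=rng)
# For this Gaussian target the scheme's stationary law is N(0, 1 / (1 - step / 2)),
# so the mean should be near 0 and the variance near 1.005.
print(samples.mean(), samples.var())
```

The small variance inflation is the discretization bias of the unadjusted scheme; it vanishes as the step size goes to zero.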

**What is next?** Chapters 2, 3, and 4 are only the starting point of the beautiful theory of optimal transport. Beyond these building blocks, the subject becomes incredibly diverse and branches out in multiple directions. Readers who want to deepen their knowledge or dive into the details of these research directions will find in Chapter 5 a list of nice introductions by Figalli and Glaudo to such advanced topics, with relevant references for further reading, including existing monographs on optimal transport.

For example, multi-marginal optimal transport problems arise naturally in quantum chemistry as the semi-classical limit of the Levy–Lieb functional, which is crucial in density functional theory for computing the electronic structure of molecules (see Section 5.3). Gradient flows are only briefly touched on in Chapters 3 and 4; for an in-depth study, one can refer to the “bible” of gradient flows by Ambrosio, Gigli, and Savaré, which gives particular attention to the Wasserstein space (see Section 5.4). As another example, optimal transport has established itself as a viable alternative to many classical methods in data science. Such applications rely on recent advances in computational methods for optimal transport, such as the entropic regularization proposed by Cuturi in 2013 (see Section 5.6). Interestingly, research on the theory of entropy-regularized optimal transport has, in turn, made major contributions to theoretical probability and statistical physics, as one can find out through the further readings provided in Chapter 5.
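The computational appeal of entropic regularization is that the regularized optimal plan takes the form diag(u) K diag(v) with the Gibbs kernel K = exp(−C/ε), so it can be computed by Sinkhorn's alternating matrix-scaling iterations. A minimal NumPy sketch follows; the function name, iteration count, toy data, and the choice ε = 0.5 are ours.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iters=2000):
    """Entropic OT via Sinkhorn's matrix scaling.

    Returns P = diag(u) K diag(v) with K = exp(-C / eps), the form taken by the minimizer of
    <C, P> - eps * H(P) over couplings of a and b, where H(P) = -sum P log P.
    """
    a, b, C = (np.asarray(z, dtype=float) for z in (a, b, C))
    K = np.exp(-C / eps)   # Gibbs kernel; underflows when eps is tiny relative to C
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows so that P @ 1 = a
        v = b / (K.T @ u)  # rescale columns so that P.T @ 1 = b
    return u[:, None] * K * v[None, :]

# Toy example: two equally weighted points on each side, quadratic cost.
x, y = np.array([0.0, 1.0]), np.array([0.0, 2.0])
C = (x[:, None] - y[None, :]) ** 2
P = sinkhorn([0.5, 0.5], [0.5, 0.5], C, eps=0.5)
print(P)              # every entry is positive: the entropic plan has full support
print((C * P).sum())  # cost of the entropic plan, at least the unregularized optimum 0.5
```

Shrinking ε brings the cost closer to the unregularized optimum but slows convergence and makes K underflow, which is why practical implementations usually iterate in the log domain; maintained libraries such as POT (`ot.sinkhorn`) offer stabilized variants.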

Overall, this is a fantastic textbook for a graduate course or a reading group thanks to its clarity. It is also an excellent book for self-study because of its compelling exercises with solutions. Whether you are a graduate student or a researcher in machine learning, statistics, PDE analysis, or differential geometry, you will find that optimal transport is a valuable subject to learn, and this textbook, written by leading experts, is one of the clearest and most concise introductions to it.

Dr. Yunan Yang is currently an Advanced Fellow at the Institute for Theoretical Studies at ETH Zurich. She will be a Tenure-Track Assistant Professor in the Department of Mathematics at Cornell University, starting in July 2023 (yy837@cornell.edu).