Geometric Structure of High-Dimensional Data and Dimensionality Reduction

Jianzhong Wang
Reviewed by John D. Cook

High dimensional space is a strange place to be. Jianzhong Wang opens his book Geometric Structure of High-Dimensional Data and Dimensionality Reduction with several examples that establish this point.

  1. A sphere in a cube takes up a vanishing proportion of the volume.
  2. Nearly all the volume contained in a sphere is located in a thin shell near its surface.
  3. Nearly all the probability mass of a multivariate normal is in the tails.
  4. The diagonals of a cube are nearly orthogonal to each of the edges.
  5. Random vectors are nearly normalized.

There is no room here to make these statements precise, but Wang explains what each of these informal statements means more formally. This sets the tone for the book: intuitive explanations followed by formal derivations.
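As a rough numerical illustration of statements 1, 2, and 4, the following sketch uses standard closed-form expressions. It is not taken from the book; the function names and the 5% shell thickness are assumptions chosen for illustration.

```python
import math


def ball_to_cube_ratio(n):
    """Volume of the unit-radius n-ball divided by the volume of the cube [-1, 1]^n."""
    ball = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return ball / 2 ** n


def shell_fraction(n, eps):
    """Fraction of an n-ball's volume lying within relative distance eps of the surface."""
    # Volume scales like r^n, so the inner ball of radius (1 - eps) holds (1 - eps)^n.
    return 1 - (1 - eps) ** n


def diagonal_edge_angle_deg(n):
    """Angle in degrees between the diagonal (1, ..., 1) and the edge (1, 0, ..., 0)."""
    # The cosine is <diagonal, edge> / (|diagonal| |edge|) = 1 / sqrt(n).
    return math.degrees(math.acos(1 / math.sqrt(n)))


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(
            f"n={n:3d}  ball/cube={ball_to_cube_ratio(n):.3e}  "
            f"outer 5% shell={shell_fraction(n, 0.05):.4f}  "
            f"diagonal-edge angle={diagonal_edge_angle_deg(n):.2f} deg"
        )
```

As n grows, the ball-to-cube ratio tends to zero, the shell fraction tends to one, and the angle tends to 90 degrees. The gamma-function formula overflows floating point for very large n, so this sketch is only meant for moderate dimensions.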

The prerequisites for the book, if you want to understand the content in detail, are fairly high. For example, one should know measure theory and functional analysis up through Sobolev spaces. The book also relies heavily on differential geometry, because high-dimensional data often concentrate on lower-dimensional manifolds. It gives a brisk introduction to differential geometry, which may be an adequate review for people who have seen the theory before. It would be hard for someone unfamiliar with differential geometry to learn the topic from this book.

In addition to a solid theoretical presentation, the book includes numerous examples and applications, including Matlab implementations of algorithms. As stated in the preface, “the book is primarily intended for computer scientists, applied mathematicians, statisticians, and data analysts.”

About one third of the book is devoted to data geometry and the rest is devoted to data reduction. Of the portion devoted to data reduction, about one fourth covers linear methods such as PCA (principal component analysis). The remaining three fourths covers a variety of nonlinear methods: isomaps, MVU (maximum variance unfolding), LLE (locally linear embedding), diffusion maps, etc.

The book could have used better copy editing. Occasionally the flow of the text is interrupted by an awkward sentence, though the content is strong and otherwise easy to read.

John D. Cook is an independent consultant and blogs at The Endeavour.
