The Beauty of Mathematics in Computer Science

Jun Wu
Chapman & Hall/CRC
Reviewed by John D. Cook

Jun Wu’s book The Beauty of Mathematics in Computer Science is not as broad as the title suggests. A more descriptive title would be Applications of Probability to Natural Language Processing. Not everything in the book is related to natural language processing (NLP) — there is a brief section on cryptography, for example — but the large majority of the book is devoted to NLP and related topics: word segmentation, search, text classification, etc. And the mathematics used in the book is primarily probability: Bayes’ theorem, (hidden) Markov models, etc.

Wu is eminently qualified to write a book on NLP. He is a native speaker of Chinese who is fluent in English, and so brings the experience of having worked with NLP in two very different languages. His examples from Chinese NLP will be particularly interesting to readers familiar only with Western languages. For example, the word segmentation problem is quite different in Chinese compared to English, as is the problem of typing prose into a computer. Wu is a research scientist at Google and developed the company’s search algorithms for Chinese, Japanese, and Korean text. He writes with authority when he describes how NLP is carried out in practice and at scale.

The content of the book began as a series of blog articles and retains the feel of blog articles, each chapter being between eight and nine pages long on average. The book does not go into any topic in great depth, and deliberately so. At one point Wu mentions that readers of his blog articles have asked for more technical details, but he intends to leave that to others and take a more expository approach.

From its title I expected a book that gave examples of how applications of various branches of mathematics are scattered throughout computer science: Euler’s theorem in cryptography, monads in functional programming, Fibonacci numbers in sorting, etc. The book is much more specialized than that, but it is a valuable source for its specialization. When the author speaks of how search engines work, for example, it's reassuring to know that he knows whereof he speaks.

John D. Cook is an independent consultant working in data privacy.

1. Words and languages, numbers and information
   Words and numbers
   The mathematics behind language

2. Natural language processing: From rules to statistics
   Machine intelligence
   From rules to statistics

3. Statistical language model
   Describing language through mathematics
   Extended reading: Implementation caveats
   Higher order language models
   Training methods, zero-probability problems, and smoothing
   Corpus selection

4. Word segmentation
   Evolution of Chinese word segmentation
   Extended reading: evaluating results

5. Hidden Markov model
   Communication models
   Hidden Markov model
   Extended reading: HMM training

6. Quantifying information
   Information entropy
   Role of information
   Mutual information
   Extended reading: Relative entropy

7. Jelinek and modern language processing
   Early life
   From Watergate to Monica Lewinsky
   An old man's miracle

8. Boolean algebra and search engines
   Boolean algebra

9. Graph theory and web crawlers
   Graph theory
   Web crawlers
   Extended reading: two topics in graph theory
   Euler's proof of the Königsberg bridges
   The engineering of a web crawler

10. PageRank: Google's democratic ranking technology
   The PageRank algorithm
   Extended reading: PageRank calculations

11. Relevance in web search
   Extended reading: TF-IDF and information theory

12. Finite state machines and dynamic programming: Navigation in Google Maps
   Address analysis and finite state machines
   Global navigation and dynamic programming
   Finite state transducer

13. Google's AK-47 designer, Dr Amit Singhal

14. Cosines and news classification
   Feature vectors for news
   Vector distance
   Extended reading: The art of computing cosines
   Cosines in big data
   Positional weighting

15. Solving classification problems in text processing with matrices
   Matrices of words and texts
   Extended reading: Singular value decomposition method and applications

16. Information fingerprinting and its application
   Information fingerprint
   Applications of information fingerprint
   Determining identical sets
   Detecting similar sets
   YouTube's anti-piracy
   Extended reading: Information fingerprint's repeatability and SimHash
   Probability of repeated information fingerprint

17. Thoughts inspired by the Chinese TV series Plot: The mathematical principles of cryptography
   The spontaneous era of cryptography
   Cryptography in the information age

18. Not all that glitters is gold: Search engine's anti-SPAM problem and search result authoritativeness question
   Search engine anti-SPAM
   Authoritativeness of search results

19. Discussion on the importance of mathematical models

20. Don't put all your eggs in one basket: The principle of maximum entropy
   Principle of maximum entropy and maximum entropy model
   Extended reading: Maximum entropy model training

21. Mathematical principles of pinyin input method
   Input method and coding
   How many keystrokes to type a Chinese character?
   Discussion on Shannon's First Theorem
   The algorithm of phonetic transcription
   Extended reading: Personalized language models

22. Bloom filters
   The principle of Bloom filters
   Extended reading: The false alarm problem of Bloom filters

23. Bayesian network: Extension of Markov Chain
   Bayesian network
   Bayesian network's application in word classification
   Extended reading: Training a Bayesian network

24. Conditional random fields, syntactic parsing, and more
   Syntactic parsing: the evolution of computer algorithms
   Conditional random fields
   Conditional random fields' applications in other fields

25. Andrew Viterbi and the Viterbi Algorithm
   The Viterbi algorithm
   CDMA technology: The foundation of 3G mobile communication

26. God's algorithm: The expectation maximization algorithm
   Self-converged document classification
   Extended reading: Convergence of expectation-maximization algorithms

27. Logistic regression and web search advertisement
   The evaluation of web search advertisement
   The logistic model

28. Google Brain and artificial neural networks
   Artificial neural network
   Training an artificial neural network
   The relationship between artificial neural networks and Bayesian networks
   Extended reading: "Google Brain"

29. The power of big data
   The importance of data
   Statistics and information technology
   Why we need big data