Understanding Search Engines: Mathematical Modeling and Text Retrieval

Michael W. Berry and Murray Browne
Software, Environments, and Tools
[Reviewed by Toby Donaldson]

Understanding Search Engines is an excellent, crisp introduction to some of the essential mathematics driving information retrieval (IR). I'd recommend this book to anyone who wants a clear, concise introduction to the vector model of IR, which is probably the best place for anyone new to the field to begin their study. The book should be easily accessible to non-math majors who have already taken a linear algebra course. Indeed, some of the material in this book could serve as the basis (no pun intended!) for interesting applications in a traditional introductory linear algebra course.

The book introduces the classic vector model of IR, plus the more recent idea of latent semantic indexing. Both fit naturally into a linear algebra framework. The essential idea is to treat the documents being searched as a huge term-document matrix, where the columns are documents and the rows are the terms that appear in them. User queries are converted into vectors in the same term space and then compared to each document vector using measures such as cosine similarity. Latent semantic indexing is treated within the same mathematical framework, and is essentially an application of singular value decomposition to the term-document matrix.
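To make the vector model concrete, here is a minimal sketch in plain Python. It is not taken from the book: the three toy documents, the use of raw term counts as matrix entries, and the helper names `to_vector`, `cosine`, and `rank` are all assumptions made for illustration. Real systems typically apply term weighting and sparse storage, neither of which is shown here.

```python
import math

# Toy corpus (hypothetical documents, for illustration only).
docs = {
    "d1": "matrix decomposition for text retrieval",
    "d2": "ranking web pages by link structure",
    "d3": "singular value decomposition of a matrix",
}

# Vocabulary: one matrix row per distinct term, in sorted order.
terms = sorted({word for text in docs.values() for word in text.split()})


def to_vector(text):
    """Return the raw term-count vector of `text` over the shared vocabulary."""
    words = text.split()
    return [words.count(term) for term in terms]


# Term-document matrix stored column-wise: one vector per document.
columns = {name: to_vector(text) for name, text in docs.items()}


def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query):
    """Score every document column against the query, best match first."""
    q = to_vector(query)
    scores = {name: cosine(q, col) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank("matrix decomposition"):
        print(f"{name}: {score:.3f}")
```

With this toy data, the query "matrix decomposition" ranks d1 ahead of d3, because d1 is shorter and the two shared terms make up a larger share of its vector, while d2 shares no terms with the query and scores zero. Latent semantic indexing would go one step further and replace the matrix with a low-rank approximation obtained from its singular value decomposition before comparing vectors; that step is omitted from this sketch.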

The book also briefly covers a number of other IR topics, such as web searching (including Google's PageRank algorithm) and relevance feedback. It does not discuss probabilistic or Boolean models of IR, and it says little about the (immense) engineering challenges of getting an IR system up and running from scratch. But what the book does cover, it covers well, and it is a fine introduction to this fascinating field.

Dr. Toby Donaldson is an instructor in Simon Fraser University's School of Computing Science.

Preface to the Second Edition; Preface to the First Edition; Chapter 1: Introduction; Chapter 2: Document File Preparation; Chapter 3: Vector Space Models; Chapter 4: Matrix Decompositions; Chapter 5: Query Management; Chapter 6: Ranking and Relevance Feedback; Chapter 7: Searching by Link Structure; Chapter 8: User Interface Considerations; Chapter 9: Further Reading; Bibliography; Index.