
Introduction to Computational Genomics: A Case Studies Approach

Nello Cristianini and Matthew W. Hahn
Cambridge University Press
Reviewed by John Perry

The authors of this text have conducted joint research in computational genomics and developed a course on the subject at UC Davis. One is a biologist; the other, a computer scientist. Each needed to understand the other's background, and from that interchange arose this wonderful text, which offers the reader an introduction to computational genomics. The target audience consists of students at the graduate or advanced undergraduate level with an interest in computational genomics and a background in one of the two fields. The text also requires some familiarity with basic ideas of statistics.

Each chapter presents the reader with a scenario in which computational analysis of a genome's properties helps answer a biological question. Most scenarios immediately strike the reader as interesting and worthy of investigation; others might not appeal to some readers at first but turn out to be quite interesting after all. This reviewer, for example, did not expect chapter 9, "The genomics of wine-making," to be nearly as engrossing as it turned out to be: besides a historical background of wine-making, it gives a fascinating explanation of how yeast changes its behavior during the wine-making process. Other scenarios that grab one's attention include studies of Chlamydia, HIV, the eyeless gene (which a "Frankenstein experiment" showed could cause a fruit fly to break out with eyes all over its body), the question of our genetic kinship with Neanderthals, and jet lag.

Each chapter/case study sports a colorful title, such as

  • "Are Neanderthals among us?"
  • "Welcome to the Hotel Chlamydia," and
  • "The boulevard of broken genes."

Through examples such as these, the reader is led through a clear exposition of the fundamental topics and skills of computational genomics, such as

  • biological tools: locating an "open reading frame" (ORF), deciding whether an ORF really is a gene, determining whether one species' gene merely resembles another species' gene or has the same ancestry, tracing the histories of genes, finding motifs that control when a gene is activated or deactivated; and
  • tools from mathematics and computer science: Markov chains, hidden Markov models, pattern matching, and more algorithms than you can shake a stick at.
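The review does not reproduce any of the book's algorithms. Purely as an illustration of the first tool listed, locating an open reading frame, the following Python sketch scans the three forward reading frames for stretches that begin with an ATG start codon and end at the first in-frame stop codon (TAA, TAG or TGA). It is a simplification and not the authors' method: it ignores the reverse strand and alternative start codons, and the function name `find_orfs` and the `min_codons` cutoff of 100 are hypothetical choices made for this example.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for each ORF on the forward strand only.

    An ORF here runs from an ATG to the first in-frame stop codon,
    stop included; `end` is exclusive. ORFs shorter than `min_codons`
    codons (a hypothetical length cutoff) are discarded.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the current open ATG, if any
        # Step codon by codon; i + 3 never runs past the end of seq.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # close this ORF and look for the next ATG
    return orfs
```

Because the scan keeps the first ATG it sees in each frame, it reports the longest ORF ending at each stop codon; deciding whether such an ORF "really is a gene" needs further evidence, which is the question the book takes up.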

Each topic is explained in an easy-to-understand style. The end of each chapter features a handful of problems that encourage the reader to visit a gene bank, download genetic information, and perform some of the tasks described. The reader is also encouraged to visit the book's web site, which offers additional information and resources.

At 170 pages, the book should be expected to offer accessibility rather than encyclopedic depth. As a mathematician with some knowledge of computer science and almost no background in biology, the reviewer found the discussions of basic biological principles easy to follow, and was left thirsting for more. Someone looking for a discussion of why an open reading frame starts with a methionine codon instead of some other codon will have to look elsewhere. It is a sign of a good text that readers find themselves thirsting to learn more and, thanks to copious references, know where to look. The text satisfies on both counts.

The reviewer is somewhat less confident that a biologist with a weak background in mathematics or computer science can follow the computational aspects of the text. Programming skill beyond the elementary level appears to be assumed. The authors describe many algorithms in paragraph form, providing pseudocode for some but not for others. A perusal of the website leaves the reviewer uncertain whether enough software is available to do all the exercises without writing some kind of computer program. For example, the first problem in the text directs the reader to

Download from GenBank the complete genomic sequence of Bacteriophage lambda, accession number NC_001416, and analyze its GC content with various choices of window size.

The text makes it quite clear what it means to analyze the GC content of a genome; what is missing is any direction on which tool the student should use. The sequence contains more than 48,000 nucleotides, so performing the task by hand, while possible, is highly nontrivial. For most tasks, especially later in the text, the authors direct the reader to a relevant online tool, but for this problem the reviewer could find no alternative to sitting down and writing a short computer program.
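To give a sense of what such a short program involves, here is a minimal Python sketch of a sliding-window GC-content calculation. It is not taken from the book. It assumes the lambda sequence has been downloaded from GenBank in FASTA format as a single record; the function names `parse_fasta` and `gc_content_windows` are hypothetical, and the `window` and `step` parameters stand in for the "window size" the exercise asks the reader to vary.

```python
from typing import List, Optional


def parse_fasta(text: str) -> str:
    """Join the sequence lines of a single-record FASTA file.

    Assumes one record: lines starting with '>' are headers and are skipped.
    """
    return "".join(
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def gc_content_windows(seq: str, window: int, step: Optional[int] = None) -> List[float]:
    """Return the fraction of G and C bases in each window of length `window`.

    Windows advance by `step` bases (default: `window`, i.e. non-overlapping).
    A trailing stretch shorter than `window` is ignored.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    if step is None:
        step = window
    if step <= 0:
        raise ValueError("step must be a positive integer")
    seq = seq.upper()  # FASTA files may use lower-case bases
    fractions = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        fractions.append(gc / window)
    return fractions
```

For example, with the record saved to a hypothetical file `lambda.fasta`, `gc_content_windows(parse_fasta(Path("lambda.fasta").read_text()), 1000)` (after `from pathlib import Path`) would return one GC fraction per non-overlapping 1,000-nucleotide window. Rerunning with larger windows averages over more bases and smooths the profile, while smaller windows expose finer local variation, which is the comparison the exercise invites.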

An additional disappointment was that too many figures lack clarity: their axes are often unlabeled, which leaves the meaning of the plots obscure.

In general, however, the reviewer enjoyed the book and would recommend it highly.

John Perry is an assistant professor of mathematics at the University of Southern Mississippi. His mathematical interests lie primarily in computational algebra. He once had a number of interests outside of mathematics, but after three children and several home renovations he has forgotten what it means to have free time.

Prologue (in praise of cells); 1. The first look at a genome (sequence statistics); 2. All the sequence’s men (gene finding); 3. All in the family (sequence alignment); 4. The boulevard of broken genes (hidden Markov models); 5. Are Neanderthals among us? (variation within and between species); 6. Fighting HIV (natural selection at the molecular level); 7. SARS: a post-genomic epidemic (phylogenetic analysis); 8. Welcome to the hotel Chlamydia (whole genome comparisons); 9. The genomics of wine-making (analysis of gene expression); 10. A bed-time story (identification of regulatory sequences); Appendix.