You are here

Combinatorics of Genome Rearrangements

G. Fertin, A. Labarre, I. Rusu, É. Tannier, and S. Vialette
MIT Press
Publication Date: 
Number of Pages: 
[Reviewed by
John D. Cook
, on

One of the first questions one may have about Combinatorics of Genome Rearrangements by Guillaume Fertin et al is how to classify the book. To what extent is it a mathematics book or a biology book? What background is necessary to read it? Combinatorics is concerned with problems motivated by biology and uses vocabulary derived from biology, but otherwise is not about biology per se. The content of Combinatorics is primarily mathematics and computer science — combinatorial analysis and the efficiency of algorithms. The first chapter contains a quick introduction to the biology necessary to read the book and the mathematical prerequisites are minimal.

Although the book is largely self-contained, it can be difficult to follow. The presentation is condensed; the book reads more like an outline or a reference than a textbook. Combinatorics is long on definitions but short on examples.

One of the primary goals of applying combinatorial reasoning to genetics is to quantify the degree of similarity between two genetic sequences in a biologically meaningful way. One way to compare two sequences is to place the sequences side by side and simply count the number of locations in which the sequences differ. For example, consider the following two sequences.


The sequences differ in 15 positions, as indicated by underlined characters. Said another way, fifteen changes would be necessary to transform one sequence into the other. But when viewed from a different perspective, the sequences only differ in one way: the middle segments is reversed.


Because genetic sequences are sometimes reversed during reproduction, the difference between the two sequences could plausibly be explained by a single reversal. We'd like to define a distance between sequences that measures the number of biologically plausible "moves" necessary to transform one into another. Basing distances on a set of biologically plausible moves gives genetic combinatorics a character distinct from combinatorics in general.

Biologically-inspired metrics may be difficult to compute; depending on the set of "moves" being modeled, it may be difficult to determine how many moves are required. This is why the book is concerned with the efficiency of algorithms. The book discusses numerous algorithms whose analysis is a matter of current research. Other algorithms in the book have been studied thoroughly but are known to be "hard" in a well-defined sense (i.e. NP-complete).

Combinatorics is inspired by applications to genetics but is not limited to such applications. For example, strings are not limited to the four characters representing the four possible base pairs of DNA. Theorems are stated and proved more generally. One could imagine results from combinatorial genetics being applicable to analogous situations outside of biology. Mathematical genetics has drawn extensively from mathematics and computer science; it is interesting to see the genetics returning the favor by contributing back to these other disciplines.

John D. Cook is a research statistician at M. D. Anderson Cancer Center and blogs at The Endeavour.



1 Introduction    1.1 A Minimalist Introduction to Molecular Evolution 1
     1.2 Birth of the Combinatorics of Genome Rearrangements 4
     1.3 Statement of the Problem 6
     1.4 Scope of This Survey 7
     1.5 Overview of the Models 7
     1.6 Organization of the Book 8
2 Genomes as Permutations 13
     2.1 The Symmetric Group 13
     2.2 The Cycles of a Permutation 14
     2.3 Signed Permutations 15
     2.4 Distances on Permutation Groups 15
     2.5 Circular Permutations 18
     2.6 First Measures of Similarity between Permutations 20
3 Distances between Unsigned Permutations 25
     3.1 Transposition Distance 25
     3.2 Prefix Transposition Distance 36
     3.3 Reversal Distance 40
     3.4 Prefix Reversal Distance (Pancake-Flipping) 47
     3.5 Variants 49
     3.6 Relations between Distances on Unsigned Permutations 61
4 Distances between Signed Permutations 63
     4.1 Conserved Interval Distance 63
     4.2 Signed Reversal Distance 64
     4.3 Variants of Sorting by Reversals 69
     4.4 Combined Operations 72
     4.5 Double Cut-and-Joins 73
5 Rearrangements of Partial Orders 75
     5.1 Genomes as Partially Ordered Sets 75
     5.2 Partially Ordered Sets 75
     5.3 Constructing a Poset 78
     5.4 Reversal Distance 79
     5.5 Breakpoint Distance 80
6 Graph-Theoretic and Linear Algebra Formulations 83
     6.1 Simple Permutations and the Interleaving Graph 83
     6.2 The Overlap Graph 84
     6.3 The Local Complementation of a Graph 85
     6.4 The Matrix Tightness Problem 85
     6.5 Extension to Sorting by Transpositions 86
     6.6 The Intermediate Case of Directed Local Complementation 87
7 Generalities 91
     7.1 Biological Motivations 91
     7.2 Strings and Rearrangements on Strings 92
     7.3 Balanced Strings 94
     7.4 How to Deal with Multiple Copies? 95
8 Distances between Arbitrary Strings 97
     8.1 The Match-and-Prune Model 98
     8.2 The Block Edit Model 123
9 Distances between Balanced Strings 133
     9.1 Minimum Common String Partition Problems 133
     9.2 Reversal Distance 138
     9.3 Unsigned Transpositions 147
     9.4 Unsigned Block Interchanges 153
     9.5 Relations between Distances 157
10 Paths and Cycles 161
     10.1 Genomes 161
     10.2 Breakpoints 162
     10.3 Intervals 163
     10.4 Translocation Distance 164
     10.5 Double Cut-and-Joins (2-Break Rearrangement) 170
     10.6 k-Break Rearrangement 171
     10.7 Fusions, Fissions, Translocations, and Reversals 172
     10.8 Rearrangements with Partially Ordered Chromosomes 174
11 Cycles of a Permutation 175
     11.1 A Model for Multichromosomal Circular Genomes 175
     11.2 A Generalization to Signed Genomes 178
12 Set Systems and the Syntenic Distance 181
     12.1 Introduction 181
     12.2 Structural Properties 182
     12.3 Lower Bounds 184
     12.4 Diameter 185
     12.5 Algorithmic Results 185
     12.6 Conjectures and Open Problems 189
13 Median and Halving Problems 193
     13.1 Breakpoint Median 194
     13.2 Reversal and DCJ Median 197
     13.3 Duplicated Genomes 199
     13.4 Other Variants, Generalizations, and Discussion 205
14 Rearrangement Phylogenies 207
     14.1 The Large Parsimony Problem 207
     14.2 The Large Parsimony Problem with Gene Orders 209
     14.3 Heuristics for the Breakpoint/Reversal Phylogeny Problem 211
     14.4 Variants 220
15 Software 223
     15.1 Pairwise Rearrangements 223
     15.2 Phylogeny Reconstruction and Medians 226
16 Open Problems 229
     16.1 Complexity Issues 229
     16.2 Diameter 231
     16.3 Tightness of Bounds 232
A Graph Theory 235
     A.1 Undirected Graphs 235
     A.2 Directed Graphs 240
B Complexity Theory 243
     B.1 The Class NP 243
     B.2 Some NP-Complete Problems 252
Glossary 257
Bibliography 263