
Computational Epidemiology: Data-Driven Modeling of COVID-19

Ellen Kuhl
Reviewed by Bill Satzer
Mathematical epidemiological modeling should have provided insight and guidance into the dynamics, prediction and control of the global pandemic we have been experiencing. Yet it did not. Despite success in predicting the spread of diseases like measles, mumps, and smallpox, the failure of modeling for the COVID-19 pandemic was clear, with predictions often wrong by orders of magnitude. What happened?
This book tries to explain what went wrong and to offer an approach that can address the shortcomings of previous models. It was written in the first half of 2021 and was motivated by a class that the author taught at Stanford. The general plan of the book is to introduce the reader to mathematical epidemiology and basic models of the spread of epidemics, and then to go on to define and develop new models with assumptions that better reflect the spread of COVID-19.
The author’s treatment has four parts. The first two parts introduce epidemiology and its mathematical treatment using systems of ordinary differential equations, and then computational epidemiology that addresses solution methods for these systems. The third and fourth parts describe the author’s own approach that proposes a network view of epidemiology and then a data-driven epidemiology that acknowledges the effect of randomness, noise and uncertainty in the analysis of disease data.
The discussion begins with an introduction to mathematical epidemiology. This includes a background in general epidemiology as well as descriptions of the classical mathematical models. These models have three basic assumptions: the population under consideration is isolated from the rest of the world, contact between individuals is homogeneous, and the model is fully deterministic. The population is typically divided into compartments. These include the susceptible (S), exposed (E), infectious (I), and recovered (R) subgroups.
Several variations of basic models are considered. One commonly used version is the SEIR model. It is expressed in terms of four ordinary differential equations: \( \dot{S} = -\beta S I \), \( \dot{E} = \beta S I - \alpha E \), \( \dot{I} = \alpha E - \gamma I \), \( \dot{R} = \gamma I \). The contact rate between susceptible people and those infected is \( \beta \), \( 1/\alpha \) is the latent period, and \( 1/\gamma \) is the infectious period. The reproduction number \( R_{0} = \beta / \gamma \) quantifies how many new infections a single infected individual creates in an otherwise completely susceptible population. The herd immunity threshold is then \( H=1-\frac{1}{R_{0}} \).
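To make the four equations concrete, here is a minimal sketch that integrates them with a forward Euler step and evaluates \( R_{0} \) and \( H \). It is not taken from the book: the function name `seir_euler`, the parameter values, the initial exposed fraction `e0`, and the step size are all illustrative assumptions, and the compartments are treated as population fractions that sum to one.

```python
def seir_euler(beta, alpha, gamma, e0=1e-4, days=200, dt=0.1):
    """Integrate the SEIR equations with forward Euler.

    S, E, I, R are population fractions; returns a list of (S, E, I, R) tuples.
    """
    s, e, i, r = 1.0 - e0, e0, 0.0, 0.0
    history = [(s, e, i, r)]
    for _ in range(int(round(days / dt))):
        ds = -beta * s * i               # new exposures leave S
        de = beta * s * i - alpha * e    # exposed become infectious at rate alpha
        di = alpha * e - gamma * i       # infectious recover at rate gamma
        dr = gamma * i
        s, e, i, r = s + dt * ds, e + dt * de, i + dt * di, r + dt * dr
        history.append((s, e, i, r))
    return history


# Illustrative values only (assumed, not fitted to any data):
# contact rate 0.5/day, latent period 3 days, infectious period 7 days.
beta, alpha, gamma = 0.5, 1.0 / 3.0, 1.0 / 7.0

r0 = beta / gamma        # basic reproduction number
herd = 1.0 - 1.0 / r0    # herd immunity threshold H = 1 - 1/R0

history = seir_euler(beta, alpha, gamma)
peak_infectious = max(state[2] for state in history)
print(f"R0 = {r0:.2f}, H = {herd:.2f}, peak I = {peak_infectious:.3f}")
```

Because the four right-hand sides sum to zero, the total \( S+E+I+R \) stays constant, which is a quick sanity check on any implementation.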
The basic SEIR model is rather simplistic. It treats a fixed population and does not account for births and deaths. It is also static in the sense that the parameters \( \alpha \), \( \beta \), and \( \gamma \) are not time-dependent, and it does not include any kind of spatial variation. The author also considers simpler models, and one slightly more complicated model (SEIIR) that incorporates the possibility of both symptomatic and asymptomatic infections.
The treatment of computational epidemiology largely focuses on linearization, discretization, and solution of the differential equations for the various models using both implicit and explicit time integration methods, and several examples are provided. Toward the end of this section the author introduces what she calls a “dynamic SEIR model” whose distinguishing feature is a time-dependent contact rate \( \beta(t) \) with the form of a hyperbolic tangent. This is intended, in part, to take into account the changes in social behavior and the effects of political action during a pandemic.
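A hyperbolic-tangent contact rate can be sketched as follows. The review does not give the exact formula, so the parametrization below is an assumption: a smooth transition from an initial rate `beta_0` to a reduced rate `beta_t`, centered at time `t_star` with width `T`. All four names and the numerical values are hypothetical.

```python
import math


def beta_tanh(t, beta_0, beta_t, t_star, T):
    """Smooth transition of the contact rate from beta_0 to beta_t.

    Assumed form: beta(t) = beta_0 - 0.5 * (1 + tanh((t - t_star) / T)) * (beta_0 - beta_t).
    """
    return beta_0 - 0.5 * (1.0 + math.tanh((t - t_star) / T)) * (beta_0 - beta_t)


# Illustrative values only: contact rate drops from 0.5 to 0.1 per day around day 40.
gamma = 1.0 / 7.0
for day in (0, 20, 40, 60, 80):
    b = beta_tanh(day, beta_0=0.5, beta_t=0.1, t_star=40.0, T=10.0)
    # Ratio beta(t)/gamma: a time-dependent analogue of R0 = beta/gamma.
    print(f"day {day:3d}: beta = {b:.3f}, beta/gamma = {b / gamma:.2f}")
```

With this form the contact rate sits halfway between the two levels at `t_star`, and `T` controls how abruptly behavior changes.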
The last two parts of the book offer approaches that can overcome some of the weaknesses of the basic models. The first one is a network approach designed to deal with combined temporal and spatial variations of infection via network diffusion on a weighted undirected graph or using a finite element method. The intention is to provide a tool to model outbreak dynamics that occur because of transmission between geographical regions.  Data-driven epidemiology is one of the newest approaches. It integrates earlier computational epidemiology with a probabilistic approach that uses data with Bayesian analysis and machine learning methods to infer viral reproduction dynamics and to incorporate the effect of asymptomatic transmission with correlation of case data and mobility statistics.
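As a rough illustration of network diffusion on a weighted undirected graph, the sketch below spreads an infected fraction between regions using the graph Laplacian \( L = D - A \) and the update \( \dot{x} = -\kappa L x \). This shows only the diffusion term under assumed conventions; the three-region graph, the weights, the coefficient `kappa`, and the function names are hypothetical, and a full model would couple this term to local outbreak dynamics in each node.

```python
def graph_laplacian(weights):
    """Return L = D - A for a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    return [
        [(sum(weights[i]) if i == j else 0.0) - weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def diffuse(infected, weights, kappa, dt, steps):
    """Explicit Euler steps of dx/dt = -kappa * L * x on the graph."""
    lap = graph_laplacian(weights)
    x = list(infected)
    n = len(x)
    for _ in range(steps):
        flux = [sum(lap[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] - dt * kappa * flux[i] for i in range(n)]
    return x


# Hypothetical three-region network: weights stand in for travel intensity.
weights = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
]
start = [1.0, 0.0, 0.0]   # infection initially confined to region 0
later = diffuse(start, weights, kappa=0.1, dt=0.1, steps=5000)
print([round(v, 4) for v in later])
```

Since the weight matrix is symmetric, each column of the Laplacian sums to zero, so diffusion alone redistributes infection between regions without creating or removing any.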
A lot of questions remain. It is apparent that, at least early in the pandemic, we did not understand either its biology (asymptomatic infection and the possibility of re-infection, for example) or its sociology (effects of public policy, availability of immunization, resistance to immunization and mask wearing, school closures, and travel restrictions, for example), and it may be difficult to unravel these effects to make better models. Many models do not attempt to model the number of hospitalizations, yet those contribute considerably to the burden on healthcare systems. Further, data-driven modeling depends essentially on reliable data, and that continues to be a serious challenge. The concept of herd immunity is widely discussed, but it is not at all clear how it might apply in our current complex environment.
This is a very ambitious book. The author packs a great deal of material into just over 300 pages. Exposition, data, and graphics compete for space. Some pages hold more than 25 small plots, often so small that individual details are very difficult to read. Apparently, the intention is to show just trends and patterns, but it makes for a very busy text. The index is not very complete, and internal references within and between chapters can be hard to track down.
Every chapter has a collection of very good problems. Very few of these are straightforward, and many would require considerable work. The many examples using real data make the book a valuable resource.  
Overall the book presents a number of important ideas and offers some significant new approaches for modeling real and complicated epidemics. It’s a great place to begin to understand where mathematical epidemiology is now and where it has to go. But it also shows signs of being assembled very quickly, and consequently poses some extra challenges to the reader.

Bill Satzer, now retired from 3M Company, spent most of his career as a mathematician working in industry on a variety of applications. He did his PhD work in dynamical systems and celestial mechanics.