Filling in Missing Data Using Intuitive Logic: Deep-Sea Coral Edition

By Kelly Donovan

Initial interest

I’ve always been interested in the environment, climate, weather, and the physical aspects of the Earth. I knew I wanted to pursue research that would let me dip my toes into an underrepresented topic, so when my research mentor offered me a path that involved analyzing deep-sea corals, I couldn’t refuse.

Deep-sea corals and missing data

Figure 1: Photograph of a coral reef ecosystem.

Corals are stationary and fragile animals living in the ocean that serve as habitat for about 25% of fish species. Unlike the colorful shallow-water corals many people picture, deep-sea corals can live hundreds or even thousands of meters beneath the ocean’s surface. One special characteristic of corals is their slow growth rate, often just a few millimeters per year… keep this in mind, because we’ll return to it later.

Unfortunately, deep-sea corals are underresearched compared to shallow-water corals. A lot of this comes down to accessibility, which makes sense! The deeper you go into the ocean, the more challenging (and more expensive, and no one likes that) research becomes. This lack of data becomes an issue when things like global warming, oil drilling, and bottom trawling start to harm deep-sea corals. How can we help conserve corals if, without consistent data, we don’t even understand them on a basic level?

My backfill technique: Intuitive thinking

I used the National Oceanic and Atmospheric Administration’s (NOAA) deep-sea coral and sponge dataset to analyze missing data on deep-sea corals, focusing on the Northern Gulf of Mexico region between 1960 and 2022. I dialed in on two species, Desmophyllum pertusum and Desmophyllum dianthus, as they are among the more common and better-researched deep-sea corals.

Figure 2: Example of a D. pertusum coral. Photograph courtesy of NOAA-Pelagic Research Services.

So now we have D. pertusum and D. dianthus corals, their locations, and the years they were observed and recorded. The missing-data problem in NOAA’s dataset is that corals were not consistently tracked and recorded at the same location over time. For example, if a D. pertusum coral was recorded in 2010, it would be very common for that same coral at that same location to have no record in the dataset for any other year before or after 2010. We know a one-year appearance like that isn’t biologically possible… as I mentioned earlier, corals are stationary and slow-growing animals. If a deep-sea coral was seen and recorded at the species level, it had almost certainly been there for several decades, if not centuries! A D. pertusum coral can’t just ‘magically’ appear in 2010 because a scientist recorded it. The goal of this research was therefore to backfill that coral using its species’ average size and growth rate, adding artificial values to show that the coral had been there for longer than the dataset originally recorded.

This backfill technique is surprisingly intuitive. If we take a coral species’ average size and divide it by its annual growth rate, we get an estimate of how many years that coral had been around before it was recorded. I built this estimate into my code and voilà, we now have less missing data in our dataset!

For example, we found that D. pertusum corals had an average size of 175.5 mm and an average growth rate of 12.82 mm/year. Dividing the two gives about 13.7 years, which we round down to the nearest whole year for a conservative estimate, so the backfill amount is 13 years. Therefore, if a D. pertusum coral was recorded in 2010 with no other recordings of that same coral in the dataset, the backfill code identifies the year of the original observation (2010) and the genus and species of the coral (D. pertusum). It then takes the species’ backfill amount, which the user inputs, and adds artificial observation values for each of that many years before the original observation (in our case, going back through 1997).
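To make the arithmetic concrete, here is a minimal Python sketch of the idea described above. It is not the project’s actual code: the function names, the dictionary layout of a record, and the `backfilled` flag are all hypothetical choices made for illustration. With the size and growth rate quoted above, the rounded-down result of 175.5 / 12.82 is 13.

```python
import math


def backfill_years(avg_size_mm: float, growth_mm_per_year: float) -> int:
    """Estimate years to backfill: average size / annual growth, rounded down."""
    if avg_size_mm <= 0 or growth_mm_per_year <= 0:
        raise ValueError("size and growth rate must be positive")
    return math.floor(avg_size_mm / growth_mm_per_year)


def backfill_record(record: dict, years_back: int) -> list:
    """Return artificial copies of one observation for the years before it.

    The record layout (species, year, backfilled) is a hypothetical stand-in
    for whatever columns the real dataset uses.
    """
    return [
        {**record, "year": year, "backfilled": True}
        for year in range(record["year"] - years_back, record["year"])
    ]


# Figures quoted in the post for D. pertusum.
n_years = backfill_years(175.5, 12.82)

observation = {"species": "Desmophyllum pertusum", "year": 2010, "backfilled": False}
filled = backfill_record(observation, n_years)

# Prints the backfill amount and the first and last artificial years.
print(n_years, filled[0]["year"], filled[-1]["year"])
```

Flagging the artificial rows (here with `backfilled`) keeps them easy to separate from the original observations later on.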

Result of the backfill

After applying our backfill approach to the original D. pertusum and D. dianthus observations, we found a 25.4% increase in individual coral counts. This increase was so exciting to see, as it showed just how much better our technique could represent deep-sea corals! But wait, there’s more. After the backfill, every year between 1960 and 2022 now shows Desmophyllum coral presence (Figure 3). This makes sense and is what we’d expect to see: corals have really been present this entire time, they just weren’t recorded every year. Corals don’t simply disappear and reappear every couple of years, as the pre-backfill row based on the original NOAA observations would suggest (Figure 3).
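One way to check year coverage like this is to list the years with no recorded presence before and after backfilling. The sketch below is only an illustration: the observation years, the three-year backfill amount, and the function names are toy values and hypothetical names, not figures or code from the NOAA analysis.

```python
def backfilled_years(observed_years, years_back):
    """Return the observed years plus the `years_back` years before each one."""
    present = set(observed_years)
    for year in observed_years:
        present.update(range(year - years_back, year))
    return present


def years_without_presence(present_years, start, end):
    """List the years in [start, end] that have no coral presence."""
    return [year for year in range(start, end + 1) if year not in present_years]


# Toy observation years and a toy backfill amount, for illustration only.
toy_years = [2003, 2010, 2012]

gaps_before = years_without_presence(set(toy_years), 2000, 2012)
gaps_after = years_without_presence(backfilled_years(toy_years, 3), 2000, 2012)

print(len(gaps_before), gaps_after)
```

In this toy case the number of empty years drops from ten to three. With a longer backfill amount and more observations, the remaining gaps can close entirely.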

Figure 3: Heat-tile map visualization of pre- and post-backfill technique on D. pertusum and D. dianthus

The final goal

My current goal is to determine average sizes and growth rates for all species within NOAA’s dataset for the Northern Gulf of Mexico ecoregion and to backfill values for all of them, so scientists can get to work on conservation tactics specific to deep-sea corals. The better the data we have, the better we can understand and protect corals throughout the world. Even though this project focuses mostly on statistics, data, and coding, my true motivation is to contribute work that, even in a small way, could lead to better deep-sea coral conservation.


Kelly Donovan is an undergraduate student studying Statistics and Applied Mathematics at Elon University under the supervision of Dr. Nicholas Bussberg. After graduating, she plans to pursue graduate school at Wake Forest University where she can dive deeper into environmental and natural hazard statistics.

You can follow Kelly Donovan on:

LinkedIn: Kelly Donovan's LinkedIn

Instagram: kellydonovan_03