Behind the Paper

How much do COVID-19 case counts underestimate the size of the pandemic?

By our estimates, total infections in the U.S. early in the pandemic were 9 times higher than case counts suggest.

Published in Microbiology

Sep 09, 2020

Jade Benjamin-Chung and Sean L. Wu


Early in the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, scientists and the general public alike watched with growing alarm as the number of people who tested positive climbed. Charts of daily positive test counts made it clear that we were likely experiencing the spread of a once-in-a-generation pandemic and that the public health implications would be drastic. However, as epidemiologists, we were far more concerned about what those charts did not show us.

Very few SARS-CoV-2 tests were available in the U.S. at the time, and the U.S. Centers for Disease Control and Prevention recommended that doctors primarily test people with moderate-to-severe coronavirus disease 2019 (COVID-19) symptoms. There was growing evidence that individuals could be infected and potentially spread the virus without showing symptoms. For these reasons, we suspected that the number of positive tests vastly underestimated the size of the pandemic — but by how much?

At the time, our group was working together on epidemiologic research focused on malaria eradication. Although the two diseases are clearly very different, the epidemiology of malaria in eradication settings shares some important features with that of SARS-CoV-2 and other novel pathogens. For both, asymptomatic transmission is possible, and diagnostic tests miss many cases. We wanted to answer the question: To what extent do SARS-CoV-2 testing practices (e.g., testing primarily symptomatic people) and test accuracy influence whether an infected person gets tested and tests positive? As epidemiologists, we are trained to identify and minimize bias, which is the gap between the numbers we measure and the truth. If we corrected for this bias in the entire population, we could estimate the total number of infections and better understand the magnitude of the pandemic. To help us answer this question, we quickly assembled a team of students and scientists with whom we were already working on studies of malaria, influenza, and enteric pathogens.

First, we examined the daily testing rates in different states and found substantial regional variation. Overall, by mid-April 2020, testing rates were highest in the Northeast (e.g., 31 tests per 1,000 people in Rhode Island) and lower in the Midwest and South (e.g., 6 per 1,000 in Kansas). Because testing rates varied so much from state to state, the amount of bias in case counts was likely to vary by state as well.

To see an interactive version of this plot, please visit https://covid19epi.github.io/stats/home/

We reviewed the available evidence about testing probabilities among people with and without COVID-19 symptoms and the accuracy of the tests used (how likely were false positives and false negatives?). Using this evidence and a tool called probabilistic bias analysis (Lash et al., 2011), we created a probabilistic model that corrected the number of confirmed COVID-19 cases for bias. We ran our model using daily data on the number of tests and confirmed cases in each U.S. state, compiled by the COVID Tracking Project (https://covidtracking.com/). The model generated an estimate of the total number of SARS-CoV-2 infections we would expect if everyone had been tested with a perfectly accurate test.
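To give a feel for how a probabilistic bias analysis works in general, here is a minimal Python sketch. It is not our published model: it swaps in a much simpler structure, and every prior range, the function name `corrected_infections`, and the example state counts are hypothetical values chosen only for illustration. The idea it shows is the general one: draw the unknown bias parameters (test sensitivity, test specificity, and how common infection is among untested people relative to tested people) from prior distributions, correct the observed counts for each draw, and summarize the resulting distribution of corrected totals.

```python
import random
import statistics


def corrected_infections(population, tested, positives, n_draws=10_000, seed=0):
    """Monte Carlo sketch of a probabilistic bias analysis (illustrative only)."""
    rng = random.Random(seed)
    observed_positivity = positives / tested
    draws = []
    for _ in range(n_draws):
        # Hypothetical priors on the bias parameters (NOT values from the paper).
        sensitivity = rng.uniform(0.65, 1.00)
        specificity = rng.uniform(0.998, 1.00)
        # Assumed prevalence among untested people, relative to tested people.
        relative_prevalence_untested = rng.uniform(0.0, 0.2)

        # Standard correction of observed positivity for imperfect test
        # accuracy (often called the Rogan-Gladen estimator), clipped to [0, 1].
        prev_tested = (observed_positivity + specificity - 1) / (
            sensitivity + specificity - 1
        )
        prev_tested = min(max(prev_tested, 0.0), 1.0)

        # Extrapolate to people who were never tested.
        prev_untested = relative_prevalence_untested * prev_tested
        total = tested * prev_tested + (population - tested) * prev_untested
        draws.append(total)

    draws.sort()
    return {
        "median": statistics.median(draws),
        "lower": draws[int(0.025 * n_draws)],  # approximate 2.5th percentile
        "upper": draws[int(0.975 * n_draws)],  # approximate 97.5th percentile
    }


# Hypothetical state: 1,000,000 residents, 10,000 tested, 2,000 positive.
result = corrected_infections(population=1_000_000, tested=10_000, positives=2_000)
print(result)
```

The output is a median and an interval rather than a single number, which reflects the uncertainty in the assumed priors. In a real analysis the priors would come from the published evidence on testing practices and test accuracy rather than from the placeholder ranges used here.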

By mid-April 2020, there were 721,245 confirmed COVID-19 cases in the U.S., but our model estimated 6,454,951 total infections, about 9 times the confirmed count. Our results suggested that at that time, 1.9% of the U.S. population had been infected, whereas confirmed case totals would have implied that only 0.2% were infected. The discrepancy was largest in Puerto Rico, where case counts underestimated total infections by a factor of 33. Our analysis suggested that incomplete testing was responsible for the majority of this gap, while the remainder was due to imperfect test accuracy.

To see an interactive version of this plot, please visit https://covid19epi.github.io/stats/home/
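The 9-fold figure follows directly from the two national totals reported above. As a quick check, the short Python snippet below recomputes the ratio and the implied share of infections that confirmed case counts missed; the variable names are ours and purely illustrative.

```python
# National totals for mid-April 2020, as reported in this post.
confirmed_cases = 721_245
estimated_infections = 6_454_951

# How many estimated infections there were per confirmed case.
underestimation_ratio = estimated_infections / confirmed_cases

# Share of estimated infections that never appeared in confirmed case counts.
undetected_share = 1 - confirmed_cases / estimated_infections

print(f"Infections per confirmed case: {underestimation_ratio:.2f}")
print(f"Share of infections not captured by case counts: {undetected_share:.1%}")
```

The ratio comes out just under 9, which means that roughly 8 of every 9 estimated infections were not reflected in confirmed case counts at the time.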

In many ways, these results aren’t surprising. But for the public, the difference between hearing there are 100 new cases a day vs. 900 new cases a day may be enough to motivate better social distancing practices. For policymakers, our findings underscore the urgent, months-long calls by scientists and physicians for more complete and equitable SARS-CoV-2 testing, including testing individuals without symptoms.

For more details, please see our paper. For interactive graphics, please visit https://covid19epi.github.io/stats/home/

Reference

Lash, T. L., Fox, M. P. & Fink, A. K. Applying Quantitative Bias Analysis to Epidemiologic Data. (Springer Science & Business Media, 2011).


Nature Communications
