Survey data yields improved estimates of test-confirmed COVID-19 cases when rapid at-home tests were massively distributed in the United States

Publication information:

Mauricio Santillana, Ata A. Uslu, Tamanna Urmi, Alexi Quintana-Mathe, James N. Druckman, Katherine Ognyanova, Roy H. Perlis, Matthew A. Baum, and David Lazer. 2024. “Survey Data Yields Improved Estimates of Test-Confirmed COVID-19 Cases When Rapid At-Home Tests Were Massively Distributed in the United States”. JAMA Network Open

Abstract

Importance: Identifying and tracking new infections during an emerging pandemic is crucial for designing and deploying interventions that protect populations and mitigate the pandemic's effects, yet it remains a challenging task.

Objective: To characterize the ability of non-probability online surveys to longitudinally estimate the number of COVID-19 infections in the population, both in the presence and in the absence of institutionalized testing.

Design: Internet-based non-probability surveys were conducted through the survey vendor PureSpectrum approximately every 6 weeks between April 2020 and January 2023. The surveys collected information on COVID-19 infections, with representative state-level quotas applied to balance age, gender, race and ethnicity, and geographic distribution. Data from these surveys were compared with institutional case counts collected by Johns Hopkins University and with SARS-CoV-2 wastewater surveillance data from Biobot Analytics.
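The abstract reports "survey-weighted" estimates but does not describe how the weights were computed beyond the state-level quotas. As a hedged illustration only, the sketch that follows shows raking (iterative proportional fitting), one common way to align a non-probability sample with population margins such as gender and age. The function name `rake`, its parameters, and the toy margins are hypothetical and are not the authors' procedure.

```python
from collections import defaultdict


def rake(respondents, targets, n_iter=200, tol=1e-9):
    """Hypothetical raking (iterative proportional fitting) sketch.

    respondents: list of dicts, e.g. {"gender": "woman", "age": "18-44"}
    targets: {variable: {category: population share}}; every category
        that appears among respondents must have a target share.
    Returns one weight per respondent, normalized to a mean of 1.
    """
    weights = [1.0] * len(respondents)
    for _ in range(n_iter):
        max_change = 0.0
        for var, shares in targets.items():
            # Current weighted total of each category of this variable.
            totals = defaultdict(float)
            for w, r in zip(weights, respondents):
                totals[r[var]] += w
            grand = sum(totals.values())
            # Rescale so this variable's weighted margin matches its target.
            for i, r in enumerate(respondents):
                factor = shares[r[var]] * grand / totals[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:  # all margins already matched their targets
            break
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


if __name__ == "__main__":
    # Toy sample that over-represents young women (hypothetical data).
    sample = (
        [{"gender": "woman", "age": "18-44"}] * 3
        + [{"gender": "woman", "age": "45+"},
           {"gender": "man", "age": "18-44"},
           {"gender": "man", "age": "45+"}]
    )
    margins = {
        "gender": {"woman": 0.5, "man": 0.5},
        "age": {"18-44": 0.5, "45+": 0.5},
    }
    for person, w in zip(sample, rake(sample, margins)):
        print(person, round(w, 3))
```

Raking only matches one-way margins; it does not guarantee that joint distributions (for example, age within each state) are correct, which is one reason quota designs and weighting are often combined.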

Setting: Population-based online non-probability survey conducted for the Covid States Project, a multi-university consortium.

Participants: Residents aged 18 years or older across the 50 US states and the District of Columbia.

Main Outcomes and Measures: The main outcomes were (a) survey-weighted estimates of new monthly confirmed COVID-19 cases in the US from January 2020 to January 2023 and (b) estimates of uncounted test-confirmed cases from February 1, 2022, to January 1, 2023. Both were compared with institutionally reported COVID-19 infections and with wastewater viral concentrations.
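Outcome (a) scales a weighted proportion of respondents up to the population, and outcome (b) compares that estimate with official counts. The Python sketch that follows is a minimal reading of those two steps under stated assumptions: each respondent reports the month of a test-confirmed infection (or none), each has a precomputed survey weight, and uncounted cases are the summed monthly differences between survey-based and reported counts. The function names and inputs are hypothetical; the abstract does not give the exact estimator.

```python
def weighted_share(weights, flags):
    """Weighted proportion of respondents whose flag is True."""
    return sum(w for w, f in zip(weights, flags) if f) / sum(weights)


def estimate_monthly_cases(weights, infection_months, month, population):
    """Hypothetical estimator for outcome (a).

    infection_months: the month ("YYYY-MM") in which each respondent
        reports a test-confirmed infection, or None if no infection.
    Scales the weighted share reporting an infection in `month`
    to the size of the target population.
    """
    flags = [m == month for m in infection_months]
    return weighted_share(weights, flags) * population


def uncounted_cases(survey_by_month, reported_by_month, months):
    """Hypothetical estimator for outcome (b): survey-based estimates
    minus officially reported cases, summed over the given months.
    Months where reports exceed the survey estimate reduce the total."""
    return sum(survey_by_month[m] - reported_by_month[m] for m in months)


if __name__ == "__main__":
    # Toy inputs (hypothetical): three weighted respondents, 1,000 adults.
    est = estimate_monthly_cases(
        [1.0, 1.0, 2.0], ["2022-01", None, "2022-01"], "2022-01", 1000
    )
    print(est)  # prints 750.0
```

Keeping the difference signed (rather than clipping negative months to zero) is a design choice of this sketch, not something the abstract specifies.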

Results: The survey spanned 17 waves deployed from June 2020 to January 2023, with a total of 408,515 responses from 306,799 respondents (mean age 42.8 years, SD = 13); 202,416 (66%) identified as women and 104,383 (34%) as men. A total of 16,715 (5.4%) identified as Asian, 33,234 (10.8%) as Black, 24,938 (8.1%) as Hispanic, 219,448 (71.5%) as White, and 12,464 (4.1%) as another race. Overall, 64,946 respondents self-reported a test-confirmed COVID-19 infection (15.9% of the 408,515 responses). From April 2020 to January 2022, before the government-led mass distribution of at-home rapid tests, national survey-weighted estimates of test-confirmed COVID-19 infections were strongly correlated with institutionally reported COVID-19 infections (Pearson r = 0.96, p = 1.8e-12; mean 50-state correlation r = 0.88, SD = 0.073). After January 2022, the correlation weakened and was no longer statistically significant (r = 0.55, p = 0.08; mean 50-state correlation r = 0.48, SD = 0.227). In contrast, survey COVID-19 estimates correlated highly with SARS-CoV-2 viral concentrations in wastewater both before (r = 0.92, p = 2.2e-09) and after (r = 0.89, p = 2.3e-04) January 2022. Institutionally reported COVID-19 cases correlated with wastewater viral concentrations before January 2022 (r = 0.79, p = 1.10e-05) but only weakly and not significantly after (r = 0.31, p = 0.35), suggesting that both the survey and wastewater estimates may have better captured test-confirmed COVID-19 infections after January 2022. Consistent correlation patterns were observed at the state level. Based on national-level survey estimates, approximately 54 million COVID-19 cases were unaccounted for in official records between January 2022 and January 2023.

medRxiv preprint doi: https://doi.org/10.1101/2024.05.21.24307697; this version posted May 22, 2024. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

Conclusions and Relevance: Non-probability survey data can be used to estimate the temporal evolution of test-confirmed infections during an emerging disease outbreak. Self-reporting tools may enable government and healthcare officials to use accessible and affordable at-home testing for efficient infection monitoring in the future.