Auditing the Fairness of the US COVID-19 Forecast Hub's Case Prediction Models

Saad Mohammad Abrar; Naman Awasthi; Daniel Smolyak; Vanessa Frias-Martinez

TL;DR

This paper evaluates the fairness of county-level forecasts from the US COVID-19 Forecast Hub across race/ethnicity and urbanization level, identifying substantial disparities in predictive error for minority groups and rural counties. It adopts a regression-based framework that measures forecast error with the pinball loss ($PBL$) and fits a Gaussian GLM with a log link, using 1% trimming of extreme values and GVIF-guided variable selection. It then analyzes interactions with Lookahead, Phase, Model Type, and Mobility to show that fairness is context-dependent: Hispanic counties incur higher errors than the White baseline, Asian counties exhibit lower errors, and urbanization-related disparities persist in less urban areas. Models that use mobility data generally show smaller disparities, and deep-learning and ensemble models show more balanced fairness. The authors also provide an interactive dashboard and fairness nutritional cards to aid decision-makers, and they advocate reporting fairness metrics alongside accuracy.
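To make the error measure concrete, the following is a minimal sketch of the standard pinball (quantile) loss for a single observation. The function names, the example quantile levels, and the choice to average over quantile levels are assumptions for illustration; this summary does not state how the authors aggregate $PBL$ across quantiles, counties, or weeks.

```python
from statistics import mean


def pinball_loss(y_true, q_pred, tau):
    """Pinball (quantile) loss for one observation and one predicted quantile."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    diff = y_true - q_pred
    # Under-prediction (diff >= 0) is weighted by tau, over-prediction by 1 - tau.
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball_loss(y_true, quantile_forecast):
    """Average the loss over a {tau: predicted quantile} mapping.

    Averaging over quantile levels is an assumed aggregation, not
    necessarily the one used in the paper.
    """
    return mean(pinball_loss(y_true, q, tau) for tau, q in quantile_forecast.items())


# Hypothetical forecast for one county-week: three quantiles of weekly cases.
forecast = {0.1: 80.0, 0.5: 100.0, 0.9: 130.0}
print(mean_pinball_loss(120.0, forecast))
```

A lower value means the predicted quantiles sit closer to the observed count. The loss is asymmetric by design, so a high quantile is penalized lightly for overshooting and heavily for undershooting.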

Abstract

The US COVID-19 Forecast Hub, a repository of COVID-19 forecasts from over 50 independent research groups, is used by the Centers for Disease Control and Prevention (CDC) for their official COVID-19 communications. As such, the Forecast Hub is a critical centralized resource to promote transparent decision making. While the Forecast Hub has provided valuable predictions focused on accuracy, there is an opportunity to evaluate model performance across social determinants such as race and urbanization level that have been known to play a role in the COVID-19 pandemic. In this paper, we carry out a comprehensive fairness analysis of the Forecast Hub model predictions and we show statistically significant diverse predictive performance across social determinants, with minority racial and ethnic groups as well as less urbanized areas often associated with higher prediction errors. We hope this work will encourage COVID-19 modelers and the CDC to report fairness metrics together with accuracy, and to reflect on the potential harms of the models on specific social groups and contexts.

Paper Structure (1 section, 4 equations, 3 figures, 12 tables)

S1 Appendix

Figures (3)

Figure 1: Distribution of the sensitive attributes (Race/Ethnicity and Urbanization level) across the 3,067 US counties considered for this study.
Figure 3: Forecast Hub Fairness Dashboard showing the Average Error Ratio (AER) distribution across different COVID-19 prediction models, organized by model type. Within each model type, teams are sorted in ascending order based on their median AER values. Since the user has selected "Only Race" as the variable of interest (see bottom left box) and "Hispanic" as the protected variable (see bottom center box), the AER values compare prediction errors between Hispanic and White counties, where values above 1.0 indicate higher prediction errors for Hispanic counties. Box plots show the distribution of AER values across all predictions, with the center line representing the median, boxes showing the interquartile range, and whiskers extending to the minimum and maximum values.
Figure 4: Model fairness card displaying key performance metrics including model information, prediction error differences between protected and unprotected groups, AER values, and coverage statistics.
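
The Average Error Ratio (AER) in the dashboard caption compares prediction errors between a protected group and a baseline group, with values above 1.0 indicating higher errors for the protected group. The sketch below shows one plausible way to compute such a ratio. The exact AER definition is not given in this summary, so the ratio-of-means form, the function name, and the sample numbers are all assumptions.

```python
from statistics import mean


def average_error_ratio(errors_protected, errors_baseline):
    """Ratio of mean error in the protected group to mean error in the baseline group.

    Assumed form of the AER; values above 1.0 mean the protected
    group's forecasts are, on average, less accurate.
    """
    baseline = mean(errors_baseline)
    if baseline == 0:
        raise ZeroDivisionError("baseline group has zero mean error")
    return mean(errors_protected) / baseline


# Hypothetical per-county errors for one model and one forecast week.
hispanic_errors = [6.0, 4.0]
white_errors = [2.0, 3.0]
print(average_error_ratio(hispanic_errors, white_errors))
```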
