Computationally Assisted Quality Control for Public Health Data Streams

Ananya Joshi; Kathryn Mazaitis; Roni Rosenfeld; Bryan Wilder

TL;DR

This paper addresses irregularities in real-time public health data streams by introducing FlaSH, a scalable, model-based outlier detection framework designed to rank data points for expert review. FlaSH combines regime-aware data processing, simple predictive modeling, and a binomial-based discrepancy test to produce $p$-value–driven outlier scores that are pooled across regions. In expert evaluations and a real deployment in Delphi, FlaSH matched or outperformed deep learning baselines on traditional metrics and consistently highlighted irregularities that humans would otherwise miss, validating its practical utility. The approach offers a deployable, interpretable, and efficient solution for computationally assisted quality control in public health, with broad implications for timely, data-driven decision-making.

Abstract

Irregularities in public health data streams (such as COVID-19 cases) hamper data-driven decision-making for public health stakeholders. A real-time, computer-generated list of the most important outlying data points from thousands of daily-updated public health data streams could help an expert reviewer identify these irregularities. However, existing outlier detection frameworks perform poorly on this task because they account for neither the data volume nor the statistical properties of public health streams. Accordingly, we developed FlaSH (Flagging Streams in public Health), a practical outlier detection framework for public health data users that uses simple, scalable models to capture these statistical properties explicitly. In an experiment where human experts evaluate FlaSH and existing methods (including deep learning approaches), FlaSH scales to the data volume of this task, matches or exceeds these other methods in mean accuracy, and identifies the outlier points that users empirically rate as more helpful. Based on these results, FlaSH has been deployed on data streams used by public health stakeholders.

Paper Structure (26 sections, 1 equation, 3 figures, 1 table)

Motivation and Introduction
Practical Irregularity Detection Goals
Out of Range Values and Global Outliers.
Day of Week Outliers.
Trendline Outliers.
FlaSH Outlier Detection Method
S1: Process Data.
S2: Obtain Predicted Values.
S3: Compare Predicted and Observed Values.
Process Data
Identifying Changepoints in Nonstationary Streams
Identifying Outliers Within Regimes
Obtain Predicted Values
Compare Predicted and Observed Values
FlaSH Output
...and 11 more sections

Figures (3)

Figure 1: Temporal irregularities in actual case counts (the large spikes in March and July 2022, when cases were trending down) produced similar spikes in the predicted counts (highlighted in red) that were then sent to the US Centers for Disease Control and Prevention.
Figure 2: In the FlaSH outlier detection method, data stream inputs are processed through FlaSH to generate informational outlier scores. FlaSH itself has three steps. The raw data (gray) is processed [S1] (purple), and model $m$ is used to predict future values [S2] (blue). Then, the historical performance of model $m$ is captured with the test statistic distribution (gold), and this distribution is used to compare predicted and actual values [S3].
Figure 3: Example of a Survey Task. Respondents click on the time series plot to mark points as unevaluated, uninteresting, or warranting investigation. They also rank the points that warrant investigation, and these rankings appear on the plot in yellow. Respondents could zoom, pan, and view a 7-day average for each graph.
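
The three steps named in the Figure 2 caption (process, predict, compare) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the same-weekday average used as model $m$, the hypothetical `changepoint` argument, and the empirical tail test standing in for the paper's binomial-based test are all assumptions made for illustration.

```python
from statistics import mean


def process(stream, changepoint=0):
    """S1 (assumed): keep only the data after a known changepoint index."""
    return stream[changepoint:]


def predict_next(history, season=7, weeks=4):
    """S2 (assumed model m): mean of the same weekday over recent weeks."""
    same_day = history[-season::-season][:weeks]
    return mean(same_day)


def empirical_p_value(history, observed, season=7, weeks=4):
    """S3 (assumed test): share of past absolute errors >= today's error.

    The past errors play the role of the test statistic distribution
    that records the historical performance of the model.
    """
    min_len = season * weeks
    past_errors = [
        abs(history[t] - predict_next(history[:t], season, weeks))
        for t in range(min_len, len(history))
    ]
    today_error = abs(observed - predict_next(history, season, weeks))
    exceed = sum(e >= today_error for e in past_errors)
    # Add-one smoothing keeps the score strictly above zero.
    return (exceed + 1) / (len(past_errors) + 1)


def rank_points(streams, season=7, weeks=4):
    """Pool scores across regions; the smallest p-value is reviewed first."""
    scored = [
        (empirical_p_value(process(hist), obs, season, weeks), region)
        for region, (hist, obs) in streams.items()
    ]
    return sorted(scored)


# Hypothetical data: six weeks of a repeating weekly pattern per region.
weekly = [10, 12, 11, 13, 12, 5, 4] * 6
streams = {"A": (weekly, 10), "B": (weekly, 100)}
ranking = rank_points(streams)
```

In this toy input, region B's new value is far from what its weekly pattern predicts, so it sorts ahead of region A. Putting every stream's score on the same scale is what allows one ranked list to be built across regions.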
