From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms

Jessica Dai, Paula Gradu, Inioluwa Deborah Raji, Benjamin Recht

TL;DR

This paper introduces a reporting-based framework that uses sequential analysis of incident reports to identify subgroups disproportionately harmed by a system. It formalizes the problem as sequential hypothesis tests over a set of subgroups that compare reporting preponderances $\mu_G$ to base preponderances $\mu_G^0$, and it interprets the results in terms of relative risk and true incidence rate under a probabilistic reporting model. Two algorithmic instantiations, a sequential Z-test (in finite-sample and asymptotic versions) and a betting-style approach, provide anytime-valid guarantees with Bonferroni-type corrections and stopping-time bounds, and are demonstrated on real-world data such as VAERS myocarditis reports and HMDA mortgage denials. The framework supports post-deployment fairness auditing by using user-reported incidents to uncover disparities quickly while accounting for reporting heterogeneity. Overall, the work offers a practical, policy-aligned path for turning individual experiences into timely, group-level evidence of systemic harms.

Abstract

When an individual reports a negative interaction with some system, how can their personal experience be contextualized within broader patterns of system behavior? We study the reporting database problem, where individual reports of adverse events arrive sequentially, and are aggregated over time. In this work, our goal is to identify whether there are subgroups--defined by any combination of relevant features--that are disproportionately likely to experience harmful interactions with the system. We formalize this problem as a sequential hypothesis test, and identify conditions on reporting behavior that are sufficient for making inferences about disparities in true rates of harm across subgroups. We show that algorithms for sequential hypothesis tests can be applied to this problem with a standard multiple testing correction. We then demonstrate our method on real-world datasets, including mortgage decisions and vaccine side effects; on each, our method (re-)identifies subgroups known to experience disproportionate harm using only a fraction of the data that was initially used to discover them.
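
To make the idea of a sequential test with a multiple testing correction concrete, here is a minimal sketch of a generic betting-style (test martingale) procedure. It is not the paper's algorithm. It assumes the null hypothesis for each group has the form $\mu_G \leq \beta \mu_G^0$, that each incoming report falls in group $G$ with conditional probability at most $\beta \mu_G^0$ under that null, and that a fixed bet size is acceptable. The function name, parameter names, and the `bet_fraction` knob are all hypothetical.

```python
import math


def sequential_overrepresentation_test(reports, base_rates, beta=2.0,
                                       alpha=0.05, bet_fraction=0.5):
    """Illustrative fixed-bet sequential test (hypothetical helper).

    reports:      iterable of sets; each set holds the group labels that
                  one incoming report belongs to.
    base_rates:   dict mapping group label -> base preponderance mu_G^0.
    beta:         overrepresentation factor in the assumed null
                  mu_G <= beta * mu_G^0.
    alpha:        overall error level, split across groups (Bonferroni).
    bet_fraction: fraction in (0, 1) of the largest admissible bet.

    Returns a dict mapping group -> index (1-based) of the report at
    which that group's null hypothesis was first rejected.
    """
    if not 0.0 < bet_fraction < 1.0:
        raise ValueError("bet_fraction must lie strictly between 0 and 1")

    # Null threshold for each group's reporting preponderance.
    thresholds = {g: beta * mu0 for g, mu0 in base_rates.items()}
    for g, m in thresholds.items():
        if not 0.0 < m < 1.0:
            raise ValueError(f"beta * base rate must be in (0, 1) for {g!r}")

    # Bonferroni: test each of the k groups at level alpha / k, i.e.
    # reject once the wealth reaches k / alpha (Ville's inequality).
    log_reject_level = math.log(len(thresholds) / alpha)

    log_wealth = {g: 0.0 for g in thresholds}
    rejected_at = {}

    for t, groups_in_report in enumerate(reports, start=1):
        for g, m in thresholds.items():
            if g in rejected_at:
                continue
            x = 1.0 if g in groups_in_report else 0.0
            # Bet size lambda in (0, 1/m) keeps every wealth factor positive.
            lam = bet_fraction / m
            # Wealth update: W_t = W_{t-1} * (1 + lambda * (x_t - m)).
            log_wealth[g] += math.log1p(lam * (x - m))
            if log_wealth[g] >= log_reject_level:
                rejected_at[g] = t
    return rejected_at
```

For example, with hypothetical base rates `{"A": 0.1, "B": 0.4}` and ten reports that all come from group `A`, the call `sequential_overrepresentation_test([{"A"}] * 10, {"A": 0.1, "B": 0.4})` flags `A` at the fourth report and never flags `B`. A fixed bet keeps the sketch short; adaptive betting strategies generally stop sooner but need more machinery.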

Paper Structure

This paper contains 47 sections, 12 theorems, 36 equations, 4 figures, 2 tables.

Key Result

Proposition 3.1

Define the relative risk of group $G$ to be $\mathrm{RR}_G := \frac{\Pr[Y = 1 \mid G]}{\Pr[Y = 1]}$. Suppose that for some group $G$ we have $\rho_G \leq b \cdot \rho$. Suppose that we determine that $\mu_G \geq \beta{\mu_G^0}$ for some $\beta > 1$. Then, the true relative risk experienced by $G$ is

Figures (4)

  • Figure 1: Overview of reporting database framework.
  • Figure 2: General protocol for testing overrepresentation.
  • Figure 3: Number of reports ($t$) it takes for each algorithm to reject the null hypothesis for any group (i.e., the first identification of harm), over 100 random permutations of the COVID-19 vaccine report database. Tests are run with $\beta=2$. Each point on the plot reflects the number of trials (out of 100) in which a rejection has occurred by time $t$.
  • Figure 4: Impact of multiple hypothesis correction on stopping time across algorithms. As in Figure 3, each point on the plot reflects the number of trials (out of 100) in which a rejection has occurred by time $t$. In all plots, the lighter, dashed line reflects the stopping time of the invalid test that does not correct for multiple testing; the dark, solid line reflects the stopping time of the valid test that includes a Bonferroni correction.

Theorems & Definitions (25)

  • Definition 3.1: Report-to-incidence ratio
  • Proposition 3.1
  • proof
  • Definition 3.2: Reporting rates
  • Proposition 3.2
  • proof: Proof of Proposition 3.2
  • Theorem 4.1: Validity
  • Theorem 4.2: Power
  • Theorem 4.3: Validity
  • Theorem 4.4: Power
  • ...and 15 more