From Individual Experience to Collective Evidence: A Reporting-Based Framework for Identifying Systemic Harms
Jessica Dai, Paula Gradu, Inioluwa Deborah Raji, Benjamin Recht
TL;DR
This paper introduces a reporting-based framework that uses sequential analysis of incident reports to identify subgroups disproportionately harmed by a system. It formalizes the problem as a set of sequential hypothesis tests, one per subgroup, each comparing the subgroup's reporting preponderance $\mu_G$ to its base preponderance $\mu_G^0$, and interprets the results in terms of relative risk and true incidence rate under a probabilistic reporting model. Two algorithmic instantiations, a sequential Z-test (in finite-sample and asymptotic versions) and a betting-style approach, provide anytime-valid guarantees with Bonferroni-type corrections and stopping-time bounds. Both are demonstrated on real-world data such as VAERS myocarditis reports and HMDA mortgage denials. The framework supports post-deployment fairness auditing by using user-reported incidents to uncover disparities quickly while accounting for heterogeneity in reporting behavior. Overall, the work outlines a practical, policy-aligned path for turning individual experiences into timely, group-level insights about systemic harms.
Abstract
When an individual reports a negative interaction with some system, how can their personal experience be contextualized within broader patterns of system behavior? We study the reporting database problem, where individual reports of adverse events arrive sequentially and are aggregated over time. In this work, our goal is to identify whether there are subgroups, defined by any combination of relevant features, that are disproportionately likely to experience harmful interactions with the system. We formalize this problem as a sequential hypothesis test, and identify conditions on reporting behavior that are sufficient for making inferences about disparities in true rates of harm across subgroups. We show that algorithms for sequential hypothesis tests can be applied to this problem with a standard multiple testing correction. We then demonstrate our method on real-world datasets, including mortgage decisions and vaccine side effects; on each, our method (re-)identifies subgroups known to experience disproportionate harm using only a fraction of the data that was initially used to discover them.
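To make the idea of a sequential test with a multiple testing correction concrete, the following is a minimal sketch of a betting-style test, not the authors' implementation. It assumes a simplified null hypothesis for each subgroup $G$, namely $\mu_G \le \mu_G^0$, and a fixed bet size. The function name `sequential_betting_test`, the `bet_fraction` parameter, and the representation of each report as a set of subgroup names are all hypothetical choices made for illustration. Under the null, each subgroup's wealth process is a nonnegative supermartingale, so by Ville's inequality the probability that it ever reaches $K/\alpha$ is at most $\alpha/K$; a union bound over the $K$ subgroups gives the Bonferroni-type guarantee.

```python
import math


def sequential_betting_test(reports, base_rates, alpha=0.05, bet_fraction=0.5):
    """Illustrative betting-style sequential test (a sketch, not the paper's algorithm).

    reports:      iterable of sets; each set holds the subgroup names a report belongs to
                  (assumed representation).
    base_rates:   dict mapping subgroup name -> assumed base preponderance mu_G^0 in (0, 1).
    alpha:        overall error level, split across subgroups (Bonferroni).
    bet_fraction: hypothetical tuning knob in (0, 1); the bet is lambda = bet_fraction / mu_G^0,
                  which keeps every wealth multiplier strictly positive.

    Returns a dict mapping each flagged subgroup to the number of reports seen
    when its null (mu_G <= mu_G^0) was rejected.
    """
    if not 0.0 < bet_fraction < 1.0:
        raise ValueError("bet_fraction must lie strictly between 0 and 1")

    groups = list(base_rates)
    # Reject once wealth >= K / alpha, i.e. log-wealth >= log(K / alpha).
    log_threshold = math.log(len(groups) / alpha)
    bets = {g: bet_fraction / base_rates[g] for g in groups}
    log_wealth = {g: 0.0 for g in groups}
    rejected = {}

    for t, report_groups in enumerate(reports, start=1):
        for g in groups:
            if g in rejected:
                continue  # this subgroup has already been flagged
            x = 1.0 if g in report_groups else 0.0
            # Wealth update: W_t = W_{t-1} * (1 + lambda * (x - mu_G^0)).
            log_wealth[g] += math.log1p(bets[g] * (x - base_rates[g]))
            if log_wealth[g] >= log_threshold:
                rejected[g] = t
    return rejected


if __name__ == "__main__":
    # Toy stream: every report comes from subgroup "A", although "A" is
    # assumed to make up only half of the reference population.
    stream = [{"A"}] * 20
    print(sequential_betting_test(stream, {"A": 0.5, "B": 0.5}))
```

Because the rejection rule is checked after every report, the test can be stopped as soon as any subgroup is flagged without invalidating the error guarantee. A fixed bet is the simplest valid choice; adaptive bet sizes would typically stop sooner but are omitted here to keep the sketch short.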
