Stream-Based Monitoring of Algorithmic Fairness
Jan Baumeister, Bernd Finkbeiner, Frederik Scheerer, Julian Siber, Tobias Wagenpfeil
TL;DR
The paper addresses the challenge of verifying algorithmic fairness in deployed decision and prediction systems, where static verification is often infeasible because of system complexity and data volume. It proposes stream-based runtime monitoring with RTLola to formalize and check temporal fairness properties: the monitor extracts independent trials from a single execution and estimates conditional probabilities with time-aware monitors. To stabilize estimates early in the stream, when few observations are available, the authors use maximum a posteriori (MAP) estimation with priors. They demonstrate the approach on synthetic benchmarks and on real-world COMPAS data, showing that RTLola-based monitoring can detect unfairness early and scale to high-throughput settings. The work offers a practical framework for real-time fairness auditing that complements, and in some cases surpasses, traditional database-based monitoring approaches in speed and expressiveness.
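The combination of per-group conditional probability estimates and MAP priors can be illustrated with a small Python sketch. This is not the authors' RTLola specification. It assumes that each incoming event is one independent trial of the form (group, decision), that the property checked is demographic parity (all groups' estimated positive-decision rates lie within a tolerance `epsilon`), and that each rate has a Beta(alpha, beta) prior with alpha, beta > 1. The class names `MapEstimator` and `ParityMonitor`, the Beta(2, 2) default, and `epsilon = 0.1` are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MapEstimator:
    """Running MAP estimate of P(positive decision | group).

    Assumes a Beta(alpha, beta) prior with alpha, beta > 1, so the posterior
    mode (k + alpha - 1) / (n + alpha + beta - 2) is defined even for n = 0.
    """
    alpha: float = 2.0
    beta: float = 2.0
    trials: int = 0      # decisions observed for this group so far
    positives: int = 0   # positive decisions among them

    def __post_init__(self) -> None:
        if self.alpha <= 1 or self.beta <= 1:
            raise ValueError("alpha and beta must both exceed 1")

    def update(self, positive: bool) -> None:
        # Each event counts as one independent Bernoulli trial (an assumption).
        self.trials += 1
        self.positives += int(positive)

    def estimate(self) -> float:
        # Mode of the Beta posterior; the prior pulls early estimates toward
        # (alpha - 1) / (alpha + beta - 2) instead of jumping to 0 or 1.
        return (self.positives + self.alpha - 1) / (
            self.trials + self.alpha + self.beta - 2
        )


@dataclass
class ParityMonitor:
    """Hypothetical demographic-parity check: max_g P(pos|g) - min_g P(pos|g) <= epsilon."""
    epsilon: float = 0.1
    groups: dict = field(default_factory=dict)

    def step(self, group: str, positive: bool) -> bool:
        """Consume one (group, decision) event and return the current verdict."""
        if group not in self.groups:
            self.groups[group] = MapEstimator()
        self.groups[group].update(positive)
        if len(self.groups) < 2:
            return True  # nothing to compare yet
        rates = [est.estimate() for est in self.groups.values()]
        return max(rates) - min(rates) <= self.epsilon


if __name__ == "__main__":
    monitor = ParityMonitor(epsilon=0.1)
    events = [("A", True), ("B", False), ("A", True), ("B", False)]
    print([monitor.step(g, p) for g, p in events])  # [True, False, False, False]
```

After the second event, a plain relative-frequency estimate would compare rates of 1 and 0 for groups A and B. The MAP estimates with a Beta(2, 2) prior are 2/3 and 1/3, so early verdicts react less sharply to a handful of observations. A larger prior would damp early estimates further at the cost of slower detection.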
Abstract
Automatic decision and prediction systems are increasingly deployed in applications where they significantly impact the livelihood of people, such as predicting the creditworthiness of loan applicants or the recidivism risk of defendants. These applications have given rise to a new class of algorithmic-fairness specifications that require the systems to decide and predict without bias against social groups. Verifying these specifications statically is often out of reach for realistic systems, since such systems may, for example, employ complex learning components and reason over a large input space. In this paper, we therefore propose stream-based monitoring as a solution for verifying the algorithmic fairness of decision and prediction systems at runtime. Concretely, we present a principled way to formalize algorithmic fairness over temporal data streams in the specification language RTLola and demonstrate the efficacy of this approach on a number of benchmarks. Besides synthetic scenarios that particularly highlight its efficiency on streams with increasing amounts of data, we notably evaluate the monitor on real-world data from the recidivism prediction tool COMPAS.
