Evaluating Fairness in Transaction Fraud Models: Fairness Metrics, Bias Audits, and Challenges
Parameswaran Kamalaruban, Yulu Pi, Stuart Burrell, Eleanor Drage, Piotr Skalski, Jason Wong, David Sutton
TL;DR
This work addresses fairness in transaction fraud detection by performing the first algorithmic bias audit in this domain, using public synthetic datasets. It presents a framework that categorizes and evaluates a wide range of group fairness metrics, including both threshold-dependent and threshold-independent measures, and examines the impact of normalization to account for severe class imbalance. Empirically, LightGBM models trained with standard empirical risk minimization (ERM) and with fairness through unawareness show that fraud protection-related metrics can be unbiased at a fixed false positive (FP) rate, while quality-of-service metrics exhibit bias once normalized, with notable bias in high-precision regimes. The study highlights socio-technical challenges and advocates cardholder-level fairness evaluation and nuanced metric choices that balance fraud protection against user experience, laying the groundwork for future domain-specific fairness methods in transaction fraud systems.
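The distinction between threshold-dependent and threshold-independent metrics at a fixed FP rate can be illustrated with a minimal sketch. This is not the authors' code. It fixes one global decision threshold so that the overall FP rate does not exceed a target, then reports per-group rates. It assumes that fraud protection is measured by the per-group true positive rate (the share of fraudulent transactions flagged) and that quality of service is measured by the per-group false positive rate (the share of legitimate transactions flagged). Per-group ROC AUC stands in for a threshold-independent measure. The function names, the `score > t` decision rule, and the toy data are all hypothetical.

```python
import math
from typing import Dict, Sequence

# Hypothetical helpers for illustration only; not the paper's implementation.

def threshold_at_fpr(scores: Sequence[float], labels: Sequence[int], target_fpr: float) -> float:
    """Return t such that flagging `score > t` flags at most target_fpr of legitimate (label 0) rows."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    legit = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not legit:
        raise ValueError("need at least one legitimate transaction")
    k = int(target_fpr * len(legit))  # maximum number of legitimate rows we may flag
    return legit[k]  # only scores strictly above the (k+1)-th highest legit score get flagged

def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Threshold-independent ROC AUC via pairwise comparisons (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return math.nan
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def group_metrics(scores: Sequence[float], labels: Sequence[int],
                  groups: Sequence[str], t: float) -> Dict[str, Dict[str, float]]:
    """Per-group TPR (fraud protection) and FPR (quality of service) at threshold t, plus AUC."""
    out: Dict[str, Dict[str, float]] = {}
    for g in sorted(set(groups)):
        s = [x for x, gg in zip(scores, groups) if gg == g]
        y = [x for x, gg in zip(labels, groups) if gg == g]
        flagged_fraud = [si > t for si, yi in zip(s, y) if yi == 1]
        flagged_legit = [si > t for si, yi in zip(s, y) if yi == 0]
        out[g] = {
            "tpr": sum(flagged_fraud) / len(flagged_fraud) if flagged_fraud else math.nan,
            "fpr": sum(flagged_legit) / len(flagged_legit) if flagged_legit else math.nan,
            "auc": auc(s, y),
        }
    return out

# Toy usage: two groups, one fraud case each (labels: 1 = fraud, 0 = legitimate).
scores = [0.9, 0.1, 0.2, 0.3, 0.8, 0.7, 0.1, 0.2, 0.4, 0.5]
labels = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
groups = ["A"] * 5 + ["B"] * 5
t = threshold_at_fpr(scores, labels, target_fpr=0.125)
metrics = group_metrics(scores, labels, groups, t)
# Both groups have AUC 1.0, yet at the shared threshold (0.5) group A's FPR is 0.25 and group B's is 0.0.
print(t, metrics)
```

In this toy case the threshold-independent view (AUC) shows no disparity, while the threshold-dependent view at a fixed overall FP rate does. This is one reason why the choice of metric family matters.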
Abstract
Ensuring fairness in transaction fraud detection models is vital because of the potential harms and legal implications of biased decision-making. Despite extensive research on algorithmic fairness, bias in fraud detection models remains notably understudied, mainly because of the field's unique challenges. These challenges include the need for fairness metrics that account for the imbalanced nature of fraud data, and the trade-off between fraud protection and service quality. To address this gap, we present a comprehensive fairness evaluation of transaction fraud models using public synthetic datasets, marking the first algorithmic bias audit in this domain. Our findings reveal three critical insights: (1) Certain fairness metrics expose significant bias only after normalization, highlighting the impact of class imbalance. (2) Significant bias appears in both service quality-related and fraud protection-related parity metrics. (3) The fairness-through-unawareness approach, which removes sensitive attributes such as gender, does not mitigate bias in these datasets, likely because correlated proxies remain. We also discuss socio-technical fairness challenges in transaction fraud models. These insights underscore the need for a nuanced approach to fairness in fraud detection, one that balances protection and service quality and moves beyond simple bias mitigation strategies. Future work must focus on refining fairness metrics and developing methods tailored to the unique complexities of the transaction fraud domain.
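The effect of normalization described in finding (1) can be seen with a small hypothetical sketch. When the underlying rate is tiny, as false positive and fraud rates are under severe class imbalance, the absolute gap between groups looks negligible. The same gap divided by the pooled rate, or expressed as a ratio, is large. Dividing by the pooled rate is an assumed normalization chosen for illustration; the authors' exact normalization may differ. The function name and the counts are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical illustration of raw vs normalized parity gaps; not the paper's definition.

def parity_gaps(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """counts maps group -> (events, population), e.g. (false positives, legitimate transactions)."""
    rates = {g: e / n for g, (e, n) in counts.items()}
    overall = sum(e for e, _ in counts.values()) / sum(n for _, n in counts.values())
    hi, lo = max(rates.values()), min(rates.values())
    gap = hi - lo
    return {
        "absolute_gap": gap,                                    # raw difference in rates
        "normalized_gap": gap / overall if overall > 0 else math.nan,  # gap relative to pooled rate (assumed)
        "ratio": hi / lo if lo > 0 else math.inf,               # max-to-min rate ratio
    }

# Group A: 20 false positives per 10,000 legitimate transactions; group B: 10 per 10,000.
gaps = parity_gaps({"A": (20, 10_000), "B": (10, 10_000)})
print(gaps)
```

Here the absolute gap is only about 0.001. Relative to the pooled rate of 0.0015, however, it is about 0.67, and group A is flagged twice as often as group B.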
