Measuring Fairness in Financial Transaction Machine Learning Models

Deniz Sezin Ayvaz; Lorenzo Belenguer; Hankun He; Deborah Dormah Kanubala; Mingxu Li; Soung Low; Carlos Mougan; Faithful Chiagoziem Onwuegbuche; Yulu Pi; Natalia Sikora; Dan Tran; Shresth Verma; Hanzhi Wang; Skyler Xie; Adeline Pelletier

TL;DR

This work tackles fairness in Mastercard-like financial transaction ML by formalizing intersectional, multi-label fairness with a weighted fairness tensor and evaluating proxy discrimination, baseline and advanced predictive models, and multiple mitigation strategies. It introduces a mathematical framework where a fairness tensor $G$ (and its weighted variant $G^W$) captures disparities across protected attributes and industry labels, enabling targeted bias intervention through in-processing (e.g., exponentiated gradient, adversarial debiasing) and post-processing methods. Empirical analyses on synthetic data reveal proxy discrimination despite unawareness, with XGBoost achieving stronger predictive performance but persistent fairness gaps across gender, age, and ethnicity subgroups; results also show trade-offs between fairness and accuracy and potential benefits from including sensitive attributes under careful governance. The study emphasizes procedural monitoring, stakeholder alignment, and future research in multi-dimensional fairness, highlighting both the feasibility and the ongoing challenges of deploying fair ML in financial contexts.
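As a rough illustration of the fairness tensor described above, the sketch below follows the indexing given in the Figure 4 caption, $G_{l,k_1,k_2} = g(X, \mathbf{Y}_{k_1}, A_l) - g(X, \mathbf{Y}_{k_2}, A_l)$. It rests on several assumptions that are not taken from the paper: $g$ is chosen to be the positive-prediction rate, `Y_pred` is a binary prediction matrix with one column per label, `A` holds one integer subgroup id per row, and the function names are hypothetical. The weighted variant $G^W$ is left out because its weighting scheme is not specified here.

```python
import numpy as np


def group_label_rates(Y_pred, A):
    """Positive-prediction rate per (subgroup, label).

    Y_pred: (n, K) array of 0/1 predictions, one column per label.
    A:      (n,) array of subgroup ids for one protected attribute.
    Returns the sorted subgroup ids and an (L, K) array of rates in [0, 1].
    """
    groups = np.unique(A)
    rates = np.stack([Y_pred[A == a].mean(axis=0) for a in groups])
    return groups, rates


def fairness_tensor(Y_pred, A):
    """Build G[l, k1, k2] = g(Y_k1, A_l) - g(Y_k2, A_l).

    Here g is the positive-prediction rate (an assumed choice of metric).
    """
    groups, rates = group_label_rates(Y_pred, A)
    # Broadcasting forms every pairwise difference along the label axis.
    G = rates[:, :, None] - rates[:, None, :]
    return groups, G


# Toy data: 4 rows, 2 labels, 2 subgroups (illustrative values only).
Y_pred = np.array([[1, 0], [1, 1], [0, 0], [1, 0]])
A = np.array([0, 0, 1, 1])
groups, G = fairness_tensor(Y_pred, A)
print(G.shape)  # one (K, K) slice per subgroup
```

By construction each slice `G[l]` is antisymmetric with a zero diagonal, so only the entries above the diagonal carry independent information. Any other metric with values in $[0, 1]$ could be swapped in for the rate computation.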

Abstract

Mastercard, a global leader in financial services, develops and deploys machine learning models aimed at optimizing card usage and preventing attrition through advanced predictive models. These models use aggregated and anonymized card usage patterns, including cross-border transactions and industry-specific spending, to tailor bank offerings and maximize revenue opportunities. Mastercard has established an AI Governance program, based on its Data and Tech Responsibility Principles, to evaluate any built and bought AI for efficacy, fairness, and transparency. As part of this effort, Mastercard has sought expertise from the Turing Institute through a Data Study Group to better assess fairness in more complex AI/ML models. The Data Study Group challenge lies in defining, measuring, and mitigating fairness in these predictions, which can be complex due to the various interpretations of fairness, gaps in the research literature, and ML-operations challenges.

Paper Structure (61 sections, 21 equations, 27 figures, 8 tables)

This paper contains 61 sections, 21 equations, 27 figures, 8 tables.

Executive Summary
Challenge and Objectives
Overview of the Data
Structure and Content
Key Findings
Recommendations and Future Work
Conclusion
Disclaimer
Background: Fairness in Financial Transactions
Financial Transactions Models at Mastercard
Understanding Bias in Machine Learning Models
Potential Harms in Machine Learning Models
Existing Metrics for Measuring Fairness
Bias Mitigation Strategies
Data overview
...and 46 more sections

Figures (27)

Figure 1: Label imbalance in the adoption dataset. Labels 2 and 8 have disproportionately more positive instances.
Figure 2: Heterogeneous disparities within each label for ethnicity in the adoption dataset. Each value shows how much a subgroup's mean differs from the unconditional mean within a given label.
Figure 3: Feature correlations in the spending data
Figure 4: Visualization of the tensor that formalizes intersectional fairness in a multilabel problem, $G_{l,k_1,k_2} = g(X, \mathbf{Y}_{k_1}, A_l) - g(X, \mathbf{Y}_{k_2}, A_l)$. The scale represents any fairness metric $g \in [0,1]$.
Figure 5: SHAP Beeswarm values for XGBoost model trained on features 1-20 and labels predicting age
...and 22 more figures
