A Causal Framework to Measure and Mitigate Non-binary Treatment Discrimination
Ayan Majumdar, Deborah D. Kanubala, Kavya Gupta, Isabel Valera
TL;DR
Fairness analyses of algorithmic decision-making often treat decisions as binary and overlook the non-binary treatment decisions that decision-makers control and that influence outcomes. This paper introduces a causal framework that separates decision-subject covariates $X$ from treatment decisions $Z$, enabling measurement of treatment disparities via total and direct effects (TTD/DTD) and of their downstream impact on outcomes (TTD-E/DTD-E) using counterfactual reasoning. The approach uses causal normalizing flows (CNF) to estimate counterfactual treatments and outcomes, including path-specific effects, while outlining the underlying assumptions and practical considerations. It further proposes a data-driven preprocessing mitigation that produces treatment-fair datasets and fair risk scores, and validates the framework on four lending datasets, revealing disparities that standard predictive fairness does not remedy and demonstrating potential gains from treatment-level mitigation. Overall, the work argues that fairness analyses must incorporate non-binary treatment decisions to better align automated decision-making with multiple stakeholder utilities and societal values.
Abstract
Fairness studies of algorithmic decision-making systems often simplify complex decision processes, such as bail or loan approvals, into binary classification tasks. However, these approaches overlook that such decisions are not merely binary (e.g., whether or not to approve bail or a loan); they also involve non-binary treatment decisions (e.g., bail conditions or loan terms) that can influence the downstream outcomes (e.g., loan repayment or reoffending). In this paper, we argue that non-binary treatment decisions are integral to the decision process and controlled by decision-makers and, therefore, should be central to fairness analyses in algorithmic decision-making. We propose a causal framework that extends fairness analyses and explicitly distinguishes between decision-subjects' covariates and the treatment decisions. This specification allows decision-makers to use our framework to (i) measure treatment disparity and its downstream effects in historical data and, using counterfactual reasoning, (ii) mitigate the impact of past unfair treatment decisions when automating decision-making. We use our framework to empirically analyze four widely used loan approval datasets to reveal potential disparity in non-binary treatment decisions and their discriminatory impact on outcomes, highlighting the need to incorporate treatment decisions in fairness assessments. Moreover, by intervening in treatment decisions, we show that our framework effectively mitigates treatment discrimination from historical data to ensure fair risk score estimation and (non-binary) decision-making processes that benefit all stakeholders.
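
As a rough illustration of the counterfactual reasoning described above, the following sketch computes a total and a direct effect of a sensitive attribute on a non-binary treatment, plus one downstream effect on the outcome, in a toy linear structural causal model. Everything in it is an assumption made for illustration: the sensitive attribute `a`, the structural equations `f_x`, `f_z`, `f_y`, and all coefficients are hypothetical, and the exogenous noise is recovered analytically from known equations rather than estimated with causal normalizing flows. The quantities are only loosely analogous to TTD/DTD and their downstream counterparts; they are not the paper's formal definitions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical structural equations (assumed, not taken from the paper).
def f_x(a, u_x):
    # Covariate X (e.g., income) depends on the sensitive attribute A.
    return 1.0 - 0.5 * a + u_x

def f_z(a, x, u_z):
    # Non-binary treatment Z (e.g., interest rate) depends on A and X.
    return 5.0 - 1.0 * x + 0.8 * a + u_z

def f_y(x, z):
    # Outcome score Y (e.g., repayment probability) depends on X and Z.
    return 1.0 / (1.0 + np.exp(-(2.0 + 0.5 * x - 0.4 * z)))

# Factual ("historical") data.
a = rng.integers(0, 2, size=n)
x = f_x(a, rng.normal(size=n))
z = f_z(a, x, rng.normal(size=n))
y = f_y(x, z)

# Abduction: recover each individual's noise terms from the observations.
u_x_hat = x - f_x(a, 0.0)
u_z_hat = z - f_z(a, x, 0.0)

def counterfactual_z(a_cf, direct_only):
    # Direct effect: keep X at its factual value, change A only in f_z.
    # Total effect: also propagate the change in A through X.
    x_cf = x if direct_only else f_x(a_cf, u_x_hat)
    return f_z(a_cf, x_cf, u_z_hat)

ones, zeros = np.ones(n), np.zeros(n)
total = np.mean(counterfactual_z(ones, False) - counterfactual_z(zeros, False))
direct = np.mean(counterfactual_z(ones, True) - counterfactual_z(zeros, True))

# Downstream effect on the outcome for the group a == 1: replace their
# treatment with the one they would have received directly under a = 0.
group = a == 1
y_cf = f_y(x, counterfactual_z(zeros, True))
downstream = np.mean(y_cf[group] - y[group])

print(f"total effect on Z:  {total:.3f}")
print(f"direct effect on Z: {direct:.3f}")
print(f"downstream change in Y for a == 1: {downstream:.3f}")
```

In this toy model the direct effect equals the assumed coefficient of `a` in `f_z` (0.8), while the total effect also includes the path through `x` (0.8 + 0.5 = 1.3). With real data the structural equations are unknown, which is where a learned model such as a causal normalizing flow would replace the analytic abduction step.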
