The Impact of Explanations on Fairness in Human-AI Decision-Making: Protected vs Proxy Features
Navita Goyal, Connor Baumler, Tin Nguyen, Hal Daumé
TL;DR
The paper examines how explanations and disclosures influence fairness in human-AI decision-making when bias arises directly from protected attributes or indirectly through proxy features. It uses a micro-lending task with biased logistic regression models and varies explanations and two disclosure types (model bias and proxy correlation) across six conditions, measuring fairness perception, demographic parity, and decision quality. Explanations help participants detect direct biases but can increase agreement with biased decisions, while disclosures help participants recognize and mitigate indirect biases; however, combining the interventions does not consistently improve fairness, so explanations are not a universal solution. Practically, the work informs when to deploy explanations and disclosures to support fair human-AI collaboration and underscores the need for careful design to avoid over-reliance on biased AI systems.
Abstract
AI systems have been known to amplify biases in real-world data. Explanations may help human-AI teams address these biases for fairer decision-making. Typically, explanations focus on salient input features. If a model is biased against some protected group, explanations may include features that demonstrate this bias, but when biases are realized through proxy features, the relationship between this proxy feature and the protected one may be less clear to a human. In this work, we study the effect of the presence of protected and proxy features on participants' perception of model fairness and their ability to improve demographic parity over an AI alone. Further, we examine how different treatments -- explanations, model bias disclosure and proxy correlation disclosure -- affect fairness perception and parity. We find that explanations help people detect direct but not indirect biases. Additionally, regardless of bias type, explanations tend to increase agreement with model biases. Disclosures can help mitigate this effect for indirect biases, improving both unfairness recognition and decision-making fairness. We hope that our findings can help guide further research into advancing explanations in support of fair human-AI decision-making.
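To make "improving demographic parity over an AI alone" concrete, the sketch below computes a demographic parity gap as the difference in approval rates between two groups, first for a model's decisions and then for a human-AI team's final decisions. It uses the common textbook definition of the metric; the abstract does not spell out the exact operationalization used in the study. The function names, group labels, and toy data are all hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def positive_rate(decisions: Sequence[int]) -> float:
    """Fraction of positive (1 = approve) decisions."""
    if not decisions:
        raise ValueError("cannot compute a rate for an empty group")
    return sum(decisions) / len(decisions)


def demographic_parity_gap(
    decisions: Sequence[int],
    groups: Sequence[Hashable],
    group_a: Hashable,
    group_b: Hashable,
) -> float:
    """Approval rate of group_a minus approval rate of group_b.

    A gap of 0 means both groups are approved at the same rate.
    """
    rate_a = positive_rate([d for d, g in zip(decisions, groups) if g == group_a])
    rate_b = positive_rate([d for d, g in zip(decisions, groups) if g == group_b])
    return rate_a - rate_b


# Hypothetical toy data: eight loan applicants, four per group.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
ai_decisions = [1, 1, 1, 0, 1, 0, 0, 0]    # biased model: approves group "a" more often
team_decisions = [1, 1, 0, 0, 1, 1, 0, 0]  # final decisions after human review

ai_gap = demographic_parity_gap(ai_decisions, groups, "a", "b")
team_gap = demographic_parity_gap(team_decisions, groups, "a", "b")

print(f"AI-alone gap:  {ai_gap:.2f}")
print(f"Human-AI gap:  {team_gap:.2f}")
```

In this toy case the human-AI team has a smaller gap in absolute value than the model alone, which is the sense in which a team can be said to improve parity over the AI.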
