On Explaining Unfairness: An Overview
Christos Fragkathoulas, Vasiliki Papanikou, Danae Pla Karidi, Evaggelia Pitoura
TL;DR
On Explaining Unfairness: An Overview addresses the interplay between algorithmic fairness and explainability by proposing taxonomies for both areas and detailing how explanations can illuminate, measure, and mitigate unfair outcomes. It categorizes explanations for fairness into three directions: enhancing fairness metrics, uncovering the causes of (un)fairness, and guiding mitigation. It also surveys counterfactual explanations, Shapley-based analyses, and extensions beyond classification across diverse domains. The work highlights the predominance of post-hoc, model-agnostic, and counterfactual approaches. It also identifies gaps, such as limited coverage of fairness notions and tasks and the need for multi-faceted, causality-informed explanations. Overall, the paper provides a consolidated framework to guide researchers and practitioners in developing more transparent, fair, and actionable AI systems in settings such as recommender systems and graph-based applications.
Abstract
Algorithmic fairness and explainability are foundational elements for achieving responsible AI. In this paper, we focus on their interplay, a research area that has recently been receiving increasing attention. To this end, we first present two comprehensive taxonomies, one for each of the two complementary fields of study: fairness and explanations. Then, we categorize explanations for fairness into three types: (a) explanations to enhance fairness metrics, (b) explanations to help us understand the causes of (un)fairness, and (c) explanations to assist us in designing methods for mitigating unfairness. Finally, based on our fairness and explanation taxonomies, we identify unexplored literature paths, revealing gaps that can serve as valuable insights for future research.
