Trustworthy AI Must Account for Interactions

Jesse C. Cresswell

TL;DR

Trustworthy AI seeks alignment with human values across fairness, privacy, robustness, explainability, and uncertainty quantification, but existing work often optimizes a subset of these in isolation. The paper surveys negative pairwise interactions among the five facets and formalizes them with notions such as group-wise accuracy disparity $\Delta_{acc}$ and the conformal prediction guarantee $\Pr[y_{test} \in \mathcal{C}_q(x_{test})] \ge 1-\alpha$, arguing that simply overlaying individual solutions is insufficient. It proposes a holistic framework that accounts for all relevant axes, providing practitioner guidance (metrics, ablations, stakeholder involvement) and a financial-industry vignette to illustrate real-world risks. The work calls for interdisciplinary, context-aware approaches to manage trade-offs and build multi-faceted trust in AI.

Abstract

Trustworthy AI encompasses many aspirational aspects for aligning AI systems with human values, including fairness, privacy, robustness, explainability, and uncertainty quantification. Ultimately the goal of Trustworthy AI research is to achieve all aspects simultaneously. However, efforts to enhance one aspect often introduce unintended trade-offs that negatively impact others. In this position paper, we review notable approaches to these five aspects and systematically consider every pair, detailing the negative interactions that can arise. For example, applying differential privacy to model training can amplify biases, undermining fairness. Drawing on these findings, we take the position that current research practices of improving one or two aspects in isolation are insufficient. Instead, research on Trustworthy AI must account for interactions between aspects and adopt a holistic view across all relevant axes at once. To illustrate our perspective, we provide guidance on how practitioners can work towards integrated trust, examples of how interactions affect the financial industry, and alternative views.
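As a rough illustration of the two quantities named in the TL;DR, the sketch below computes a group-wise accuracy disparity and a split-conformal prediction set in plain Python. The definitions are assumptions, since this page does not reproduce the paper's own: the disparity is taken as the largest accuracy gap between any two groups, and the conformal score is taken as one minus the predicted probability of a label. The function names are hypothetical.

```python
import math


def accuracy_disparity(y_true, y_pred, groups):
    """Largest accuracy gap between any two groups (assumed definition)."""
    correct, total = {}, {}
    for y, p, g in zip(y_true, y_pred, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(y == p)
    acc = {g: correct[g] / total[g] for g in total}
    return max(acc.values()) - min(acc.values())


def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold q from calibration nonconformity scores.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest score, which gives
    coverage of at least 1 - alpha when calibration and test points are
    exchangeable. Returns infinity if the calibration set is too small.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, q):
    """Labels whose score 1 - p(label) is at most q (assumed score)."""
    return {label for label, p in enumerate(probs) if 1 - p <= q}


# Toy data: group "a" is classified perfectly, group "b" half the time.
gap = accuracy_disparity([1, 1, 0, 0], [1, 1, 0, 1], ["a", "a", "b", "b"])
print(gap)  # 0.5

# Scores 1 - p(true label) on a tiny, made-up calibration set.
q = conformal_threshold([0.1, 0.2, 0.3, 0.4], alpha=0.5)
print(prediction_set([0.8, 0.15, 0.05], q))  # {0}
```

Computing both metrics on the same model, per group, is one simple way to see an interaction: a conformal set can meet its marginal coverage target overall while covering one group noticeably less often than another.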

Paper Structure

This paper contains 24 sections and 9 equations.