Direct Preference Optimization for Adaptive Concept-based Explanations

Jacopo Teneggi; Zhenzhen Wang; Paul H. Yi; Tianmin Shu; Jeremias Sulam

TL;DR

Existing concept-based explanation methods ignore who receives the explanation; this paper addresses that gap by incorporating listener context through pragmatic reasoning. It combines the Rational Speech Act (RSA) framework with Direct Preference Optimization (DPO) to train a speaker that generates concept-based explanations tailored to different listeners using only pairwise preferences. Across CUB, ImageNet, and CheXpert, pragmatic explanations improve listener accuracy and align with the base classifier, and a user study suggests real-world communicative benefits. The work demonstrates that preference-driven, adaptive explanations can enhance transparency and decision support in diverse domains, while highlighting trade-offs in computation and safety considerations for high-stakes use cases.

Abstract

Concept-based explanation methods aim to make machine learning models more transparent by finding the most important semantic features of an input (e.g., colors, patterns, shapes) for a given prediction task. However, these methods generally ignore the communicative context of explanations, such as the preferences of a listener. For example, medical doctors understand explanations in terms of clinical markers, but patients may not, and need a different vocabulary to rationalize the same diagnosis. We address this gap with listener-adaptive explanations grounded in principles of pragmatic reasoning and the rational speech act framework. We introduce an iterative training procedure based on direct preference optimization where a speaker learns to compose explanations that maximize communicative utility for a listener. Our approach only needs access to pairwise preferences, which can be collected from human feedback, making it particularly relevant in real-world scenarios where a model of the listener may not be available. We demonstrate that our method is able to align speakers with the preferences of simulated listeners on image classification across three datasets, and further validate that pragmatic explanations generated with our method improve the classification accuracy of participants in a user study.
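
To make the role of pairwise preferences concrete, the following is a minimal sketch of the generic direct preference optimization loss for a single preference pair. It is not the paper's exact iterative procedure: the function name `dpo_loss`, its arguments, and the default `beta` are hypothetical and chosen for illustration. Here `logp_w` and `logp_l` stand for the speaker's log-probabilities of the explanation the listener preferred and the one it rejected, and `ref_logp_w` and `ref_logp_l` are the same quantities under a frozen reference speaker.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO loss for one pair (w = preferred, l = rejected).

    All names and the default beta are illustrative assumptions,
    not taken from the paper.
    """
    # Margin: how much more the speaker favors the preferred explanation
    # over the rejected one, measured relative to the reference speaker.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# A speaker identical to the reference has zero margin.
loss_at_reference = dpo_loss(-1.0, -2.0, -1.0, -2.0)
# A speaker that shifts probability toward the preferred explanation.
loss_after_shift = dpo_loss(-0.5, -3.0, -1.0, -2.0)
```

When the speaker matches the reference, the margin is zero and the loss equals log 2; shifting probability toward the preferred explanation lowers it. Only the preference label for each pair is needed, which is why no explicit model of the listener is required.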
