Direct Preference Optimization for Adaptive Concept-based Explanations

Jacopo Teneggi, Zhenzhen Wang, Paul H. Yi, Tianmin Shu, Jeremias Sulam

TL;DR

Concept-based explanation methods typically ignore who the explanation is for. This paper addresses that gap by incorporating listener context through pragmatic reasoning. It combines the Rational Speech Act (RSA) framework with Direct Preference Optimization (DPO) to train a speaker that generates concept-based explanations tailored to different listeners using only pairwise preferences. Across CUB, ImageNet, and CheXpert, pragmatic explanations improve listener accuracy and align with the base classifier, and a user study suggests real-world communicative benefits. The work indicates that preference-driven, adaptive explanations can enhance transparency and decision support in diverse domains, while highlighting computational trade-offs and safety considerations for high-stakes use cases.

Abstract

Concept-based explanation methods aim to make machine learning models more transparent by finding the most important semantic features of an input (e.g., colors, patterns, shapes) for a given prediction task. However, these methods generally ignore the communicative context of explanations, such as the preferences of a listener. For example, medical doctors understand explanations in terms of clinical markers, but patients may not, and need a different vocabulary to rationalize the same diagnosis. We address this gap with listener-adaptive explanations grounded in principles of pragmatic reasoning and the rational speech act framework. We introduce an iterative training procedure based on direct preference optimization where a speaker learns to compose explanations that maximize communicative utility for a listener. Our approach only needs access to pairwise preferences, which can be collected from human feedback, making it particularly relevant in real-world scenarios where a model of the listener may not be available. We demonstrate that our method is able to align speakers with the preferences of simulated listeners on image classification across three datasets, and further validate that pragmatic explanations generated with our method improve the classification accuracy of participants in a user study.
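
To make the "pairwise preferences" idea concrete, the following is a minimal sketch of the standard direct preference optimization loss for a single preference pair. It is not taken from the paper: the function name `dpo_loss`, its arguments, and the default `beta` are hypothetical, and the paper's iterative procedure may differ in its details. Here the inputs are assumed to be log-probabilities that the trained speaker and a frozen reference speaker assign to the preferred and the rejected utterance.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    logp_w / logp_l: log-probability of the preferred / rejected utterance
        under the speaker being trained (hypothetical names).
    ref_logp_w / ref_logp_l: the same quantities under a frozen reference speaker.
    beta: strength of the implicit KL regularization (assumed default).
    """
    # Implicit reward margin between the preferred and the rejected utterance.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed stably.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Usage: the loss falls below log(2) once the speaker favors the preferred
# utterance more strongly than the reference speaker does.
baseline = dpo_loss(-1.0, -1.0, -1.0, -1.0)
improved = dpo_loss(-0.5, -2.0, -1.0, -1.0)
```

Only the preference label (which utterance the listener preferred) enters the loss, which is why no explicit model of the listener is needed in this formulation.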

Paper Structure

This paper contains 31 sections, 12 equations, 23 figures, 3 tables, 1 algorithm.

Figures (23)

  • Figure 1: Illustration of our listener-adaptive explanation framework: a speaker generates utterances to help a listener infer model predictions without access to the input image.
  • Figure 2: Example utterances generated with pragmatic speakers on all datasets. We include the input image, the utterance with the predicted and ground-truth labels in parentheses, and the average attention weights of the listener model across all layers. The first panel shows examples where the base classifier was correct, and the pragmatic listener inferred the predicted label. The second panel shows examples where the base classifier was wrong, and the pragmatic listener inferred the wrong predicted label. We include an example image of the confounding class to compare with.
  • Figure 3: Correlation between the class-wise accuracy of the base classifier and of listener models. Results are shown as a function of utterance length and minimal base classifier accuracy.
  • Figure 4: Utterance adaptation results on the CheXpert and CUB datasets. Solid lines report the normalized KL divergence between the empirical distribution of groups of claims in the utterances generated with a pragmatic speaker and the listeners' priors. Dashed lines report listener accuracy. Results are shown as a function of the temperature scale $\tau$.
  • Figure 5: Example utterances generated with pragmatic speakers with and without adaptation on the CheXpert (top row) and CUB (bottom row) datasets. We include examples where the base classifier and all listeners are correct in their respective predictions. For CheXpert, we include utterances generated with a pragmatic speaker with no adaptation on the augmented vocabulary of medical and layman's terms, and with pragmatic speakers adapted to doctor and patient listeners, respectively. For CUB, we include utterances generated with a pragmatic speaker with no adaptation, and with a pragmatic speaker adapted to a listener with a topic prior that excludes claims about bill, tail, and shape features.
  • ...and 18 more figures