Measuring User Understanding in Dialogue-based XAI Systems

Dimitry Mindlin, Amelie Sophie Robrecht, Michael Morasch, Philipp Cimiano

TL;DR

This work addresses the challenge of objectively measuring user understanding in dialogue-based XAI by introducing a controlled three-phase experimental framework that pits an interactive dialogue interface against a static explanation baseline. Participants interact with a local-explanation–driven system on an income-prediction task, and understanding is quantified via simulation tasks using scores $U_{intuition}$, $U_{time}$, and $U_{model}$, analyzed with one-parameter IRT and nonparametric tests. Results show a significant improvement in objective model understanding for the interactive condition ($U_{model}$, $p = 0.004$), with process-mining analyses revealing that high-understanding users tend to follow a global-to-specific questioning sequence. The study contributes a replicable methodology for evaluating conversational XAI, identifies question patterns that correlate with gains, and offers design guidance for interactive explanations that enhance users’ co-constructed understanding of AI decisions. These findings have practical implications for building more effective dialogue-based explanations in real-world AI systems, particularly for lay users.
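The TL;DR mentions a one-parameter IRT model. Below is a minimal sketch of the standard 1PL (Rasch) form, in which a participant with ability θ correctly simulates an item of difficulty b with probability σ(θ − b). It assumes item difficulties are already known and estimates a participant's ability by Newton-Raphson on the log-likelihood. This section does not say how the authors calibrated items or how their IRT estimates feed into the reported scores. The function names and data are hypothetical.

```python
import math
from typing import Sequence

def rasch_prob(theta: float, b: float) -> float:
    """P(correct) under the 1PL (Rasch) model: logistic(theta - b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses: Sequence[int], difficulties: Sequence[float],
                     max_iter: int = 50, tol: float = 1e-10) -> float:
    """Maximum-likelihood ability estimate via Newton-Raphson, item difficulties held fixed."""
    if len(responses) != len(difficulties):
        raise ValueError("one difficulty per response is required")
    score = sum(responses)
    if score == 0 or score == len(responses):
        # The likelihood is monotone in theta for all-wrong/all-correct patterns.
        raise ValueError("no finite MLE for all-wrong or all-correct responses")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)                     # d logL / d theta
        information = sum(p * (1.0 - p) for p in probs)   # -d^2 logL / d theta^2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical data: 1 = participant correctly simulated the model's prediction.
print(estimate_ability([1, 1, 1, 0], [0.0, 0.0, 0.0, 0.0]))
```

Consider a participant who correctly simulates three of four items, all with difficulty 0. The estimate is θ = ln 3 ≈ 1.10, because that is where the expected score 4·σ(θ) equals the observed score of 3. Perfect and zero scores have no finite maximum-likelihood estimate, so the sketch raises an error for them instead of returning a value.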

Abstract

The field of eXplainable Artificial Intelligence (XAI) is increasingly recognizing the need to personalize and/or interactively adapt explanations to better reflect users' explanation needs. While dialogue-based approaches to XAI have been proposed recently, the state of the art in XAI is still characterized by what we call one-shot, non-personalized and one-way explanations. In contrast, dialogue-based systems that can adapt explanations through interaction with a user promise to be superior to GUI-based or dashboard explanations, as they offer a more intuitive way of requesting information. In general, while interactive XAI systems are often evaluated in terms of user satisfaction, few studies assess users' objective model understanding. This is particularly the case for dialogue-based XAI approaches. In this paper, we close this gap by carrying out controlled experiments within a dialogue framework in which we measure users' understanding in three phases by asking them to simulate the predictions of the model they are learning about. This allows us to quantify the level of (improved) understanding of how the model works by comparing the state before and after the interaction. We further analyze the data to reveal how the interactions of groups with high vs. low understanding gain differ. Overall, our work thus contributes to our understanding of the effectiveness of XAI approaches.
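The abstract describes measuring understanding by having users simulate model predictions and comparing their state before and after the interaction. The sketch below uses a simplified score: the share of instances on which a user's prediction matches the model's. Gain is defined as final minus initial accuracy on the same instances, since Figure 2 notes that the initial and final tests share instances. This is not the authors' definition of $U_{model}$.

The TL;DR mentions nonparametric tests without naming them. For comparing two independent conditions, we assume a two-sided Mann-Whitney U test. It is computed with average ranks for ties and a plain normal approximation, without tie or continuity correction. All function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def simulation_accuracy(user_preds: Sequence[int], model_preds: Sequence[int]) -> float:
    """Share of instances on which the user's guess matches the model's prediction."""
    if len(user_preds) != len(model_preds) or not model_preds:
        raise ValueError("need equally long, non-empty prediction lists")
    return sum(u == m for u, m in zip(user_preds, model_preds)) / len(model_preds)

def understanding_gain(initial: Sequence[int], final: Sequence[int],
                       model_preds: Sequence[int]) -> float:
    """Final minus initial simulation accuracy on the same test instances."""
    return simulation_accuracy(final, model_preds) - simulation_accuracy(initial, model_preds)

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks

def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """U statistic for x and a two-sided p-value from the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p

# Hypothetical per-participant gains in the two conditions.
interactive = [0.3, 0.4, 0.5, 0.2, 0.6]
static = [0.0, 0.1, 0.2, -0.1, 0.1]
print(mann_whitney_u(interactive, static))
```

For an actual analysis, `scipy.stats.mannwhitneyu` handles tie correction and exact small-sample p-values. The hand-rolled version only makes the rank computation visible.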

Paper Structure (25 sections, 5 figures, 1 table)

Figures (5)

  • Figure 1: Experiment UI. a) shows a teaching step in the learning phase of the interactive condition. After a prediction is selected on the right, the Chatbot and Question panel pop up, showing the true model prediction and letting the user engage with explanations. b) shows a testing instance in the learning phase, with the modifications indicated.
  • Figure 2: Flow diagram of the study steps: The blue phases require participants to make predictions for displayed instances. The yellow fields indicate assessments of understanding and confidence. While the initial test and final test phases involve the same instances, the instances for the learning phase are modifications of those.
  • Figure 3: Statistical analysis of differences in intuition and model understanding across conditions: no significant difference in intuition, but a highly significant difference in model understanding.
  • Figure 4: Questions selected by users with the best vs. worst understanding improvement.
  • Figure 5: Question sequences as process-mining graphs for participants with the lowest and highest final understanding.
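Figures 4 and 5 summarize which questions users selected and in what order. A common process-mining view of such data is a directly-follows graph. In this graph, an edge a → b is weighted by how often question type a is immediately followed by question type b across participants' sessions. The minimal sketch below counts these edges, plus per-question frequencies, in plain Python. The START/END sentinels and the question labels are hypothetical. We do not know which process-mining tool or question taxonomy the authors used.

```python
from collections import Counter
from typing import Iterable, Sequence

def directly_follows(sequences: Iterable[Sequence[str]], add_sentinels: bool = True) -> Counter:
    """Count how often question type a is immediately followed by question type b."""
    edges: Counter = Counter()
    for seq in sequences:
        # Optional START/END markers capture how sessions begin and end.
        trace = ["START", *seq, "END"] if add_sentinels else list(seq)
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges

def question_frequencies(sequences: Iterable[Sequence[str]]) -> Counter:
    """Total number of times each question type was selected."""
    return Counter(q for seq in sequences for q in seq)

# Hypothetical question sequences for two participants.
sessions = [
    ["global_importance", "feature_effect", "counterfactual"],
    ["global_importance", "counterfactual", "counterfactual"],
]
dfg = directly_follows(sessions)
for (a, b), count in dfg.most_common():
    print(f"{a} -> {b}: {count}")
print(question_frequencies(sessions))
```

Building one such graph for the low-understanding group and one for the high-understanding group lets you compare their edge weights. For example, you can check whether sessions tend to open with a global question before moving to instance-specific ones, which is the global-to-specific pattern the TL;DR reports. Dedicated process-mining libraries such as PM4Py provide the same discovery step with filtering and visualization.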