Explainable Human-AI Interaction: A Planning Perspective

Sarath Sreedharan, Anagha Kulkarni, Subbarao Kambhampati

TL;DR

This work surveys a planning-centric framework for explainable human-AI interaction, emphasizing how AI agents can form and use mental models of humans to generate explicable, legible, and predictable behavior. It introduces model-reconciliation explanations and model-space search techniques that produce minimal, contrastive explanations aligning human and machine plans, and it also explores obfuscation and deception in adversarial settings. The text covers mathematical formalisms for the controlled observability planning problem (COPP), explicable and legible planning, environment design, and balanced planning, which integrates communication costs with plan optimality. It culminates in broad applications to collaborative decision-making and human-robot teams, along with actionable guidance for building human-centric AI, including trust and longitudinal interaction, learning human models, and safety considerations for advanced AI systems.
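The core idea behind a minimal model-reconciliation explanation can be illustrated with a toy example. The sketch below is not the book's formalism, which works over full planning models; it assumes, purely for illustration, that a "model" is a weighted directed graph whose edges are the available actions. An explanation is a set of edges on which the human's model is updated to match the robot's, and the minimal one is found by brute-force enumeration of model differences in increasing size (a simple stand-in for model-space search) until the robot's plan becomes optimal in the updated human model. All function names and the example models are hypothetical.

```python
import heapq
import itertools
import math


def optimal_cost(model, start, goal):
    """Dijkstra shortest-path cost from start to goal; model maps (u, v) -> cost."""
    dist = {start: 0.0}
    frontier = [(0.0, start)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if node == goal:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for (u, v), c in model.items():
            if u == node and d + c < dist.get(v, math.inf):
                dist[v] = d + c
                heapq.heappush(frontier, (d + c, v))
    return math.inf


def plan_cost(model, plan):
    """Cost of a state sequence in `model`; inf if some step is unavailable."""
    return sum(model.get(step, math.inf) for step in zip(plan, plan[1:]))


def model_differences(robot, human):
    """Edges on which the two models disagree (presence or cost)."""
    return sorted(e for e in robot.keys() | human.keys() if robot.get(e) != human.get(e))


def apply_updates(robot, human, edges):
    """Return a copy of the human model with the robot's value for each edge."""
    updated = dict(human)
    for e in edges:
        if e in robot:
            updated[e] = robot[e]
        else:
            updated.pop(e, None)  # the action does not exist for the robot
    return updated


def minimal_explanation(robot, human, plan, start, goal):
    """Smallest set of model updates making `plan` optimal in the human model."""
    diffs = model_differences(robot, human)
    for k in range(len(diffs) + 1):  # increasing size => first hit is minimal
        for subset in itertools.combinations(diffs, k):
            h = apply_updates(robot, human, subset)
            if plan_cost(h, plan) <= optimal_cost(h, start, goal):
                return list(subset)
    return None  # only reached if `plan` is not optimal for the robot itself


# Hypothetical models: the human believes A -> G is unavailable and also
# misjudges the cost of S -> B, but only the first belief needs correcting.
robot = {("S", "A"): 1, ("A", "G"): 1, ("S", "B"): 2, ("B", "G"): 5}
human = {("S", "A"): 1, ("S", "B"): 1, ("B", "G"): 5}
print(minimal_explanation(robot, human, ["S", "A", "G"], "S", "G"))  # [('A', 'G')]
```

Note that the explanation omits the irrelevant cost disagreement on `S -> B`: minimality means communicating only the differences needed to make the robot's plan optimal in the human's eyes.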

Abstract

From its inception, AI has had a rather ambivalent relationship with humans -- swinging between their augmentation and replacement. Now, as AI technologies enter our everyday lives at an ever-increasing pace, there is a greater need for AI systems to work synergistically with humans. One critical requirement for such synergistic human-AI interaction is that the AI systems be explainable to the humans in the loop. To do this effectively, AI agents need to go beyond planning with their own models of the world and take into account the mental model of the human in the loop. Drawing from several years of research in our lab, we will discuss how the AI agent can use these mental models to either conform to human expectations or change those expectations through explanatory communication. While the main focus of the book is on cooperative scenarios, we will point out how the same mental models can be used for obfuscation and deception. Although the book is primarily driven by our own research in these areas, in every chapter we will provide ample connections to relevant research from other groups.

Paper Structure (174 sections, 12 theorems, 54 equations, 32 figures, 7 algorithms)

Key Result

Proposition 1

The algorithm necessarily terminates after a finite number ($|\mathcal{S}|$) of iterations, such that the following conditions hold: (Completeness) The algorithm explores the complete solution space of $\mathcal{P_{CO}}$; that is, if there exists a $\pi_{\mathcal{P_{CO}}}$ that correctly solves $\mathcal{P_{CO}}$ …

Figures (32)

  • Figure 1: Architecture of an intelligent agent that takes human mental models into account. All portions in yellow are additions to the standard agent architecture that result from the agent being human-aware. $\mathcal{M}^R_h$ is the mental model the human has of the AI agent's goals and capabilities, and $\mathcal{M}^H_r$ is the (mental) model the AI agent has of the human's goals and capabilities (see the section on Mental Models in Human-Aware AI).
  • Figure 2: Test beds developed to study the dynamics of trust and teamwork between autonomous agents and their human teammates.
  • Figure 3: Use of different mental models in synthesizing explainable behavior. (Left) The AI system can use its estimate of the human's mental model, $\mathcal{M}^H_r$, to take into account the goals and capabilities of the human and thus provide appropriate help. (Right) The AI system can use its estimate of the human's mental model of its capabilities, $\mathcal{M}^R_h$, to exhibit explicable behavior and to provide explanations when needed.
  • Figure 4: A simple illustration of the differences between plan explicability, legibility, and predictability. In this Gridworld, the robot can travel across cells but cannot go backwards. Panel (a) illustrates a legible plan (green) in the presence of three possible goals of the robot, marked with ?s. The red plan is not legible, since all three goals remain likely during its initial stages. Panel (b) illustrates an explicable plan (green), which goes straight to the goal G as we would expect. The red plan may be more favorable to the robot due to its internal constraints (the arm sticking out might hit the wall), but it is inexplicable (i.e., sub-optimal) in the observer's model. Finally, panel (c) illustrates a predictable plan (green), since only one possible plan remains after it performs the first action. The red plans fail to disambiguate between two possible completions of the plan. Note that all the plans shown in panel (c) are explicable (optimal in the observer's model), but only one of them is predictable; i.e., explicable plans may not be predictable. Similarly, in panel (b), the red plan is predictable after the first action, since only one likely completion remains, even though it is not optimal; it is therefore predictable but not explicable, i.e., predictable plans may not be explicable. Without a prefix in panel (b), the green plan is the only predictable plan.
  • Figure 5: Schematic diagram of model-based explicable planning.
  • ...and 27 more figures
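The three properties contrasted in the Figure 4 caption can be made concrete with a toy check. The sketch below is a minimal illustration under our own simplifying assumptions, not the book's formal definitions: the observer's model is an open, 4-connected grid with unit-cost moves and no walls (so the arm constraint and the no-backwards rule are ignored), and the observer expects optimal behavior. A plan is explicable if it reaches the goal at optimal cost, a prefix is predictable if exactly one optimal completion remains, and a prefix is legible if it is consistent with an optimal plan to exactly one candidate goal. Because only optimal completions are counted, this captures the caption's explicable-but-not-predictable case but cannot express its predictable-but-inexplicable red plan. All names are hypothetical.

```python
from math import comb

# Hypothetical observer model: open 4-connected grid, unit-cost moves.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def walk(start, actions):
    """Cell reached after executing the action string from `start`."""
    x, y = start
    for a in actions:
        dx, dy = MOVES[a]
        x, y = x + dx, y + dy
    return (x, y)


def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def optimal_completions(start, prefix, goal):
    """Number of optimal plans (in the observer's model) that extend `prefix`."""
    pos = walk(start, prefix)
    if len(prefix) + manhattan(pos, goal) != manhattan(start, goal):
        return 0  # the prefix already deviates from every optimal plan
    dx, dy = abs(goal[0] - pos[0]), abs(goal[1] - pos[1])
    return comb(dx + dy, dx)  # orderings of the remaining x- and y-moves


def is_explicable(start, plan, goal):
    return walk(start, plan) == goal and len(plan) == manhattan(start, goal)


def is_predictable(start, prefix, goal):
    return optimal_completions(start, prefix, goal) == 1


def is_legible(start, prefix, goals):
    return sum(optimal_completions(start, prefix, g) > 0 for g in goals) == 1


start, goal = (0, 0), (2, 2)
print(is_explicable(start, "RRUU", goal))    # True
print(is_explicable(start, "RRRLUU", goal))  # False: detour is sub-optimal
print(is_predictable(start, "R", goal))      # False: 3 optimal completions
print(is_predictable(start, "RR", goal))     # True
goals = [(2, 2), (0, 2), (-2, 2)]
print(is_legible(start, "U", goals))         # False: all goals still possible
print(is_legible(start, "R", goals))         # True: only (2, 2) fits
```

Here "RRUU" and "URUR" are both explicable for the goal (2, 2), yet after the single action "R" three optimal completions remain, which mirrors the point that explicable plans need not be predictable.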

Theorems & Definitions (59)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Definition 9
  • Definition 10
  • ...and 49 more