Counterfactual Explanations as Plans

Vaishak Belle

TL;DR

The paper develops a formal account of counterfactual explanations as plans within a dynamic, epistemic planning setting, using the modal fragment of the situation calculus denoted ES. It shows how counterfactual explanations naturally connect to model reconciliation, including scenarios with partial truths, weakened truths, and false beliefs, by reasoning about what is true versus what is known. A central contribution is the definition of objective counterfactual plans and reconciliation-based explanations, supported by distance and diversity criteria, and extended to nested knowledge and sensing actions. The approach provides a principled framework for generating actionable, user-relevant explanations in dynamic domains, with connections to discrepancy-based planning, diagnosis, and declarative reasoning approaches. This work lays groundwork for integrating costs, optimality, and diversity into explainable planning, enabling more effective human-agent collaboration in complex, sequential tasks.

Abstract

There has been considerable recent interest in explainability in AI, especially with black-box machine learning models. As correctly observed by the planning community, when the application at hand is not a single-shot decision or prediction, but a sequence of actions that depend on observations, a richer notion of explanation is desirable. In this paper, we look to provide a formal account of "counterfactual explanations" in terms of action sequences. We then show that this naturally leads to an account of model reconciliation, which might take the form of the user correcting the agent's model, or suggesting actions for the agent's plan. For this, we will need to articulate what is true versus what is known, and we appeal to a modal fragment of the situation calculus to formalise these intuitions. We consider various settings: the agent knowing partial truths, weakened truths and having false beliefs, and show that our definitions easily generalize to these different settings.
