Intention-aware policy graphs: answering what, how, and why in opaque agents

Victor Gimenez-Abalos; Sergio Alvarez-Napagao; Adrian Tormos; Ulises Cortés; Javier Vázquez-Salceda

TL;DR

We propose a Probabilistic Graphical Model, together with a pipeline for designing it, that makes it possible to deliberate about an agent's behaviour and to compute a robust numerical value for the intentions the agent has at any moment.

Abstract

Agents are a special kind of AI-based software in that they interact in complex environments and have increased potential for emergent behaviour. Explaining such emergent behaviour is key to deploying trustworthy AI, but the increasing complexity and opaque nature of many agent implementations make this hard. In this work, we propose a Probabilistic Graphical Model, along with a pipeline for designing such a model, by which the behaviour of an agent can be deliberated about and a robust numerical value can be computed for the intentions the agent has at any moment. We contribute measurements that evaluate the interpretability and reliability of the explanations provided. The model enables explainability questions such as `what do you want to do now?' (e.g. deliver soup), `how do you plan to do it?' (e.g. returning a plan that considers its skills and the world), and `why would you take this action at this state?' (e.g. explaining how that furthers or hinders its own goals). The model can be constructed from partial observations of the agent's actions and world states, and we provide an iterative workflow for improving the proposed measurements through better design and/or by pointing out irrational agent behaviour.
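To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes observations arrive as already-discretised `(state, action, next_state)` triples, estimates transition frequencies from them, and then computes an illustrative "intention" value as the probability of eventually performing a desired action in a desired state. The state and action names, the function names, and the fixed-point computation are all assumptions made for illustration; the paper's own definitions of desires and intentions may differ.

```python
from collections import Counter, defaultdict


def build_policy_graph(transitions):
    """Estimate P(action, next_state | state) from observed triples."""
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        counts[s][(a, s_next)] += 1
    graph = {}
    for s, outgoing in counts.items():
        total = sum(outgoing.values())
        graph[s] = {edge: n / total for edge, n in outgoing.items()}
    return graph


def action_probability(graph, s, a):
    """Estimated P(action | state): marginalise over next states."""
    return sum(p for (act, _), p in graph.get(s, {}).items() if act == a)


def intention(graph, desire_state, desire_action, sweeps=100):
    """Illustrative intention value (an assumption, not the paper's formula):
    probability of eventually taking desire_action in desire_state,
    approximated by repeated sweeps over the graph."""
    value = {s: 0.0 for s in graph}
    for _ in range(sweeps):
        for s, edges in graph.items():
            total = 0.0
            for (a, s_next), p in edges.items():
                if s == desire_state and a == desire_action:
                    total += p  # desire fulfilled on this edge
                else:
                    # states never observed as a source count as 0
                    total += p * value.get(s_next, 0.0)
            value[s] = total
    return value


# Hypothetical toy traces loosely inspired by the soup-delivery example.
observed = (
    [("empty", "pick", "holding")] * 3
    + [("empty", "wait", "empty")] * 1
    + [("holding", "move", "at_counter")] * 2
    + [("holding", "drop", "dropped")] * 2
    + [("at_counter", "deliver", "empty")] * 2
)

graph = build_policy_graph(observed)
values = intention(graph, "at_counter", "deliver")

# "What do you want to do now?": attribute the desire when the value
# reaches a commitment threshold (0.5 here, chosen for illustration).
committed = {s for s, v in values.items() if v >= 0.5}
print(action_probability(graph, "empty", "pick"), values, committed)
```

In this sketch the threshold trades coverage for confidence: a lower threshold attributes intentions to more states, while a higher one attributes them only where the observed behaviour almost always leads to fulfilling the desire.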

Paper Structure (24 sections, 6 equations, 8 figures, 5 tables, 4 algorithms)

Introduction
Background
Agent Explainability
Policy graphs
Intentionality
Use case
Methodology
Policy Graph construction and design heuristics
Explainability based on desires and intentions
Desires
Intentions
Explanation-extraction and answerable queries
Metrics
Static Metrics
Intention Metrics
...and 9 more sections

Figures (8)

Figure 1: Proposed workflow for extracting explainability. First, (partial) observations of the agent interacting in the environment are taken. The explainee then proposes one or several discretisers to describe the states, following the heuristics in Section \ref{sec:design_heur}. Each discretiser is written in code the explainee can understand and allows them to check a set of hypothesised desires of the agent, as described in Section \ref{sec:desiresintentions}. The resulting policy graph can then be evaluated with the metrics proposed in Section \ref{sec:static_metrics}, allowing the user to gauge the complexity of the representation and obtain a first estimate of the interpretability and reliability of the model; the user can loop back to try a different representation if the balance between these is not acceptable. Finally, the explainee introduces hypothesised desires into the policy graph, from which they can obtain metrics that validate these hypotheses and give direct estimates of reliability and interpretability, as described in Section \ref{sec:intention_metrics}. If the explanations are insufficient, the user can filter the regions without apparent intention to hypothesise new desires, as described in Section \ref{sec:revision}. If the frequency of intentions is too low, the representation may be too complex and can be redesigned. If the results are acceptable, the resulting policy graph can be used for new downstream tasks, such as QA explainability, as described in Section \ref{sec:explanation_algorithms}.
Figure 2: Overcooked visualisation of the analysed layouts. From left to right: Simple, Random 1, Random 3, Unident_s, and Random 0.
Figure 3: Desire metrics for two types of agents (Human-Collaborating Agent and PPO Agent 1) in two environments (Simple and Unident_s) with the same discretiser (1), all described in Section \ref{sec:experiments}. The desire probability (blue) is very low in all cases. Higher values of desire probability indicate higher performance, provided the desire is actually fulfilled (orange). PPO Agent 1 never fulfils the service desire in Unident_s, but quite frequently fulfils the rest. Note that Human-Collaborating Agent is never in a state in which it can fulfil any hypothesised desire in Unident_s, meaning its behaviour is unexplainable.
Figure 4: The semaphore environment and the proposed discretisers. Colours are used where needed to distinguish between different state-action values of the agent's policy. The environment does not reward going up when the red light is on; it rewards going up when the green light is on. This reward is represented more effectively by the smart discretiser than by the lousy discretiser, as the latter assigns a probability lower than 100% to going up in the state with a green light.
Figure 5: Intention metrics for Layout Simple for each agent (in order: PPO Agent 1, PPO Agent 2, Human-Collaborating Agent, Human Agent, and Random Agent) using discretiser 1. Collaboration and specialisation can be seen: in each pair, one agent specialises in serving and the other in cooking. Both PPO Agent 1 and Human-Collaborating Agent specialise in delivering soup, whereas PPO Agent 2 and Human Agent work on cooking. With a $0.5$ commitment threshold, expected intention fulfilment is very high in all cases, but overall agent interpretability is low ($15\%$ of the time) for agents specialising in delivering soup, as they spend most of the time apparently idle. Random Agent shows apparently high reliability in fulfilling intentions: this corresponds to states in which executing random actions eventually results in fulfilling a desire. These states occur with probability $<0.1\%$.
...and 3 more figures
