Objective Metrics for Human-Subjects Evaluation in Explainable Reinforcement Learning

Balint Gyevnar, Mark Towers

TL;DR

This work tackles the lack of objective, human-centric evaluation in explainable reinforcement learning by proposing a curated set of actionable metrics for debugging and human-agent teaming. It details a grid-world mini-environment to illustrate how Next Action, Goal, Sub-Goal, Counterfactual, and Time Taken metrics can be collected and interpreted, and extends the framework to teaming scenarios with Task Completion, Inter-Agent Conflict, and Time Taken measures. The authors argue for integrating objective metrics with subjective feedback and cognitive-science insights, advocate baseline No-Explanation conditions, and call for standardized benchmarks to improve reproducibility and comparability across XRL studies. Collectively, the paper aims to make explainability research more practically useful, scalable, and epistemically grounded by focusing on observable human behavior and measurable outcomes.

Abstract

Explanation is a fundamentally human process. Understanding the goal and audience of the explanation is vital, yet existing work on explainable reinforcement learning (XRL) routinely does not consult humans in its evaluations. Even when it does, it routinely resorts to subjective metrics, such as confidence or understanding, that can only inform researchers of users' opinions, not of the explanations' practical effectiveness for a given problem. This paper calls on researchers to use objective human metrics for explanation evaluations based on observable and actionable behaviour to build more reproducible, comparable, and epistemically grounded research. To this end, we curate, describe, and compare several objective evaluation methodologies for applying explanations to debugging agent behaviour and supporting human-agent teaming, illustrating our proposed methods using a novel grid-based environment. We discuss how subjective and objective metrics complement each other to provide holistic validation and how future work needs to utilise standardised benchmarks for testing to enable easier comparison between studies.
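To make the idea of an objective, behaviour-based metric concrete, the following is a minimal sketch of how a Next Action score might be computed per study condition. It assumes, as an illustration only, that the metric is the fraction of participant predictions that match the action the agent actually took; the function name, the record layout, and the condition labels are hypothetical, and the responses are invented sample data, not results from the paper.

```python
from collections import defaultdict


def next_action_accuracy(responses):
    """Return the fraction of correct next-action predictions per condition.

    `responses` is an iterable of (condition, predicted_action, actual_action)
    tuples. This scoring rule is an assumption made for illustration.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for condition, predicted, actual in responses:
        total[condition] += 1
        if predicted == actual:
            correct[condition] += 1
    return {condition: correct[condition] / total[condition] for condition in total}


# Invented sample data: two conditions, including a No-Explanation baseline.
responses = [
    ("explanation", "up", "up"),
    ("explanation", "left", "left"),
    ("explanation", "down", "up"),
    ("no_explanation", "up", "left"),
    ("no_explanation", "right", "right"),
]

accuracy = next_action_accuracy(responses)
for condition, score in accuracy.items():
    print(f"{condition}: {score:.2f}")
```

Scoring the No-Explanation baseline with the same function gives a reference point against which the explanation condition can be compared, rather than reading its score in isolation.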

Paper Structure

This paper contains 14 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: The mini-world environment and its semantic map. The environment has passable tiles and impassable obstacles such as trees, rocks, and water. There are three types of agents: farmers, soldiers, and skeletons. Agents spawn at differing but fixed rates from their corresponding buildings and can move in the four cardinal directions. The task of a farmer is to till the soil, collect water from the well, water the soil, plant seeds, and then harvest the crop. The task of the soldiers is primarily to protect the farmers and kill skeletons. The skeletons' goal is to kill everyone else. In our examples, we assume that an XRL agent was trained to control the soldier highlighted with a red circle.
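
The movement rules in the caption (passable tiles, impassable obstacles, four cardinal directions) can be sketched as a small grid model. This is not the authors' environment: the tile symbols, the `step` function, and the rule that a blocked move leaves the agent in place are all assumptions made for illustration.

```python
# Hypothetical tile encoding; the paper's actual representation is not given here.
OBSTACLES = {"T", "R", "W"}  # tree, rock, water

# The four cardinal moves as (row delta, column delta).
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(grid, position, action):
    """Return the agent's new (row, col) after attempting `action`.

    Assumption: a move off the map or into an obstacle leaves the agent
    where it is.
    """
    row, col = position
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    in_bounds = 0 <= new_row < len(grid) and 0 <= new_col < len(grid[0])
    if in_bounds and grid[new_row][new_col] not in OBSTACLES:
        return (new_row, new_col)
    return position


# A 3x3 example map: "." is passable, "T" is a tree, "W" is water.
grid = [
    "..T",
    ".W.",
    "...",
]

print(step(grid, (0, 0), "right"))  # moves onto a passable tile
print(step(grid, (0, 1), "right"))  # blocked by the tree
```

A model like this is enough to log the agent's actual next action at each step, which is the ground truth that participant predictions would be compared against.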