"A Good Bot Always Knows Its Limitations": Assessing Autonomous System Decision-making Competencies through Factorized Machine Self-confidence

Brett W. Israelsen; Nisar R. Ahmed; Matthew Aitken; Eric W. Frew; Dale A. Lawrence; Brian M. Argrow

TL;DR

This paper tackles how autonomous agents can quantify and communicate their own decision-making competencies under uncertainty. It introduces FaMSeC, a five-factor metacognitive framework (outcome assessment, solver quality, model quality, alignment quality, and past experience) that derives interpretable indicators from problem-solving statistics within MDP-based decision processes. The authors develop detailed methods for two of these factors, grounded in examples from an autonomous delivery task: outcome assessment via meta-utilities, and solver quality via distributions of policy outcomes and surrogate modeling. They also discuss practical considerations, validation challenges, and connections to related work in XAI, metacognition, and formal methods, with the aim of giving human users and system designers a structured approach to competency reporting. The work demonstrates the feasibility of automatic competency self-assessment and argues for careful standardization, validation, and extension to decision-making paradigms beyond MDPs.

Abstract

How can intelligent machines assess their competency to complete a task? This question has come into focus for autonomous systems that algorithmically make decisions under uncertainty. We argue that machine self-confidence -- a form of meta-reasoning based on self-assessments of system knowledge about the state of the world, itself, and ability to reason about and execute tasks -- leads to many computable and useful competency indicators for such agents. This paper presents our body of work so far on this concept in the form of the Factorized Machine Self-confidence (FaMSeC) framework, which holistically considers several major factors driving competency in algorithmic decision-making: outcome assessment, solver quality, model quality, alignment quality, and past experience. In FaMSeC, self-confidence indicators are derived via 'problem-solving statistics' embedded in Markov decision process solvers and related approaches. These statistics come from evaluating probabilistic exceedance margins in relation to certain outcomes and associated competency standards specified by an evaluator. Once designed and evaluated, the statistics can be easily incorporated into autonomous agents and serve as indicators of competency. We include detailed descriptions and examples for Markov decision process agents, and show how outcome assessment and solver quality factors can be found for a range of tasking contexts through novel use of meta-utility functions, behavior simulations, and surrogate prediction models. Numerical evaluations are performed to demonstrate that FaMSeC indicators perform as desired (references to human subject studies beyond the scope of this paper are provided).
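
To make the idea of an exceedance-based statistic concrete, the following is a minimal sketch, not the paper's actual indicator. It assumes that outcomes are scalar cumulative rewards obtained from behavior simulations, that the evaluator's competency standard is a single reward threshold, and that the indicator is a simple rescaling of the empirical exceedance probability to the range [-1, 1]. The function names `exceedance_indicator` and `simulate_returns`, and the Gaussian stand-in for a real MDP simulator, are all hypothetical.

```python
import random


def exceedance_indicator(outcomes, threshold):
    """Rescale the empirical probability of meeting a standard to [-1, 1].

    outcomes:  simulated cumulative rewards (assumed scalar).
    threshold: evaluator-specified minimum acceptable reward (assumed).
    Returns -1.0 if no outcome meets the standard, +1.0 if all do.
    """
    if not outcomes:
        raise ValueError("need at least one simulated outcome")
    # Fraction of simulated outcomes that meet or exceed the standard.
    p_exceed = sum(1 for r in outcomes if r >= threshold) / len(outcomes)
    # Illustrative linear rescaling; not a formula taken from the paper.
    return 2.0 * p_exceed - 1.0


def simulate_returns(n, mean, spread, seed=0):
    """Hypothetical stand-in for policy rollouts: Gaussian returns."""
    rng = random.Random(seed)
    return [rng.gauss(mean, spread) for _ in range(n)]


if __name__ == "__main__":
    returns = simulate_returns(n=1000, mean=10.0, spread=5.0)
    print(exceedance_indicator(returns, threshold=5.0))
```

A value near +1 would suggest that the simulated policy almost always meets the evaluator's standard, and a value near -1 that it almost never does. A real implementation would replace `simulate_returns` with rollouts of the agent's policy in its MDP model.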

Paper Structure (53 sections, 14 equations, 28 figures, 4 tables, 2 algorithms)

Introduction
Background and Definitions
Rational Agents and Competency Evaluation
Markov Decision Processes
Bounded Rationality: Ideal vs. Realistic Agents
Illustrative Example: Autonomous Delivery in an Adversarial Setting
Competency Assessment for MDP-based agent
The Autonomous Tasking and Competency Evaluation Process
Strategies for Implementing Algorithmic Competency Self-Assessments
Factorized Machine Self-Confidence
Expected Properties of FaMSeC Indicators
Outcome Assessment
Going Meta: from Utilities to Meta-Utilities
Limitations of Analyzing Built-in Utilities
Meta-utilities
...and 38 more sections

Figures (28)

Figure 1: Road network for Autonomous Delivery Truck (ADT) problem.
Figure 2: Representation of key high-level factors in agent task completion and competency evaluation in terms of an agent $\mathcal{A}$ operating in context $\mathcal{C}$ being evaluated on outcomes $\mathcal{O}$ with respect to some standard of competency $\Sigma$. Variables are defined in significantly more detail in the text.
Figure 3: Block diagram showing how FaMSeC indicator functions (red) relate to components (white boxes) of a typical rational decision-making algorithmic agent $\mathcal{A}$ (dark grey box) interacting with a user (light grey box).
Figure 4: Typical Value Function
Figure 5: A Typical Weighting Function
...and 23 more figures
