
"A Good Bot Always Knows Its Limitations": Assessing Autonomous System Decision-making Competencies through Factorized Machine Self-confidence

Brett W. Israelsen, Nisar R. Ahmed, Matthew Aitken, Eric W. Frew, Dale A. Lawrence, Brian M. Argrow

TL;DR

This paper tackles how autonomous agents can quantify and communicate their own decision-making competencies under uncertainty. It introduces FaMSeC, a five-factor metacognitive framework (outcome assessment, solver quality, model/data validity, alignment with user intent, and historical experience) that derives interpretable indicators from problem-solving statistics within MDP-based decision processes. The authors develop detailed methods for two factors, outcome assessment via meta-utilities and solver quality via distributions of policy outcomes and surrogate modeling, and ground them in examples from an autonomous delivery task. They discuss broader practical considerations, validation challenges, and connections to related work in XAI, metacognition, and formal methods, aiming to provide a structured approach to competency reporting for human users and system designers. The work demonstrates both the feasibility of automatic competency self-assessment and the need for careful standardization, validation, and extension to decision-making paradigms beyond MDPs.
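The solver-quality idea of comparing distributions of policy outcomes can be illustrated with a small Python sketch. It assumes, as a hypothetical setup, that returns have been sampled both from a candidate solver's policy and from a trusted reference solver's policy on the same task. The sketch bins both samples, measures their Hellinger distance, and signs the result by which solver has the higher mean return. The function names, the binning, and the choice of Hellinger distance are illustrative assumptions rather than the paper's definitions, and the surrogate model that would predict reference-solver outcomes for new tasks is omitted.

```python
import math
from typing import List, Sequence


def normalized_histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Bin values into [edges[i], edges[i+1]) (last bin closed) and normalize to sum to 1.

    Values outside [edges[0], edges[-1]] are ignored.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            in_bin = edges[i] <= v < edges[i + 1]
            on_last_edge = i == n_bins - 1 and v == edges[-1]
            if in_bin or on_last_edge:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fell inside the histogram range")
    return [c / total for c in counts]


def hellinger(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions; lies in [0, 1]."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding errors


def signed_solver_divergence(
    candidate_returns: Sequence[float],
    reference_returns: Sequence[float],
    edges: Sequence[float],
) -> float:
    """Hypothetical score in [-1, 1].

    0 means the candidate's return distribution is indistinguishable (at this
    binning) from the reference solver's. The magnitude grows with the
    Hellinger distance, and the sign is negative when the candidate's mean
    return is lower than the reference's.
    """
    h = hellinger(
        normalized_histogram(candidate_returns, edges),
        normalized_histogram(reference_returns, edges),
    )
    cand_mean = sum(candidate_returns) / len(candidate_returns)
    ref_mean = sum(reference_returns) / len(reference_returns)
    return h if cand_mean >= ref_mean else -h
```

For instance, `signed_solver_divergence([7, 8, 9], [1, 2, 3], edges=[0, 5, 10])` returns 1.0 because the two samples occupy disjoint bins and the candidate's mean is higher; swapping the first two arguments gives -1.0. A bounded distance was chosen so that the score stays on a fixed scale regardless of the task's reward units.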

Abstract

How can intelligent machines assess their competency to complete a task? This question has come into focus for autonomous systems that algorithmically make decisions under uncertainty. We argue that machine self-confidence (a form of meta-reasoning based on self-assessments of system knowledge about the state of the world, itself, and its ability to reason about and execute tasks) leads to many computable and useful competency indicators for such agents. This paper presents our body of work so far on this concept in the form of the Factorized Machine Self-confidence (FaMSeC) framework, which holistically considers several major factors driving competency in algorithmic decision-making: outcome assessment, solver quality, model quality, alignment quality, and past experience. In FaMSeC, self-confidence indicators are derived via 'problem-solving statistics' embedded in Markov decision process solvers and related approaches. These statistics come from evaluating probabilistic exceedance margins in relation to certain outcomes and associated competency standards specified by an evaluator. Once designed and evaluated, the statistics can be easily incorporated into autonomous agents and serve as indicators of competency. We include detailed descriptions and examples for Markov decision process agents, and show how outcome assessment and solver quality factors can be found for a range of tasking contexts through novel use of meta-utility functions, behavior simulations, and surrogate prediction models. Numerical evaluations are performed to demonstrate that FaMSeC indicators perform as desired (references to human subject studies beyond the scope of this paper are provided).
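To make the idea of a probabilistic exceedance margin concrete, the Python sketch below estimates, from Monte Carlo rollouts, the probability that an agent's cumulative reward meets an evaluator-specified threshold. Everything here is a hypothetical illustration: the toy delivery episode in `simulate_return` (five required moves, +10 on delivery, -1 per step) is not the paper's ADT model, and the plain exceedance probability stands in for the paper's outcome-assessment statistic, whose exact form may differ.

```python
import random
from typing import Callable, List


# Hypothetical toy episode loosely inspired by a delivery task; not the paper's ADT model.
def simulate_return(rng: random.Random, p_success: float = 0.8, horizon: int = 20) -> float:
    """Return the cumulative reward of one episode.

    The truck needs 5 successful moves to deliver; each step costs -1 and a
    move succeeds with probability p_success. Delivery adds +10. If the
    horizon runs out first, no delivery reward is given.
    """
    moves_left = 5
    total = 0.0
    for _ in range(horizon):
        total -= 1.0
        if rng.random() < p_success:
            moves_left -= 1
            if moves_left == 0:
                return total + 10.0
    return total


def exceedance_probability(returns: List[float], threshold: float) -> float:
    """Fraction of sampled returns that meet or exceed the evaluator's threshold."""
    if not returns:
        raise ValueError("need at least one sampled return")
    return sum(r >= threshold for r in returns) / len(returns)


def outcome_indicator(
    simulate: Callable[[random.Random], float],
    threshold: float,
    n_rollouts: int = 10_000,
    seed: int = 0,
) -> float:
    """Monte Carlo estimate of P(return >= threshold) under the agent's own model."""
    rng = random.Random(seed)
    returns = [simulate(rng) for _ in range(n_rollouts)]
    return exceedance_probability(returns, threshold)


if __name__ == "__main__":
    # Threshold 0.0 means "deliver within 10 steps" in this toy setup.
    for p in (0.9, 0.6, 0.3):
        x_o = outcome_indicator(lambda rng: simulate_return(rng, p_success=p), threshold=0.0)
        print(f"p_success={p:.1f}  P(R >= 0) ~ {x_o:.3f}")
```

As `p_success` drops, the estimated exceedance probability falls, which is the qualitative behavior an outcome-assessment indicator is meant to report. A margin-sensitive variant would also weigh how far sampled returns land above or below the threshold rather than only counting them.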

Paper Structure

This paper contains 53 sections, 14 equations, 28 figures, 4 tables, 2 algorithms.

Figures (28)

  • Figure 1: Road network for Autonomous Delivery Truck (ADT) problem.
  • Figure 2: Representation of key high-level factors in agent task completion and competency evaluation in terms of an agent $\mathcal{A}$ operating in context $\mathcal{C}$ being evaluated on outcomes $\mathcal{O}$ with respect to some standard of competency $\Sigma$. Variables are defined in more detail in the text.
  • Figure 3: Block diagram showing how FaMSeC indicator functions (red) relate to components (white boxes) of a typical rational decision-making algorithmic agent $\mathcal{A}$ (dark grey box) interacting with a user (light grey box).
  • Figure 4: Typical Value Function
  • Figure 5: A Typical Weighting Function
  • ...and 23 more figures