Competence-Aware AI Agents with Metacognition for Unknown Situations and Environments (MUSE)

Rodolfo Valiente, Praveen K. Pilly

TL;DR

The paper tackles the vulnerability of autonomous agents to unknown environments by introducing MUSE, a competence-aware metacognitive framework that integrates self-assessment and self-regulation. It provides two concrete implementations: a decoder-based world-model agent built on Dreamer-v3 and an LLM-based agent that extends ReAct and Reflexion with competence grounding. Across Meta-World and ALFWorld, MUSE demonstrates strong competence awareness (AUROC$_2$ up to 0.95 and metacognitive accuracy around 92%) and superior self-regulation, solving significantly more novel tasks, and solving them faster, than non-metacognitive baselines. The work suggests that metacognition can enable safer online adaptation and better generalization to unseen tasks, even with smaller LLMs and less reliance on extensive pre-training data.
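To make the AUROC$_2$ number concrete: a type-2 AUROC is conventionally read as the probability that a randomly chosen successful episode received a higher self-assessed competence than a randomly chosen failed one. The sketch below computes that quantity by pairwise counting (the Mann-Whitney formulation). It is an illustration only, not the paper's evaluation code; the function name `auroc2` and the example scores are invented for this sketch.

```python
def auroc2(confidences, successes):
    """Type-2 AUROC: P(confidence on a success > confidence on a failure).

    Ties between a success and a failure count as half a win.
    """
    pos = [c for c, s in zip(confidences, successes) if s]
    neg = [c for c, s in zip(confidences, successes) if not s]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented self-assessments and outcomes, purely for illustration.
conf = [0.9, 0.4, 0.3, 0.6, 0.5]
succ = [True, True, False, True, False]
score = auroc2(conf, succ)
print(round(score, 3))
```

A value of 0.5 would mean the self-assessment carries no information about success; 1.0 would mean every success was rated above every failure.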

Abstract

Metacognition, defined as the awareness and regulation of one's cognitive processes, is central to human adaptability in unknown situations. In contrast, current autonomous agents often struggle in novel environments due to their limited capacity for adaptation. We hypothesize that metacognition is a critical missing ingredient in autonomous agents for the cognitive flexibility needed to tackle unfamiliar challenges. Given the broad scope of metacognitive abilities, we focus on competence awareness and strategy selection. To this end, we propose the Metacognition for Unknown Situations and Environments (MUSE) framework to integrate metacognitive processes of self-assessment and self-regulation into autonomous agents. We present two implementations of MUSE: one based on world modeling and another leveraging large language models (LLMs). Our system continually learns to assess its competence on a given task and uses this self-assessment to guide iterative cycles of strategy selection. MUSE agents demonstrate high competence awareness and significant improvements in self-regulation, solving novel, out-of-distribution tasks more effectively than model-based reinforcement learning and purely prompt-based LLM agent approaches. This work highlights the promise of approaches inspired by cognitive and neural systems in enabling autonomous agents to adapt to new environments while mitigating current models' heavy reliance on extensive training data and large model size.
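The cycle of self-assessment followed by strategy selection can be pictured with a toy loop. The sketch below is not the MUSE implementation: the paper's agents obtain self-assessment from a world model or an LLM, whereas here a running success rate stands in for it. All names (`CompetenceEstimator`, `solve`, `toy_task`, the strategy labels) are hypothetical.

```python
class CompetenceEstimator:
    """Running success-rate estimate per strategy.

    Assumption: a stand-in for a learned self-assessment model.
    """

    def __init__(self, strategies, prior=0.5):
        # Each entry holds [success mass, attempt count], seeded with a prior.
        self.stats = {s: [prior, 1.0] for s in strategies}

    def predict(self, strategy):
        success_mass, attempts = self.stats[strategy]
        return success_mass / attempts

    def update(self, strategy, success):
        self.stats[strategy][0] += 1.0 if success else 0.0
        self.stats[strategy][1] += 1.0


def solve(task, strategies, estimator, max_cycles=10):
    """Repeat: self-assess, pick the most promising strategy, act, learn."""
    for cycle in range(1, max_cycles + 1):
        best = max(strategies, key=estimator.predict)  # self-regulation
        success = task(best)                           # act in the environment
        estimator.update(best, success)                # refine self-assessment
        if success:
            return best, cycle
    return None, max_cycles


def toy_task(strategy):
    # Hypothetical task that only the third strategy can solve.
    return strategy == "plan_c"


strategies = ["plan_a", "plan_b", "plan_c"]
result = solve(toy_task, strategies, CompetenceEstimator(strategies))
print(result)
```

Each failed attempt lowers the estimated competence of the strategy that was tried, so the loop moves on to an untried alternative instead of repeating the same failure.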

Paper Structure

This paper contains 30 sections, 7 equations, 9 figures, 4 tables, 4 algorithms.

Figures (9)

  • Figure 1: The metacognitive cycle of self-assessment and self-regulation operates on the traditional perception-action loop of existing AI agents to boost their ability for iterative problem-solving in unknown situations and environments.
  • Figure 2: Schematic of the implementation of self-assessment in the context of the Dreamer-v3 World Model (adapted from Hafner et al., 2023). The input state $x$ to the RSSM is encoded into latent embedding $z$. The model recurrently predicts self-assessment $\hat{c}$, reward $\hat{r}$, and terminal signal $\hat{d}$, while also decoding the input state $\hat{x}$.
  • Figure 3: Meta-World pre-deployment training set [button-press, door-open, drawer-close, drawer-open, peg-insert-side, pick-place, push, reach, window-close, window-open], which comprises the 10 tasks from the MT10 suite.
  • Figure 4: Meta-World evaluation set [button-press-topdown-wall, soccer, push-wall, push-block, coffee-button, plate-slide, peg-unplug-side, lever-pull, handle-press, door-unlock], comprising 10 novel tasks from the MT50 suite with distinct reward functions; these differ semantically from those in the pre-deployment training set.
  • Figure 5: Average number of time steps per episode required to solve each of the 10 novel tasks in the Meta-World environment. MUSE (a) successfully solved 7 out of 10 tasks, whereas Dreamer-v3 (b) failed to solve any of them. Note that to facilitate illustration, unsolved tasks are assigned a nominal value of -25 and depicted by a red bar.
  • ...and 4 more figures