PABU: Progress-Aware Belief Update for Efficient LLM Agents

Haitao Jiang, Lin Ge, Hengrui Cai, Rui Song

TL;DR

This paper proposes Progress-Aware Belief Update (PABU), a belief-state framework that compactly represents an agent's state by explicitly modeling task progress and selectively retaining past actions and observations.

Abstract

Large Language Model (LLM) agents commonly condition actions on full action-observation histories, which introduce task-irrelevant information that easily leads to redundant actions and higher inference cost. We propose Progress-Aware Belief Update (PABU), a belief-state framework that compactly represents an agent's state by explicitly modeling task progress and selectively retaining past actions and observations. At each step, the agent predicts its relative progress since the previous round and decides whether the newly encountered interaction should be stored, conditioning future decisions only on the retained subset. Across eight environments in the AgentGym benchmark, and using identical training trajectories, PABU achieves an 81.0% task completion rate, outperforming previous state-of-the-art (SoTA) models with full-history belief by 23.9%. Additionally, PABU's progress-oriented action selection improves efficiency, reducing the average number of interaction steps to 9.5, a 26.9% reduction. Ablation studies show that both explicit progress prediction and selective retention are necessary for robust belief learning and performance gains.
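
The per-step update described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Belief`, `PolicyOutput`, `belief_step`, `run_episode`, `toy_policy`, and `toy_env` are all hypothetical, the policy is a hand-written rule rather than an LLM, and storing progress as a single string is an assumption made for illustration. The order of operations (retention decision for the previous interaction, then progress estimate, then next action) follows the Figure 4 caption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# One interaction is an (action, observation) pair.
Interaction = Tuple[str, str]


@dataclass
class Belief:
    """Compact belief: a progress estimate plus the retained interactions."""
    progress: str = "not started"                      # assumed representation
    retained: List[Interaction] = field(default_factory=list)


@dataclass
class PolicyOutput:
    retain_previous: bool  # keep the previous interaction in the belief?
    progress: str          # progress estimate after the previous round
    action: str            # next action to execute


# Hypothetical policy signature: (task, belief, previous interaction) -> output.
Policy = Callable[[str, Belief, Optional[Interaction]], PolicyOutput]


def belief_step(policy: Policy, task: str, belief: Belief,
                previous: Optional[Interaction]) -> str:
    """Update the belief in place and return the next action."""
    out = policy(task, belief, previous)
    if previous is not None and out.retain_previous:
        belief.retained.append(previous)  # selective retention
    belief.progress = out.progress        # explicit progress modeling
    return out.action


def run_episode(policy: Policy, env_step: Callable[[str], Tuple[str, bool]],
                task: str, max_steps: int = 20) -> Belief:
    """Roll out one episode; the policy never sees the full history."""
    belief, previous = Belief(), None
    for _ in range(max_steps):
        action = belief_step(policy, task, belief, previous)
        observation, done = env_step(action)
        previous = (action, observation)
        if done:
            break
    return belief


# --- Toy stand-ins (assumptions, not from the paper) ---
def toy_policy(task: str, belief: Belief,
               previous: Optional[Interaction]) -> PolicyOutput:
    # Retain only interactions whose observation mentions milk.
    retain = previous is not None and "milk" in previous[1]
    found = retain or any("milk" in obs for _, obs in belief.retained)
    if found:
        return PolicyOutput(retain, "milk located", "take milk")
    action = "look around" if previous is None else "open fridge"
    return PolicyOutput(retain, "searching", action)


def toy_env(action: str) -> Tuple[str, bool]:
    observations = {
        "look around": "You see a fridge and a table.",
        "open fridge": "The fridge contains milk.",
        "take milk": "You pick up the milk.",
    }
    return observations[action], action == "take milk"


final_belief = run_episode(toy_policy, toy_env, "find the milk")
print(final_belief)
```

In this toy run the uninformative first observation is discarded and only the fridge interaction is retained, so the context passed to the policy stays smaller than the full history.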

Paper Structure

This paper contains 46 sections, 11 equations, 8 figures, 5 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Motivation for Progress-aware Belief Update. In a task such as heating and drinking milk, conventional methods rely on full interaction histories (left), which are noisy, redundant, and expensive to process. Our method (right) performs progress-aware belief updates, preserving only essential information and yielding a compact context that supports efficient and reliable learning.
  • Figure 2: POMDP Formulation with Belief State Estimation. The agent has no direct access to the latent environment states $S_\cdot$ or their dynamics. Instead, it maintains a belief state $B_\cdot$ to estimate the current status from partial observations $O_\cdot$ and past actions $A_\cdot$, then uses this belief to guide action selection.
  • Figure 3: Belief State Design. Progress-only belief states are under-specified when identical progress stages require different actions across environments, while PABU augments progress with selective retention of task-relevant actions and observations.
  • Figure 4: Inference pipeline, where the policy generates the retention decision for the previous observation, estimates the current progress, and predicts the next action sequentially.
  • Figure 5: Performance comparison across different backbone models. Smaller models can learn embodied environments effectively given sufficient training samples, while larger models generally perform better in environments that require deeper language understanding.
  • ...and 3 more figures