PABU: Progress-Aware Belief Update for Efficient LLM Agents

Haitao Jiang; Lin Ge; Hengrui Cai; Rui Song

TL;DR

This paper proposes Progress-Aware Belief Update (PABU), a belief-state framework that compactly represents an LLM agent's state by explicitly modeling task progress and selectively retaining past actions and observations.

Abstract

Large Language Model (LLM) agents commonly condition their actions on full action-observation histories, which introduce task-irrelevant information that can lead to redundant actions and higher inference cost. We propose Progress-Aware Belief Update (PABU), a belief-state framework that compactly represents an agent's state by explicitly modeling task progress and selectively retaining past actions and observations. At each step, the agent predicts its relative progress since the previous round and decides whether the newly encountered interaction should be stored, conditioning future decisions only on the retained subset. Across eight environments in the AgentGym benchmark, and using identical training trajectories, PABU achieves an 81.0% task completion rate, outperforming previous state-of-the-art (SoTA) models with full-history belief by 23.9%. Additionally, PABU's progress-oriented action selection improves efficiency, reducing the average number of interaction steps to 9.5, a 26.9% reduction. Ablation studies show that both explicit progress prediction and selective retention are necessary for robust belief learning and performance gains.
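
The per-step update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `BeliefState`, `update_belief`, and `build_context` are hypothetical, progress is represented as a plain string for simplicity, and the progress estimates and retention decisions are hard-coded where a trained policy would normally produce them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BeliefState:
    """Hypothetical compact state: a progress summary plus retained steps."""
    progress: str = "not started"
    retained: List[Tuple[str, str]] = field(default_factory=list)


def update_belief(belief, action, observation, new_progress, keep):
    """Return the next belief; the step is stored only when `keep` is True."""
    retained = list(belief.retained)
    if keep:
        retained.append((action, observation))
    return BeliefState(progress=new_progress, retained=retained)


def build_context(task, belief):
    """Render the prompt context from the belief instead of the full history."""
    lines = [f"Task: {task}", f"Progress: {belief.progress}"]
    for action, observation in belief.retained:
        lines.append(f"Action: {action} -> Observation: {observation}")
    return "\n".join(lines)


# Toy trajectory (assumed values): each tuple is
# (action, observation, progress estimate, retention decision).
steps = [
    ("open fridge", "you see milk", "located milk", True),
    ("look around", "nothing new", "located milk", False),
    ("take milk", "you hold the milk", "holding milk", True),
]

belief = BeliefState()
for action, observation, new_progress, keep in steps:
    belief = update_belief(belief, action, observation, new_progress, keep)

context = build_context("heat and drink milk", belief)
print(context)
```

In this toy run the uninformative "look around" step is dropped, so the context passed to the next decision contains the task, the latest progress estimate, and two retained steps rather than all three interactions.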

Paper Structure (46 sections, 11 equations, 8 figures, 5 tables, 1 algorithm)

Introduction
Problem Formulation
Environment and Agentic Tasks
Belief State Abstraction
Optimization Objective
Progress-Aware Belief Update
Progress as the Backbone of Belief State
Selective Retention Mechanism
Training and Inference Pipelines
Trajectory Augmentation
Progress-Aware Learning Objective
Inference Pipeline
Numerical Study
Baselines and Metrics
Main Results
...and 31 more sections

Figures (8)

Figure 1: Motivation for Progress-aware Belief Update. In a task such as heating and drinking milk, conventional methods rely on full interaction histories (left), which are noisy, redundant, and expensive to process. Our method (right) performs progress-aware belief updates, preserving only essential information and yielding a compact context that supports efficient and reliable learning.
Figure 2: POMDP Formulation with Belief State Estimation. The agent has no direct access to the latent environment states $S_\cdot$ or their dynamics. Instead, it maintains a belief state $B_\cdot$ to estimate the current status from partial observations $O_\cdot$ and past actions $A_\cdot$, then uses this belief to guide action selection.
Figure 3: Belief State Design. Progress-only belief states are under-specified when identical progress stages require different actions across environments, while PABU augments progress with selective retention of task-relevant actions and observations.
Figure 4: Inference pipeline, where the policy generates the retention decision for the previous observation, estimates the current progress, and predicts the next action sequentially.
Figure 5: Performance comparison across different backbone models. Smaller models can learn embodied environments effectively given sufficient training samples, while larger models generally perform better in environments that require deeper language understanding.
...and 3 more figures
