CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent

Leyang Shen; Yang Zhang; Chun Kai Ling; Xiaoyan Zhao; Tat-Seng Chua

TL;DR

The paper addresses inefficiency in training agents for multi-step, knowledge-intensive tasks by challenging the assumption that all actions contribute equally to the outcome. It introduces CARL, a critical-action-focused reinforcement learning framework that identifies high-criticality actions via state entropy and concentrates rollouts, action-level rewards, and model updates on those actions, using an entropy-guided progressive rollout with selective updates. Experiments on both reasoning and non-reasoning models show that CARL improves performance while substantially reducing training and inference costs compared with group-level policy optimization. The approach preserves exploration by maintaining higher policy entropy and shows strong efficiency gains on knowledge-intensive QA benchmarks, suggesting practical value for large multi-turn agents in real-world settings.

Abstract

Agents that accomplish complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, conventional group-level policy optimization algorithms become suboptimal because they assume that every action contributes equally to the outcome, which deviates significantly from reality. Our analysis reveals that only a small fraction of actions are critical in determining the final outcome. Building on this insight, we propose CARL, a critical-action-focused reinforcement learning algorithm tailored for multi-step agents. CARL achieves focused training by providing action-level optimization signals for high-criticality actions while excluding low-criticality actions from model updates. Extensive experiments demonstrate that CARL achieves both stronger performance and higher efficiency during training and inference across diverse evaluation settings.
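To make the idea of entropy-based selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes that criticality can be approximated by the Shannon entropy of the policy's action distribution at each step, and that a fixed fraction of the highest-entropy steps is kept for the update. The function names, the `keep_fraction` parameter, and the simple policy-gradient-style loss are all hypothetical choices made for illustration; the paper's actual criticality measure, rollout procedure, and reward assignment are not reproduced here.

```python
import math
from typing import List


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def critical_action_mask(
    step_distributions: List[List[float]],
    keep_fraction: float = 0.25,  # hypothetical hyperparameter, not from the paper
) -> List[bool]:
    """Mark the highest-entropy steps of one trajectory as critical.

    Assumption: high entropy of the action distribution is used as a
    stand-in for high criticality.
    """
    if not step_distributions:
        return []
    entropies = [entropy(d) for d in step_distributions]
    n_keep = max(1, math.ceil(keep_fraction * len(entropies)))
    # Rank step indices from highest to lowest entropy.
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [i in keep for i in range(len(entropies))]


def masked_policy_loss(
    log_probs: List[float],
    advantages: List[float],
    mask: List[bool],
) -> float:
    """Average of -log_prob * advantage over critical steps only.

    Steps with mask == False contribute nothing, which mimics excluding
    low-criticality actions from the model update.
    """
    selected = [-lp * adv for lp, adv, m in zip(log_probs, advantages, mask) if m]
    return sum(selected) / len(selected) if selected else 0.0


# Toy trajectory with four steps and four candidate actions per step.
dists = [
    [0.97, 0.01, 0.01, 0.01],  # near-deterministic step
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain step
    [0.90, 0.05, 0.03, 0.02],
    [0.40, 0.30, 0.20, 0.10],
]
mask = critical_action_mask(dists, keep_fraction=0.5)
loss = masked_policy_loss([-0.03, -1.39, -0.11, -0.92], [1.0, 1.0, 1.0, 1.0], mask)
```

In this toy trajectory the second and fourth steps have the highest entropy, so `mask` is `[False, True, False, True]` and only those two steps enter the loss. A fixed top fraction is only one possible selection rule; a threshold on entropy would be an equally plausible alternative under the same assumption.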
