ICA: Information-Aware Credit Assignment for Visually Grounded Long-Horizon Information-Seeking Agents
Cong Pang, Xuyu Feng, Yujie Yi, Zixuan Chen, Jiawei Hong, Tiankuo Yao, Nang Yuan, Jiapeng Luo, Lewei Lu, Xin Lou
TL;DR
This work tackles open-web, long-horizon information seeking, where learning is hindered by noisy observations and sparse terminal rewards. It proposes a visually grounded framework that renders webpages as snapshots and introduces Information-Aware Credit Assignment (ICA), a post-hoc method that estimates the utility of each acquired information unit and propagates dense turn-level learning signals back to the retrieval turns responsible for it. ICA operates within a GRPO-based training pipeline and yields consistent gains over text-based baselines across diverse benchmarks, suggesting that information-level credit attribution can alleviate the credit-assignment bottleneck in noisy, real-world web environments. The approach improves data efficiency and robustness for information-seeking agents, with potential applications in QA, research assistance, and domain-specific discovery; code and datasets are to be released publicly.
Abstract
Despite the strong performance of information-seeking agents trained with reinforcement learning, learning in open-ended web environments remains severely constrained by low signal-to-noise feedback. Text-based parsers often discard layout semantics and introduce unstructured noise, while long-horizon training typically relies on sparse outcome rewards that obscure which retrieval actions actually matter. We propose a visual-native search framework that represents webpages as visual snapshots, allowing agents to leverage layout cues to quickly localize salient evidence and suppress distractors. To learn effectively from these high-dimensional observations, we introduce Information-Aware Credit Assignment (ICA), a post-hoc method that estimates each retrieved snapshot's contribution to the final outcome via posterior analysis and propagates dense learning signals back to the key search turns. Integrated with a GRPO-based training pipeline, our approach consistently outperforms text-based baselines on diverse information-seeking benchmarks, providing evidence that visual snapshot grounding combined with information-level credit assignment alleviates the credit-assignment bottleneck in open-ended web environments. The code and datasets will be released at https://github.com/pc-inno/ICA_MM_deepsearch.git.
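The abstract describes ICA only at a high level, so the following is an illustrative sketch under stated assumptions, not the authors' implementation. It first computes standard GRPO group-relative advantages, normalizing each rollout's sparse terminal reward by the mean and standard deviation of its group. It then shows one hypothetical way to turn that single outcome advantage into dense turn-level signals: each retrieval turn is assumed to carry a precomputed utility score (standing in for the paper's posterior-analysis estimate, which is not specified here), and the turn's advantage is shifted by how far its utility deviates from the trajectory's average. The names `turn_level_advantages`, `utilities`, and `beta` are hypothetical.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each terminal reward by the
    mean and (population) standard deviation of its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def turn_level_advantages(
    rewards: list[float],
    utilities: list[list[float]],
    beta: float = 0.5,
) -> list[list[float]]:
    """HYPOTHETICAL credit assignment (not the paper's ICA estimator):
    broadcast each trajectory's outcome advantage to all of its retrieval
    turns, then shift each turn by beta times the deviation of its
    (assumed, precomputed) information utility from the trajectory mean."""
    outcome = grpo_advantages(rewards)
    result: list[list[float]] = []
    for adv, utils in zip(outcome, utilities):
        if not utils:
            # A trajectory with no retrieval turns gets no turn-level signal.
            result.append([])
            continue
        avg = mean(utils)
        result.append([adv + beta * (u - avg) for u in utils])
    return result


if __name__ == "__main__":
    # Four rollouts for one question: two correct (reward 1), two wrong (0).
    rewards = [1.0, 0.0, 1.0, 0.0]
    # Hypothetical per-turn utilities of the snapshot retrieved at each turn.
    utilities = [[0.9, 0.1], [0.2, 0.2, 0.2], [0.5], [0.0, 0.6]]
    for row in turn_level_advantages(rewards, utilities):
        print([round(a, 3) for a in row])
```

In this assumed scheme the shift terms sum to zero within each trajectory, so the average turn advantage equals the trajectory's GRPO advantage; the dense term only redistributes credit toward turns whose retrieved information was judged more useful.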
