
ICA: Information-Aware Credit Assignment for Visually Grounded Long-Horizon Information-Seeking Agents

Cong Pang, Xuyu Feng, Yujie Yi, Zixuan Chen, Jiawei Hong, Tiankuo Yao, Nang Yuan, Jiapeng Luo, Lewei Lu, Xin Lou

TL;DR

This work tackles open-web, long-horizon information seeking, where learning is hindered by noisy observations and sparse terminal rewards. It proposes a visually grounded framework that renders webpages as snapshots and introduces Information-Aware Credit Assignment (ICA), a post-hoc method that estimates the utility of each acquired information unit and propagates dense turn-level learning signals back to the retrieval turns responsible for acquiring it. ICA operates within a GRPO-based training pipeline and yields consistent gains over text-based baselines across diverse benchmarks, suggesting that information-level credit attribution can alleviate credit-assignment bottlenecks in noisy, real-world web environments. The approach aims to improve data efficiency and robustness for information-seeking agents, with potential impact on QA, research assistance, and domain-specific discovery tasks; the authors state that code and datasets will be released publicly.
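To make the idea of information-level credit concrete, here is a minimal sketch under stated assumptions. The paper's exact "posterior analysis" estimator is not given here, so this uses a simple contrastive stand-in: a unit's utility is the mean outcome of rollouts that acquired it minus the mean outcome of rollouts that did not. The trajectory format (a list of turns, each listing the IDs of the snapshots it revealed) and the function names `unit_utilities` and `turn_rewards` are hypothetical. Each unit's utility is credited only to the turn that first revealed it in a trajectory.

```python
from typing import Dict, List

# Hypothetical format: a trajectory is a list of turns; each turn lists the
# IDs (e.g. URLs) of the information units / snapshots revealed in that turn.
Trajectory = List[List[str]]


def acquired(traj: Trajectory, unit: str) -> bool:
    """True if any turn of the trajectory revealed the given unit."""
    return any(unit in turn for turn in traj)


def unit_utilities(trajs: List[Trajectory], outcomes: List[float]) -> Dict[str, float]:
    """Assumed estimator: mean outcome with the unit minus mean outcome without it."""
    units = {u for traj in trajs for turn in traj for u in turn}
    util: Dict[str, float] = {}
    for u in units:
        with_u = [o for t, o in zip(trajs, outcomes) if acquired(t, u)]
        without_u = [o for t, o in zip(trajs, outcomes) if not acquired(t, u)]
        mean_with = sum(with_u) / len(with_u)
        # If every rollout acquired the unit there is no contrast: utility 0.
        mean_without = sum(without_u) / len(without_u) if without_u else mean_with
        util[u] = mean_with - mean_without
    return util


def turn_rewards(trajs: List[Trajectory], util: Dict[str, float]) -> List[List[float]]:
    """Dense turn rewards: each turn receives the utility of units it first revealed."""
    rewards = []
    for traj in trajs:
        seen: set = set()
        per_turn = []
        for turn in traj:
            new_units = [u for u in turn if u not in seen]
            seen.update(new_units)
            per_turn.append(sum(util[u] for u in new_units))
        rewards.append(per_turn)
    return rewards
```

For example, with four rollouts where only the two that fetched page `"a"` succeed, `"a"` receives utility 1.0 and the turns that fetched it receive a dense reward of 1.0, while turns that revealed nothing receive 0. Crediting only the first revealing turn is a design choice made here to avoid rewarding redundant re-fetches.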

Abstract

Despite the strong performance achieved by reinforcement-learning-trained information-seeking agents, learning in open-ended web environments remains severely constrained by low signal-to-noise feedback. Text-based parsers often discard layout semantics and introduce unstructured noise, while long-horizon training typically relies on sparse outcome rewards that obscure which retrieval actions actually matter. We propose a visual-native search framework that represents webpages as visual snapshots, allowing agents to leverage layout cues to quickly localize salient evidence and suppress distractors. To learn effectively from these high-dimensional observations, we introduce Information-Aware Credit Assignment (ICA), a post-hoc method that estimates each retrieved snapshot's contribution to the final outcome via posterior analysis and propagates dense learning signals back to key search turns. Integrated with a GRPO-based training pipeline, our approach consistently outperforms text-based baselines on diverse information-seeking benchmarks, providing evidence that visual snapshot grounding with information-level credit assignment alleviates the credit-assignment bottleneck in open-ended web environments. The code and datasets will be released at https://github.com/pc-inno/ICA_MM_deepsearch.git.
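The abstract says ICA is integrated with a GRPO-based pipeline but does not spell out how dense turn rewards enter the objective. The sketch below shows one plausible combination, clearly an assumption: the standard GRPO step standardizes each rollout's outcome reward within its group (population standard deviation here), and every turn of rollout i then receives that group-relative advantage plus a hypothetical weight `beta` times its dense turn reward. The names `grpo_advantages`, `turn_advantages`, and `beta` are illustrative, not taken from the paper.

```python
import math
from typing import List


def grpo_advantages(outcomes: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantage: standardize outcome rewards within one group."""
    n = len(outcomes)
    mean = sum(outcomes) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in outcomes) / n)
    return [(r - mean) / (std + eps) for r in outcomes]


def turn_advantages(
    outcomes: List[float],
    dense_turn_rewards: List[List[float]],
    beta: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> List[List[float]]:
    """Assumed mixing: each turn inherits its rollout's group-relative
    advantage plus beta times its dense (ICA-style) turn reward."""
    adv = grpo_advantages(outcomes)
    return [[a + beta * r for r in turns] for a, turns in zip(adv, dense_turn_rewards)]
```

When every rollout in a group has the same outcome, the group-relative term is zero and only the dense turn rewards distinguish turns; this is the situation in which a sparse outcome reward alone gives no learning signal.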

Paper Structure (44 sections, 16 equations, 5 figures, 3 tables, 1 algorithm)

Figures (5)

  • Figure 1: ICA framework. On the left, an information-seeking agent alternates Reasoning-Action turns, invoking web-search and URL-fetch tools to acquire external evidence before producing a final answer. Given a batch of rollout trajectories $\{\tau_i\}$ with sparse outcome supervision, we group interactions by the webpage snapshots obtained from each visited website. We then estimate the marginal utility of each acquired webpage content by its association with successful outcomes, and propagate this signal back to the turns that revealed the content, yielding dense turn-level rewards $\tilde{r}_{i,t}$ for long-horizon credit assignment.
  • Figure 2: Comparison of text-based RAG and snapshot-based webpage acquisition. Text extraction loses table structure and adds noise, leading to inconsistent evidence across trajectories. Snapshots preserve layout cues, enabling reliable table reading and yielding more stable information units for ICA credit assignment.
  • Figure 3: Text vs. visual snapshots. (a) An example showing that HTML text is noisy and loses visual structure, while snapshots preserve layout and non-textual cues. (b) Token-count comparison across four URLs, showing that snapshots reduce token usage by 27.0% to 65.6% compared to parsed text.