A Picture of Agentic Search

Francesca Pezzuti, Ophir Frieder, Fabrizio Silvestri, Sean MacAvaney, Nicola Tonellotto

TL;DR

The paper develops a methodology for collecting all the data produced and consumed by agentic retrieval-augmented systems when answering queries, and releases the Agentic Search Queryset (ASQ) dataset.

Abstract

With automated systems increasingly issuing search queries alongside humans, Information Retrieval (IR) faces a major shift. Yet IR remains human-centred, with systems, evaluation metrics, user models, and datasets designed around human queries and behaviours. Consequently, IR operates under assumptions that no longer hold in practice, with changes to workload volumes, predictability, and querying behaviours. This misalignment affects system performance and optimisation: caching may lose effectiveness, query pre-processing may add overhead without improving results, and standard metrics may mismeasure satisfaction. Without adaptation, retrieval models risk satisfying neither humans nor the emerging user segment of agents. However, datasets capturing agent search behaviour are lacking, which is a critical gap given IR's historical reliance on data-driven evaluation and optimisation. We develop a methodology for collecting all the data produced and consumed by agentic retrieval-augmented systems when answering queries, and we release the Agentic Search Queryset (ASQ) dataset. ASQ contains reasoning-induced queries, retrieved documents, and thoughts for queries in HotpotQA, Researchy Questions, and MS MARCO, for 3 diverse agents and 2 retrieval pipelines. The accompanying toolkit enables ASQ to be extended to new agents, retrievers, and datasets.

Paper Structure (17 sections, 2 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Transition probability matrices representing human (left) and agent search behaviours (centre, right). Rows correspond to the current state and columns to the next state.
  • Figure 2: Qwen-7B's distribution of transition probabilities across consecutive iterations. Left: retrieval only. Right: retrieval and re-ranking. Stack $i$ shows the transition probabilities from iteration $(i-1)$ to $i$; outlier iterations are omitted.
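The transition probability matrices described in the figure captions can be illustrated with a minimal sketch. It assumes each search session is already encoded as a sequence of discrete states; the state labels (`start`, `reformulate`, `stop`), the function name `transition_matrix`, and the toy sessions are hypothetical placeholders, not the states or data used in the paper. As in Figure 1, rows correspond to the current state and columns to the next state.

```python
def transition_matrix(sessions, states):
    """Estimate a row-normalised transition matrix from state sequences.

    Rows index the current state, columns the next state.
    A state with no observed outgoing transition gets an all-zero row.
    """
    index = {state: i for i, state in enumerate(states)}
    counts = [[0] * len(states) for _ in states]

    # Count every observed (current, next) pair within each session.
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[index[current]][index[nxt]] += 1

    # Normalise each row so that observed rows sum to 1.
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Hypothetical state labels and toy sessions, for illustration only.
states = ["start", "reformulate", "stop"]
sessions = [
    ["start", "reformulate", "stop"],
    ["start", "stop"],
]
matrix = transition_matrix(sessions, states)
```

Restricting the counted pairs to those that move from iteration $(i-1)$ to $i$ would give one such matrix per iteration, which is the kind of per-iteration breakdown Figure 2 describes.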