Agent-centric Information Access
Evangelos Kanoulas, Panagiotis Eustratiadis, Yongkang Li, Yougang Lyu, Vaishali Pal, Gabrielle Poerwawinata, Jingfen Qiao, Zihan Wang
TL;DR
This work addresses information access in a future with millions of domain-specific LLMs by proposing agent-centric information access, in which domain experts (knowledge agents) and personalized user agents are dynamically orchestrated via a belief model over expertise $K_1, \dots, K_M$ and the expert pool $L$. It surveys the challenges of expert selection, cross-model answer aggregation, robustness to bias and adversarial manipulation, and evaluation. It also introduces a formal evaluation framework for ranking LLMs, comprising a reusable training/test collection, an approach that simulates thousands of expert LLMs using clustered document collections and retrieval-augmented generation (RAG), and new metrics and test designs tailored to model-centric retrieval. The work highlights the practical significance of cost-aware querying, transparent attribution, and robust aggregation for reliable multi-LLM information access at web scale.
Abstract
As large language models (LLMs) become more specialized, we envision a future where millions of expert LLMs exist, each trained on proprietary data and excelling in specific domains. In such a system, answering a query requires selecting a small subset of relevant models, querying them efficiently, and synthesizing their responses. This paper introduces a framework for agent-centric information access, where LLMs function as knowledge agents that are dynamically ranked and queried based on their demonstrated expertise. Unlike traditional document retrieval, this approach requires inferring expertise on the fly, rather than relying on static metadata or predefined model descriptions. This shift introduces several challenges, including efficient expert selection, cost-effective querying, response aggregation across multiple models, and robustness against adversarial manipulation. To address these issues, we propose a scalable evaluation framework that leverages retrieval-augmented generation and clustering techniques to construct and assess thousands of specialized models, with the potential to scale toward millions.
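To make the select, query, and synthesize loop concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes the document clustering has already been done, so each hypothetical `SimulatedExpert` wraps one cluster. The language model is replaced by a retrieval stub that returns the best-matching document, and expertise is scored with bag-of-words cosine similarity. All class names, function names, and example documents are illustrative assumptions.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SimulatedExpert:
    """Hypothetical stand-in for a domain LLM: one document cluster plus retrieval."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents
        self.doc_vectors = [Counter(tokenize(d)) for d in documents]
        # Cluster-level profile: term counts summed over all documents.
        self.profile = sum(self.doc_vectors, Counter())

    def answer(self, query):
        """Retrieval stub (assumption): return the best-matching document."""
        q = Counter(tokenize(query))
        best = max(range(len(self.documents)),
                   key=lambda i: cosine(q, self.doc_vectors[i]))
        return self.documents[best]


def rank_experts(query, experts):
    """Score every expert against the query and sort by descending score."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, e.profile), e) for e in experts]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def answer_query(query, experts, k=2):
    """Query only the top-k experts with a nonzero score; keep attribution."""
    top = rank_experts(query, experts)[:k]
    return [(e.name, e.answer(query)) for score, e in top if score > 0]


# Toy expert pool; the clusters and documents are made up for illustration.
experts = [
    SimulatedExpert("medicine", ["Aspirin reduces fever and mild pain.",
                                 "Insulin regulates blood sugar levels."]),
    SimulatedExpert("law", ["A contract requires offer and acceptance.",
                            "Copyright protects original creative works."]),
    SimulatedExpert("cooking", ["Yeast makes bread dough rise.",
                                "Searing meat browns the surface."]),
]

result = answer_query("What regulates blood sugar?", experts, k=2)
print(result)
```

Returning `(expert name, answer)` pairs rather than a merged string keeps attribution explicit, and the `k` cutoff stands in for cost-aware querying. A fuller version would replace the cosine score with expertise inferred from each model's responses and the retrieval stub with an actual RAG call.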
