Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First
Shu Liu, Soujanya Ponnapalli, Shreya Shankar, Sepanta Zeighami, Alan Zhu, Shubham Agarwal, Ruiqi Chen, Samion Suwito, Shuo Yuan, Ion Stoica, Matei Zaharia, Alvin Cheung, Natacha Crooks, Joseph E. Gonzalez, Aditya G. Parameswaran
TL;DR
The paper argues that future data systems must be redesigned to natively support agentic workloads driven by LLM agents. It introduces an agent-first architecture with probes, an in-database interpreter, a probe optimizer, and an agentic memory store to enable high-throughput exploration, grounding, and branching. Case studies show that agentic speculation can improve accuracy and reduce effort through redundancy sharing and grounding hints. The work outlines challenges and opportunities in interface design, query processing, and storage, charting a path toward scalable, steerable data systems for AI-powered decision making.
Abstract
Large Language Model (LLM) agents, acting on their users' behalf to manipulate and analyze data, are likely to become the dominant workload for data systems in the future. When working with data, agents employ a high-throughput process of exploration and solution formulation for the given task, one we call agentic speculation. The sheer volume and inefficiencies of agentic speculation can pose challenges for present-day data systems. We argue that data systems need to adapt to more natively support agentic workloads. We take advantage of the characteristics of agentic speculation that we identify (scale, heterogeneity, redundancy, and steerability) to outline a number of research opportunities for a new agent-first data systems architecture, ranging from new query interfaces, to new query processing techniques, to new agentic memory stores.
