What's the next frontier for Data-centric AI? Data Savvy Agents
Nabeel Seedat, Jiashuo Liu, Mihaela van der Schaar
TL;DR
The paper addresses the gap between current AI agents and real-world deployment by treating data handling as a core capability. It proposes four capabilities that let agents autonomously acquire, refine, and evolve their knowledge in dynamic environments: proactive data acquisition, sophisticated data processing, interactive test data synthesis, and continual adaptation. The work outlines concrete research directions, discusses real-world impact, and examines alternative viewpoints, arguing that data is the driver of reliable, scalable agent systems. The approach has practical implications for building autonomous, self-improving agents across domains, and the paper also highlights the associated risks and the need for responsible deployment.
Abstract
The recent surge in AI agents that autonomously communicate, collaborate with humans, and use diverse tools has unlocked promising opportunities in various real-world settings. However, a vital aspect remains underexplored: how agents handle data. Scalable autonomy demands agents that continuously acquire, process, and evolve their data. In this paper, we argue that data-savvy capabilities should be a top priority in the design of agentic systems to ensure reliable real-world deployment. Specifically, we propose four key capabilities to realize this vision: (1) Proactive data acquisition: enabling agents to autonomously gather task-critical knowledge or solicit human input to address data gaps; (2) Sophisticated data processing: requiring context-aware and flexible handling of diverse data challenges and inputs; (3) Interactive test data synthesis: shifting from static benchmarks to dynamically generated interactive test data for agent evaluation; and (4) Continual adaptation: empowering agents to iteratively refine their data and background knowledge to adapt to shifting environments. While current agent research predominantly emphasizes reasoning, we hope to inspire reflection on data-savvy agents as the next frontier in data-centric AI.
