Beyond Text-to-SQL: Autonomous Research-Driven Database Exploration with DAR
Ostap Vykhopen, Viktoria Skorik, Maxim Tereschenko, Veronika Solopova
TL;DR
DAR introduces a three-layer, in-database autonomous exploration framework that coordinates specialized agents to formulate questions, generate and validate SQL+AI queries, and synthesize reports entirely inside BigQuery. By leveraging ADK-based orchestration and native generative AI functions, it eliminates data movement and preserves governance. In a comparative study against a professional analyst on an asset–incident dataset, DAR completes the analysis in 16 minutes versus 8.5 hours (roughly 32 times faster) while providing pattern-based insights, highlighting the practical potential of autonomous research-driven data exploration in cloud warehouses. The work demonstrates a shift from query-driven assistance to proactive, research-driven data discovery, with human experts remaining essential for deep interpretation and validation.
Abstract
Large language models can already query databases, yet most existing systems remain reactive: they rely on explicit user prompts and do not actively explore data. We introduce DAR (Data Agnostic Researcher), a multi-agent system that performs end-to-end database research without human-initiated queries. DAR orchestrates specialized AI agents across three layers: initialization (intent inference and metadata extraction), execution (SQL and AI-based query synthesis with iterative validation), and synthesis (report generation with built-in quality control). All reasoning is executed directly inside BigQuery using native generative AI functions, eliminating data movement and preserving data governance. On a realistic asset-incident dataset, DAR completes the full analytical task in 16 minutes, compared to 8.5 hours for a professional analyst (approximately 32 times faster), while producing useful pattern-based insights and evidence-grounded recommendations. Although human experts continue to offer deeper contextual interpretation, DAR excels at rapid exploratory analysis. Overall, this work shifts database interaction from query-driven assistance toward autonomous, research-driven exploration within cloud data warehouses.
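To make the three-layer flow described above more concrete, the following is a minimal illustrative sketch, not the DAR implementation. It assumes plain Python stand-ins for the agents: the function names (initialize, execute, synthesize), the stub generator, and the table and column names are all hypothetical, and nothing here calls BigQuery or a language model. It only shows the control flow of metadata-driven intent inference, query generation with a validate-and-retry loop, and report assembly.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Schema = Dict[str, List[str]]  # table name -> column names (hypothetical metadata)
Query = Tuple[str, str]        # (table, column) to aggregate over


@dataclass
class ResearchContext:
    intent: str
    schema: Schema
    findings: List[str] = field(default_factory=list)


def initialize(schema: Schema) -> ResearchContext:
    """Layer 1 (sketch): derive a research intent from metadata alone."""
    tables = ", ".join(sorted(schema))
    return ResearchContext(intent=f"Explore patterns across: {tables}", schema=schema)


def validate(query: Query, schema: Schema) -> Tuple[bool, str]:
    """Check that the query only references known tables and columns."""
    table, column = query
    if table not in schema:
        return False, f"unknown table {table!r}"
    if column not in schema[table]:
        return False, f"unknown column {column!r} in {table!r}"
    return True, ""


def execute(
    ctx: ResearchContext,
    generate: Callable[[ResearchContext, str], Query],
    run: Callable[[Query], str],
    max_attempts: int = 3,
) -> ResearchContext:
    """Layer 2 (sketch): generate a query, validate it, retry with feedback."""
    feedback = ""
    for _ in range(max_attempts):
        query = generate(ctx, feedback)
        ok, feedback = validate(query, ctx.schema)
        if ok:
            ctx.findings.append(run(query))
            return ctx
    ctx.findings.append(f"no valid query after {max_attempts} attempts: {feedback}")
    return ctx


def synthesize(ctx: ResearchContext) -> str:
    """Layer 3 (sketch): assemble findings into a plain-text report."""
    return "\n".join([f"Intent: {ctx.intent}"] + [f"- {f}" for f in ctx.findings])


# Hypothetical stubs standing in for the LLM agent and the warehouse.
def stub_generate(ctx: ResearchContext, feedback: str) -> Query:
    # First attempt contains a misspelled column; the retry corrects it.
    return ("incidents", "severity") if feedback else ("incidents", "severty")


def stub_run(query: Query) -> str:
    table, column = query
    return f"would run: SELECT {column}, COUNT(*) FROM {table} GROUP BY {column}"


schema = {
    "assets": ["asset_id", "type"],
    "incidents": ["incident_id", "asset_id", "severity"],
}
ctx = execute(initialize(schema), stub_generate, stub_run)
report = synthesize(ctx)
print(report)
```

In this sketch the first generated query fails validation because of the misspelled column, the error message is passed back to the generator as feedback, and the second attempt succeeds. This is one simple way to read "iterative validation"; the actual system may validate and repair queries differently.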
