Towards Agentic Schema Refinement
Agapi Rissaki, Ilias Fountalis, Nikolaos Vasiloglou, Wolfgang Gatterbauer
TL;DR
The paper tackles the challenge of analyzing large, messy enterprise databases by introducing a semantic layer composed of semantically distilled views. It presents a multi-agent LLM workflow for schema refinement and view discovery in which agents take the roles of Analyst, Critic, and Verifier, aided by GraphRAG-guided sampling and external tooling. The approach yields a large set of reusable views and an ER model that maps the discovered entities and relationships, demonstrated on a Braze dataset with notable coverage and cross-source connections. The results suggest that this semantic-layer workflow can simplify data exploration and potentially enhance text-to-SQL systems; future work focuses on deeper integration and interactive exploration interfaces.
Abstract
Large enterprise databases can be complex and messy, obscuring the data semantics needed for analytical tasks. We propose a semantic layer between the database and the user, consisting of a set of small, easy-to-interpret database views that effectively act as a refined version of the schema. To discover these views, we introduce a multi-agent Large Language Model (LLM) simulation in which LLM agents collaborate to iteratively define and refine views with minimal input. Our approach paves the way for LLM-powered exploration of unwieldy databases.
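To make the idea of iteratively defining and refining views more concrete, the following is a minimal sketch of one possible propose, verify, and critique loop. It is not the authors' implementation: the agents are deterministic stand-ins rather than LLM calls, and the table `users_raw`, the candidate views, the `MAX_COLUMNS` threshold, and the acceptance criteria are all hypothetical choices made for illustration. The role names (Analyst, Critic, Verifier) follow the summary above; how the roles actually interact in the paper may differ.

```python
import sqlite3

MAX_COLUMNS = 4  # hypothetical threshold for what counts as a "small" view


def analyst(round_no):
    """Stand-in for an LLM Analyst: proposes one (name, SQL) candidate per round."""
    proposals = [
        ("wide_users", "SELECT * FROM users_raw"),
        ("active_users", "SELECT id, email FROM users_raw WHERE is_active = 1"),
        ("bad_view", "SELECT missing_col FROM users_raw"),
    ]
    return proposals[round_no % len(proposals)]


def verifier(conn, sql):
    """Runs the candidate query on the database; returns its column names, or None on error."""
    try:
        cur = conn.execute(f"SELECT * FROM ({sql}) LIMIT 1")
        return [d[0] for d in cur.description]
    except sqlite3.Error:
        return None


def critic(columns):
    """Stand-in for an LLM Critic: accepts only views that are small enough."""
    return len(columns) <= MAX_COLUMNS


def discover_views(conn, rounds):
    """Keeps the candidates that both execute and pass the critic, and creates them as views."""
    accepted = []
    for r in range(rounds):
        name, sql = analyst(r)
        columns = verifier(conn, sql)
        if columns is None or not critic(columns):
            continue  # rejected: does not run, or is too wide
        conn.execute(f"CREATE VIEW {name} AS {sql}")
        accepted.append(name)
    return accepted


def demo():
    """Builds a toy wide table in memory and runs three discovery rounds."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users_raw "
        "(id INTEGER, email TEXT, is_active INTEGER, c1 TEXT, c2 TEXT, c3 TEXT)"
    )
    conn.execute("INSERT INTO users_raw VALUES (1, 'a@example.com', 1, '', '', '')")
    conn.execute("INSERT INTO users_raw VALUES (2, 'b@example.com', 0, '', '', '')")
    views = discover_views(conn, rounds=3)
    rows = conn.execute("SELECT id, email FROM active_users").fetchall()
    return views, rows
```

In this toy run, the six-column `wide_users` candidate is rejected for being too wide and `bad_view` is rejected because it does not execute, leaving only `active_users` in the resulting semantic layer. In a real system, the stub functions would be replaced by LLM-backed agents, and the critic would judge interpretability rather than just column count.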
