Towards Agentic Schema Refinement

Agapi Rissaki, Ilias Fountalis, Nikolaos Vasiloglou, Wolfgang Gatterbauer

TL;DR

The paper addresses the difficulty of analyzing large, messy enterprise databases by introducing a semantic layer composed of semantically distilled views. It presents a multi-agent LLM workflow that performs schema refinement and view discovery through three roles (Analyst, Critic, and Verifier), aided by GraphRAG-guided sampling and external tooling. The approach yields a large set of reusable views and an ER model that maps the discovered entities and relationships, demonstrated on a Braze dataset with notable coverage and cross-source connections. The results suggest that this semantic-layer workflow can simplify data exploration and potentially improve text-to-SQL systems; future work focuses on deeper integration and interactive exploration interfaces.

Abstract

Large enterprise databases can be complex and messy, obscuring the data semantics needed for analytical tasks. We propose a semantic layer between the database and the user, realized as a set of small, easy-to-interpret database views that effectively act as a refined version of the schema. To discover these views, we introduce a multi-agent Large Language Model (LLM) simulation where LLM agents collaborate to iteratively define and refine views with minimal input. Our approach paves the way for LLM-powered exploration of unwieldy databases.

Paper Structure

This paper contains 11 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Example of the schema refinement mechanism on the schema orders(order_id, staff_id, total_price, $\dots$), staff(staff_id, position, $\dots$), where each order is handled by a staff member. By distilling the views intern and staff_generates_revenue, the query becomes successively more refined.
  • Figure 2: (a) A single chat session implementing the schema refinement mechanism. (b) A sequence of schema refinement chat sessions. Retaining memory from previous sessions helps promote view reusability (by using previously defined views in new tasks) and diversity (by avoiding defining the same views or working on similar tasks).
  • Figure 3: Structural properties of the distilled views composing the semantic layer. The top 1% of views by width are excluded.
  • Figure 4: Diagram of entities (with their attributes) and relationships.
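
The refinement idea in the Figure 1 caption can be illustrated with a minimal sketch. Only the table names, the listed columns, and the two view names (intern, staff_generates_revenue) come from the caption; the view definitions, the sample rows, and the helper functions build_demo and intern_revenue are assumptions made for illustration and are not taken from the paper. The sketch uses Python's built-in sqlite3 module.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with the Figure 1 schema and two views.

    The view bodies below are assumed definitions, not the paper's.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE staff (
            staff_id INTEGER PRIMARY KEY,
            position TEXT
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            staff_id INTEGER REFERENCES staff(staff_id),
            total_price REAL
        );

        -- Hypothetical sample data.
        INSERT INTO staff VALUES (1, 'intern'), (2, 'manager'), (3, 'intern');
        INSERT INTO orders VALUES
            (10, 1, 20.0), (11, 1, 5.0), (12, 2, 100.0), (13, 3, 7.5);

        -- Assumed definition: staff members whose position is 'intern'.
        CREATE VIEW intern AS
            SELECT staff_id FROM staff WHERE position = 'intern';

        -- Assumed definition: total order value handled by each staff member.
        CREATE VIEW staff_generates_revenue AS
            SELECT staff_id, SUM(total_price) AS revenue
            FROM orders
            GROUP BY staff_id;
        """
    )
    return conn


def intern_revenue(conn):
    """Answer 'revenue per intern' using only the two distilled views."""
    query = """
        SELECT i.staff_id, r.revenue
        FROM intern AS i
        JOIN staff_generates_revenue AS r ON r.staff_id = i.staff_id
        ORDER BY i.staff_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(intern_revenue(build_demo()))
```

Once both views exist, the final query reduces to a join of two small, named concepts, without repeating the filter on position or the aggregation over orders.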