Table of Contents
Fetching ...

Exploring Database Normalization Effects on SQL Generation

Ryosuke Kohita

TL;DR

This work addresses a gap in NL2SQL research by systematically evaluating how database normalization—ranging from formal $1NF$/$2NF$/$3NF$ to practical LOW/MID/HIGH variants—affects SQL generation across eight LLMs. Using a ladder of experiments from controlled synthetic data to real-world Semantic Scholar-like data, the study finds that denormalized schemas excel for retrieval tasks in synthetic settings, while normalized schemas yield stronger performance for aggregation and for real-world data. The results demonstrate a clear query-dependent trade-off and suggest practical strategies such as workload-aware schema variants and adaptive schema routing to improve NL2SQL interfaces. These insights bridge database design and NL2SQL, guiding the development of adaptive, schema-aware systems for real-world database access.

Abstract

Schema design, particularly normalization, is a critical yet often overlooked factor in natural language to SQL (NL2SQL) systems. Most prior research evaluates models on fixed schemas, overlooking the influence of design on performance. We present the first systematic study of schema normalization's impact, evaluating eight leading large language models on synthetic and real-world datasets with varied normalization levels. We construct controlled synthetic datasets with formal normalization (1NF-3NF) and real academic paper datasets with practical schemes. Our results show that denormalized schemas offer high accuracy on simple retrieval queries, even with cost-effective models in zero-shot settings. In contrast, normalized schemas (2NF/3NF) introduce challenges such as errors in base table selection and join type prediction; however, these issues are substantially mitigated by providing few-shot examples. For aggregation queries, normalized schemas yielded better performance, mainly due to their robustness against the data duplication and NULL value issues that cause errors in denormalized schemas. These findings suggest that the optimal schema design for NL2SQL applications depends on the types of queries to be supported. Our study demonstrates the importance of considering schema design when developing NL2SQL interfaces and integrating adaptive schema selection for real-world scenarios.

Exploring Database Normalization Effects on SQL Generation

TL;DR

This work addresses a gap in NL2SQL research by systematically evaluating how database normalization—ranging from formal // to practical LOW/MID/HIGH variants—affects SQL generation across eight LLMs. Using a ladder of experiments from controlled synthetic data to real-world Semantic Scholar-like data, the study finds that denormalized schemas excel for retrieval tasks in synthetic settings, while normalized schemas yield stronger performance for aggregation and for real-world data. The results demonstrate a clear query-dependent trade-off and suggest practical strategies such as workload-aware schema variants and adaptive schema routing to improve NL2SQL interfaces. These insights bridge database design and NL2SQL, guiding the development of adaptive, schema-aware systems for real-world database access.

Abstract

Schema design, particularly normalization, is a critical yet often overlooked factor in natural language to SQL (NL2SQL) systems. Most prior research evaluates models on fixed schemas, overlooking the influence of design on performance. We present the first systematic study of schema normalization's impact, evaluating eight leading large language models on synthetic and real-world datasets with varied normalization levels. We construct controlled synthetic datasets with formal normalization (1NF-3NF) and real academic paper datasets with practical schemes. Our results show that denormalized schemas offer high accuracy on simple retrieval queries, even with cost-effective models in zero-shot settings. In contrast, normalized schemas (2NF/3NF) introduce challenges such as errors in base table selection and join type prediction; however, these issues are substantially mitigated by providing few-shot examples. For aggregation queries, normalized schemas yielded better performance, mainly due to their robustness against the data duplication and NULL value issues that cause errors in denormalized schemas. These findings suggest that the optimal schema design for NL2SQL applications depends on the types of queries to be supported. Our study demonstrates the importance of considering schema design when developing NL2SQL interfaces and integrating adaptive schema selection for real-world scenarios.

Paper Structure

This paper contains 26 sections, 5 figures, 8 tables.

Figures (5)

  • Figure 1: Overview of our motivation, methodology, and key finding—query-schema preferences. These preferences were consistently observed across experiments on leading eight LLMs.
  • Figure 2: Normalization examples in 1NF, 2NF, and 3NF: class registrations with entities Student, Class, Department (Dep), and Division (Div). Functional dependencies: $\text{Class} \rightarrow \text{Dep}$ and $\text{Dep} \rightarrow \text{Div}$. Bold columns are primary keys, and the others are non-key attributes.
  • Figure 3: Per-query-type execution accuracy for 1NF / 2NF / 3NF in zero-shot settings. Top: Formal-Basic; Bottom: Formal-Simulated.
  • Figure 4: Execution accuracy for Retrieval (R) and Aggregation (A) queries at normalization levels (LOW, MID, HIGH) in Practical-Real.
  • Figure 5: (Left) Execution accuracy in Practical-Real with varying few-shot examples. (Right-Top) Execution times with different data volumes in Formal-Basic. (Right-Bottom) Execution times by query type in Practical-Real.