Insights into Natural Language Database Query Errors: From Attention Misalignment to User Handling Strategies

Zheng Ning; Yuan Tian; Zheng Zhang; Tianyi Zhang; Toby Li

TL;DR

The paper addresses a persistent gap in understanding NL2SQL errors beyond aggregate accuracy by building a cross-model taxonomy of errors on Spider-derived data. It analyzes model-human attention alignment to identify a likely cause of SQL errors, and it reports a controlled user study that evaluates three interactive error-handling approaches. The results reveal a rich set of error types (48 in the final taxonomy) and show that attention misalignment correlates with errors, while current interactive mechanisms provide limited gains in accuracy or speed on challenging cross-domain tasks. These findings motivate mixed-initiative, attention-aware designs and adaptive interfaces for real-world NL2SQL data querying tools.

Abstract

Querying structured databases with natural language (NL2SQL) has remained a difficult problem for years. Recently, advances in machine learning (ML), natural language processing (NLP), and large language models (LLMs) have led to significant improvements in performance, with the best model achieving ~85% accuracy on the benchmark Spider dataset. However, there is still no systematic understanding of the types and causes of errors in erroneous queries, or of the effectiveness of error-handling mechanisms for them. To bridge the gap, a taxonomy of errors made by four representative NL2SQL models was first built in this work, along with an in-depth analysis of the errors. Second, the causes of model errors were explored by analyzing the model-human attention alignment on the natural language query. Last, a within-subjects user study with 26 participants was conducted to investigate the effectiveness of three interactive error-handling mechanisms in NL2SQL. Findings from this paper shed light on the design of model structures and of error discovery and repair strategies for future natural language data query interfaces.
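
To make the idea of model-human attention alignment concrete, the sketch below scores how many of a model's most-attended query words also appear among the words a human marked as important. This summary does not state the paper's actual alignment metric, so the top-k overlap measure, the function name `topk_alignment`, and the example weights are all assumptions chosen for illustration only.

```python
from typing import Dict, Iterable


def topk_alignment(model_attention: Dict[str, float],
                   human_keywords: Iterable[str],
                   k: int) -> float:
    """Fraction of the k most-attended words that a human also marked as important.

    Assumed metric for illustration; not necessarily the one used in the paper.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank query words by the (hypothetical) attention weight the model gives them.
    ranked = sorted(model_attention.items(), key=lambda item: item[1], reverse=True)
    top_words = {word for word, _ in ranked[:k]}
    if not top_words:
        return 0.0
    return len(top_words & set(human_keywords)) / len(top_words)


# Hypothetical query: "how many singers are older than 30"
human = {"many", "singers", "older", "30"}

# Made-up attention weights for a model that focuses on the important words.
aligned = {"how": 0.05, "many": 0.30, "singers": 0.35, "are": 0.02,
           "older": 0.08, "than": 0.05, "30": 0.15}

# Made-up attention weights for a model that focuses on function words.
misaligned = {"how": 0.30, "many": 0.05, "singers": 0.25, "are": 0.20,
              "older": 0.05, "than": 0.10, "30": 0.05}

aligned_score = topk_alignment(aligned, human, k=3)
misaligned_score = topk_alignment(misaligned, human, k=3)
```

Under this assumed measure, a low score means the model concentrates on words that people consider unimportant, which is the kind of misalignment the abstract associates with erroneous SQL.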

Paper Structure (49 sections, 5 equations, 6 figures, 20 tables)

Introduction
Related Work
NL2SQL techniques
Detecting and repairing errors for NL2SQL
Error handling via human-AI collaboration
An Analysis and Taxonomy of NL2SQL Errors
Model selection
Erroneous queries dataset collection
The coding procedure
Step 1: Open coding
Step 2: Iterative refinement of the codebook
Step 3: Coding the remaining dataset
The annotation interface
The Taxonomy of NL2SQL Errors
NL2SQL error analysis
...and 34 more sections

Figures (6)

Figure 1: The user interface that we used for NL2SQL error annotation
Figure 2: The overlap of erroneous queries generated by DIN-SQL+GPT-4, SmBop, and BRIDGE
Figure 3: The distribution of Levenshtein distances between erroneous queries and ground truth queries for each model
Figure 4: The distribution of alignment for correctly and incorrectly solved tasks considering different numbers of keywords.
Figure 5: A comparison of attention alignment between correctly and incorrectly solved tasks.
...and 1 more figure
