Continual Learning of Domain Knowledge from Human Feedback in Text-to-SQL

Thomas Cook, Kelly Patel, Sivapriya Vellaichamy, Udari Madhushani Sehwag, Saba Rahimi, Zhen Zeng, Sumitra Ganesh

TL;DR

Text-to-SQL systems often lack the tacit domain knowledge that formal documentation leaves out. The paper addresses this gap with a continual-learning framework driven by human feedback. It proposes a memory-augmented Learning Agent with four levels of memory granularity and two modes of procedural reasoning, complemented by a Human Proxy Agent that makes feedback scalable. Results on the BIRD Dev benchmark show that memory augmentation, especially in the Procedural Agent configurations, improves execution accuracy and generalization to new questions while distilling actionable tacit knowledge across interactions. The work offers a practical blueprint for adaptive, domain-aware text-to-SQL systems that continually learn from human input, with implications for broader structured-reasoning tasks.

Abstract

Large Language Models (LLMs) can generate SQL queries from natural language questions but struggle with database-specific schemas and tacit domain knowledge. We introduce a framework for continual learning from human feedback in text-to-SQL, where a learning agent receives natural language feedback to refine queries and distills the revealed knowledge for reuse on future tasks. This distilled knowledge is stored in a structured memory, enabling the agent to improve execution accuracy over time. We design and evaluate multiple variations of a learning agent architecture that vary in how they capture and retrieve past experiences. Experiments on the BIRD benchmark Dev set show that memory-augmented agents, particularly the Procedural Agent, achieve significant accuracy gains and error reduction by leveraging human-in-the-loop feedback. Our results highlight the importance of transforming tacit human expertise into reusable knowledge, paving the way for more adaptive, domain-aware text-to-SQL systems that continually learn from a human-in-the-loop.
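The feedback loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `MemoryEntry`, `MemoryStore`, `online_episode`, the callback signatures, and the `max_rounds` cap are all hypothetical names and assumptions introduced here, and the toy callbacks stand in for LLM calls and the human (or human proxy) reviewer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class MemoryEntry:
    # One stored interaction: the question, its accepted SQL, and the
    # knowledge distilled from the feedback (hypothetical layout).
    question: str
    sql: str
    knowledge: str


@dataclass
class MemoryStore:
    entries: List[MemoryEntry] = field(default_factory=list)

    def add(self, entry: MemoryEntry) -> None:
        self.entries.append(entry)


def online_episode(
    question: str,
    schema: str,
    generate_sql: Callable[[str, str, MemoryStore, List[str]], str],
    get_feedback: Callable[[str, str], Optional[str]],
    distill: Callable[[str, str, List[str]], str],
    memory: MemoryStore,
    max_rounds: int = 3,  # assumed cap; not taken from the paper
) -> Tuple[str, bool]:
    """Generate SQL, refine it on feedback, and store the result if accepted."""
    feedback_log: List[str] = []
    sql = ""
    for _ in range(max_rounds):
        sql = generate_sql(question, schema, memory, feedback_log)
        feedback = get_feedback(question, sql)
        if feedback is None:  # None means the reviewer accepts the query
            knowledge = distill(question, sql, feedback_log)
            memory.add(MemoryEntry(question, sql, knowledge))
            return sql, True
        feedback_log.append(feedback)
    return sql, False  # never accepted, so nothing is stored


# Toy stand-ins for the LLM and the reviewer (illustrative only).
def toy_generate(question, schema, memory, feedback_log):
    if feedback_log:
        return "SELECT name FROM schools WHERE status = 'Active'"
    return "SELECT name FROM schools"


def toy_feedback(question, sql):
    return None if "Active" in sql else "Only include active schools."


def toy_distill(question, sql, feedback_log):
    return "; ".join(feedback_log)


memory = MemoryStore()
sql, accepted = online_episode(
    "List the schools.", "schools(name, status)",
    toy_generate, toy_feedback, toy_distill, memory,
)
```

In this toy run the first query is rejected, the second is accepted, and the reviewer's remark becomes the stored knowledge. An offline pass would call the same generator with the populated memory and no feedback callback.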

Paper Structure

This paper contains 52 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of the two settings used for testing. In the online setting (left), the Learning Agent takes a natural language question and schema as input, generates a candidate SQL query, and iteratively refines it based on feedback from a human proxy agent. The final correct SQL and the distilled knowledge from this interaction are then stored in memory. In the offline phase (right), the Learning Agent generates SQL queries based on the natural language question, database schema, and the memory store (if available), without further feedback. The Learning Agent progressively improves execution accuracy by augmenting its memory.
  • Figure 2: Levels of Memory Granularity
  • Figure 3: Execution accuracy by database for the Initial (NP-0, no memory), Baseline (NP-0), and Procedural Agent (PA, Agent label P-3).
  • Figure 4: Execution accuracy as a function of the number of online instances used to construct the memory store. "Baseline" corresponds to the non-procedural agent (NP-0), and "PA" corresponds to the full Procedural Agent (P-3). Error bars indicate $\pm$1 standard deviation. "Evidence Coverage" denotes the proportion of test questions that have at least one corresponding question in the memory store whose annotated evidence field has cosine similarity $\geq$ 0.9 to that of the test question.
  • Figure 5: Breakdown of error types for each agent configuration. Bars show the total counts of common error categories across different agents.
  • ...and 1 more figure
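The "Evidence Coverage" metric defined in the Figure 4 caption can be computed roughly as follows. This is a sketch under stated assumptions: the evidence fields are taken to be already embedded as numeric vectors (the embedding model is not specified here), and the function names `cosine` and `evidence_coverage` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def evidence_coverage(
    test_vecs: List[Sequence[float]],
    memory_vecs: List[Sequence[float]],
    threshold: float = 0.9,  # threshold stated in the Figure 4 caption
) -> float:
    """Fraction of test evidence vectors with at least one memory match."""
    if not test_vecs:
        return 0.0
    covered = sum(
        1
        for t in test_vecs
        if any(cosine(t, m) >= threshold for m in memory_vecs)
    )
    return covered / len(test_vecs)


# Toy vectors: the first test vector is nearly parallel to the memory
# vector, the second is orthogonal to it.
memory_vecs = [[1.0, 0.0]]
test_vecs = [[1.0, 0.1], [0.0, 1.0]]
coverage = evidence_coverage(test_vecs, memory_vecs)
```

With these toy vectors one of the two test items is covered, so `coverage` is 0.5.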