Learning a Neural Semantic Parser from User Feedback

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, Luke Zettlemoyer

TL;DR

The paper addresses how to build natural language interfaces to databases (NLIDBs) for new domains by mapping user utterances directly to SQL with neural sequence models, which enables an online feedback loop that needs minimal intervention. It uses data augmentation (schema templates and PPDB paraphrasing) and entity anonymization to train parsers that improve through feedback from real users and crowd workers, with experiments on Geo880, ATIS, and an academic database. The authors report competitive performance without database-specific engineering and release SCHOLAR, a new dataset for research on interactive, SQL-based semantic parsing. Because incorrect predictions can be corrected with crowd-written SQL, the approach aims to reduce the annotation burden of deploying an NLIDB in a new domain while accuracy improves over time.

Abstract

We present an approach to rapidly and easily build natural language interfaces to databases for new domains, whose performance improves over time based on user feedback, and requires minimal intervention. To achieve this, we adapt neural sequence models to map utterances directly to SQL with its full expressivity, bypassing any intermediate meaning representations. These models are immediately deployed online to solicit feedback from real users to flag incorrect queries. Finally, the popularity of SQL facilitates gathering annotations for incorrect predictions using the crowd, which is directly used to improve our models. This complete feedback loop, without intermediate representations or database specific engineering, opens up new ways of building high quality semantic parsers. Experiments suggest that this approach can be deployed quickly for any new target domain, as we show by learning a semantic parser for an online academic database from scratch.
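The feedback loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `feedback_loop` and the callables `train`, `parse`, `user_accepts`, and `crowd_annotate` are hypothetical stand-ins for a neural model, a user interface, and a crowdsourcing step. Adding user-accepted predictions back into the training set is an assumption of this sketch; the abstract itself only states that incorrect predictions are annotated by the crowd.

```python
from typing import Callable, Iterable, List, Tuple

Example = Tuple[str, str]  # (utterance, SQL query)


def feedback_loop(
    train: Callable[[List[Example]], object],
    parse: Callable[[object, str], str],
    user_accepts: Callable[[str, str], bool],
    crowd_annotate: Callable[[str], str],
    seed: List[Example],
    batches: Iterable[List[str]],
) -> Tuple[object, List[Example]]:
    """Hypothetical sketch: deploy, collect feedback, annotate, retrain."""
    data = list(seed)
    model = train(data)  # initial model, e.g. trained on augmented seed data
    for batch in batches:
        for utterance in batch:
            sql = parse(model, utterance)
            if user_accepts(utterance, sql):
                # Assumption: accepted predictions become training examples.
                data.append((utterance, sql))
            else:
                # Flagged predictions are sent to the crowd for a correct query.
                data.append((utterance, crowd_annotate(utterance)))
        model = train(data)  # retrain once per batch of feedback
    return model, data
```

Retraining once per batch rather than per utterance is a design choice of this sketch that keeps deployment and training decoupled.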

Paper Structure

This paper contains 19 sections, 4 equations, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: Utterances with corresponding SQL queries to answer them for two domains, an academic database and a flight reservation database.
  • Figure 2: (a) Example schema template consisting of a question and SQL query with slots to be filled with database entities, columns, and values; (b) Entity-anonymized training example generated by applying the template to an academic database.
  • Figure 3: Accuracy as a function of batch number in simulated interactive learning experiments on Geo880 (top) and ATIS (bottom).
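To make the idea behind Figure 2 concrete, the sketch below fills a schema template's slots and then anonymizes the resulting entity. It is an illustration under stated assumptions only: the template text, the slot names (`ent`, `key`, `col`, `val`), the table and column names, and the placeholder format (`AUTHORNAME0`) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SchemaTemplate:
    """A question/SQL pair that shares named slots (hypothetical format)."""
    question: str
    sql: str


def instantiate(template: SchemaTemplate, slots: Dict[str, str]) -> Tuple[str, str]:
    """Fill every slot in both the question and the SQL query."""
    return template.question.format(**slots), template.sql.format(**slots)


def anonymize(question: str, sql: str, entities: Dict[str, str]) -> Tuple[str, str]:
    """Replace each entity string with a typed placeholder such as AUTHORNAME0."""
    counters: Dict[str, int] = {}
    for value, entity_type in entities.items():
        index = counters.get(entity_type, 0)
        counters[entity_type] = index + 1
        placeholder = f"{entity_type.upper()}{index}"
        question = question.replace(value, placeholder)
        sql = sql.replace(value, placeholder)
    return question, sql


# Hypothetical template and slot values for an academic database.
template = SchemaTemplate(
    question="Get all {ent} having {col} as {val}",
    sql="SELECT {ent}.{key} FROM {ent} WHERE {ent}.{col} = '{val}'",
)
slots = {"ent": "author", "key": "authorId", "col": "authorName", "val": "jane doe"}

question, sql = instantiate(template, slots)
anon_question, anon_sql = anonymize(question, sql, {"jane doe": "authorname"})
```

Using the same placeholder in both the utterance and the query means the model only has to learn to copy a small set of typed tokens instead of memorizing individual database values.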