TypeSQL: Knowledge-based Type-Aware Neural Text-to-SQL Generation
Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, Dragomir Radev
TL;DR
This work tackles translating natural language questions into SQL queries for unseen databases, using WikiSQL as the evaluation benchmark. It introduces TypeSQL, a knowledge-based, type-aware slot-filling system that combines a sketch-based SQL representation, a type-recognition preprocessing step, and a dual bi-LSTM input encoder. TypeSQL achieves state-of-the-art results on WikiSQL: a 5.5% absolute improvement in execution accuracy over prior methods, and up to 82.6% accuracy when database content is available, which highlights the value of type information and content awareness. The authors note limitations of WikiSQL (e.g., no JOIN or GROUP BY queries) and propose extending the approach to more complex datasets in future work.
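To make the type-recognition preprocessing step more concrete, the following is a minimal sketch of what tagging question tokens with coarse types might look like. The function name `tag_question`, the tag names, and the matching rules are assumptions chosen for illustration and are not taken from the authors' implementation; a real system would also need entity lookup in a knowledge base, which is omitted here.

```python
import re

def tag_question(tokens, column_names):
    """Assign one coarse type tag to each question token (illustrative rules only)."""
    # Words that appear in any column header of the target table.
    column_words = {w for name in column_names for w in name.lower().split()}
    tags = []
    for tok in tokens:
        low = tok.lower()
        if low in column_words:
            tags.append("COLUMN")      # token matches a column header word
        elif re.fullmatch(r"1[0-9]{3}|20[0-9]{2}", low):
            tags.append("YEAR")        # four-digit number that looks like a year
        elif re.fullmatch(r"-?\d+", low):
            tags.append("INTEGER")
        elif re.fullmatch(r"-?\d+\.\d+", low):
            tags.append("FLOAT")
        else:
            tags.append("NONE")        # no type information for this token
    return tags

# Hypothetical question and table schema.
question = "which player scored 30 points in 2015".split()
columns = ["Player", "Points", "Season"]
tags = tag_question(question, columns)
for tok, tag in zip(question, tags):
    print(f"{tok}\t{tag}")
```

In a type-aware encoder, tags like these would be embedded and fed to the model alongside the word embeddings, so that a rare number or entity still carries a usable signal.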
Abstract
Interacting with relational databases through natural language helps users of any background easily query and analyze a vast amount of data. This requires a system that understands users' questions and converts them to SQL queries automatically. In this paper we present a novel approach, TypeSQL, which views this problem as a slot filling task. Additionally, TypeSQL utilizes type information to better understand rare entities and numbers in natural language questions. We test this idea on the WikiSQL dataset and outperform the prior state-of-the-art by 5.5% in much less time. We also show that accessing the content of databases can significantly improve the performance when users' queries are not well-formed. TypeSQL gets 82.6% accuracy, a 17.5% absolute improvement compared to the previous content-sensitive model.
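The slot-filling view can be illustrated with a short sketch: rather than generating SQL token by token, the system fills a fixed set of slots (aggregator, selected column, and a list of conditions) in a query template. The class and function names below (`Condition`, `SketchSlots`, `render`) and the table name are hypothetical, and the slot values are hard-coded here, whereas in TypeSQL they would be predicted by the neural model.

```python
from dataclasses import dataclass, field

@dataclass
class Condition:
    column: str
    op: str          # one of "=", ">", "<"
    value: object

@dataclass
class SketchSlots:
    select_col: str
    agg: str = ""    # "" means no aggregator; otherwise e.g. "COUNT", "MAX"
    conditions: list = field(default_factory=list)

def quote(value):
    """Render a literal: numbers as-is, everything else as a quoted SQL string."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"

def render(slots, table):
    """Turn filled slots into a SQL string following the fixed template."""
    target = f'"{slots.select_col}"'
    if slots.agg:
        target = f"{slots.agg}({target})"
    sql = f'SELECT {target} FROM "{table}"'
    if slots.conditions:
        where = " AND ".join(
            f'"{c.column}" {c.op} {quote(c.value)}' for c in slots.conditions
        )
        sql += f" WHERE {where}"
    return sql

# Hypothetical slot values for "which player scored more than 30 points in 2015".
slots = SketchSlots(
    select_col="Player",
    conditions=[Condition("Points", ">", 30), Condition("Season", "=", "2015")],
)
query = render(slots, "stats")
print(query)
```

Because the template fixes the query structure, the model only has to make a handful of classification-style decisions per question, which is one reason a slot-filling approach can be faster than sequence generation.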
