Hybrid Ranking Network for Text-to-SQL
Qin Lyu, Kaushik Chakrabarti, Shobhit Hathi, Souvik Kundu, Jianwen Zhang, Zheng Chen
TL;DR
This paper addresses Text-to-SQL and argues that prior approaches underutilize pre-trained transformers by concatenating all table columns with the natural language question in a single encoder input. It introduces HydraNet, a column-wise, two-stage framework that (i) encodes one question-column pair at a time, matching the sentence-pair format the base language model was pre-trained on, (ii) performs multi-task column ranking for SELECT, WHERE, and relevance, and (iii) assembles the final SQL with straightforward rules, augmented by execution-guided (EG) decoding to enforce run-time validity. The model achieves state-of-the-art results on WikiSQL, notably with RoBERTa-Large plus EG. The gains are attributed to aligning the encoder inputs with the LM's pre-training and to combining column ranking with run-time validation. The approach offers a scalable, efficient way to apply large pre-trained transformers to natural language interfaces to databases (NLIDB), and extending HydraNet to the full SQL grammar is left to future work.
Abstract
In this paper, we study how to leverage pre-trained language models in Text-to-SQL. We argue that previous approaches underutilize the base language models by concatenating all columns together with the NL question and feeding them into the base language model in the encoding stage. We propose a neat approach called Hybrid Ranking Network (HydraNet), which breaks down the problem into column-wise ranking and decoding and finally assembles the column-wise outputs into a SQL query by straightforward rules. In this approach, the encoder is given an NL question and one individual column, which perfectly aligns with the original tasks BERT/RoBERTa are trained on, and hence we avoid any ad-hoc pooling or additional encoding layers, which are necessary in prior approaches. Experiments on the WikiSQL dataset show that the proposed approach is very effective, achieving the top place on the leaderboard.
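To make the column-wise idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes each column has already been scored independently by some encoder run on a (question, column) pair. The names `ColumnPrediction`, `build_pairs`, and `assemble_sql`, the score fields, and the `where_threshold` cutoff are all hypothetical and chosen only for illustration; the abstract does not specify how the assembly rules are parameterized.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ColumnPrediction:
    """Hypothetical per-column output of a column-wise model."""
    column: str
    select_score: float                  # assumed score for appearing in SELECT
    where_score: float                   # assumed score for appearing in WHERE
    aggregation: str = ""                # e.g. "COUNT"; empty means no aggregation
    condition: Optional[Tuple[str, str]] = None  # (operator, value), if any


def build_pairs(question: str, columns: List[str]) -> List[Tuple[str, str]]:
    """Create one (question, column) input per column instead of one
    long input that concatenates every column with the question."""
    return [(question, col) for col in columns]


def assemble_sql(table: str, preds: List[ColumnPrediction],
                 where_threshold: float = 0.5) -> str:
    """Combine per-column predictions into a SQL string with simple rules.

    Assumed rules: the top-ranked column by select_score goes to SELECT;
    columns whose where_score exceeds the threshold go to WHERE.
    """
    best = max(preds, key=lambda p: p.select_score)
    if best.aggregation:
        select_expr = f"{best.aggregation}({best.column})"
    else:
        select_expr = best.column

    ranked = sorted(preds, key=lambda p: p.where_score, reverse=True)
    conditions = [
        f"{p.column} {p.condition[0]} '{p.condition[1]}'"
        for p in ranked
        if p.where_score > where_threshold and p.condition is not None
    ]

    sql = f"SELECT {select_expr} FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Toy example with made-up scores.
example_preds = [
    ColumnPrediction("player", 0.9, 0.1),
    ColumnPrediction("team", 0.2, 0.8, condition=("=", "Lakers")),
    ColumnPrediction("points", 0.1, 0.05),
]
example_sql = assemble_sql("stats", example_preds)
```

In this sketch each column is handled independently until the final assembly step, which is what lets the encoder input stay in a plain sentence-pair form. Execution-guided decoding is not modeled here.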
