PSM-SQL: Progressive Schema Learning with Multi-granularity Semantics for Text-to-SQL
Zhuopan Yang, Yuanzhen Xie, Ruichao Zhong, Yunzhi Tan, Enjie Liu, Zhenguo Yang, Mochi Gao, Bo Hu, Zang Li
TL;DR
This work tackles the challenge of converting natural language (NL) questions to SQL across diverse, redundant schemas by introducing PSM-SQL, a framework that links schemas progressively using multi-granularity semantics and a chain loop strategy that prunes redundancy. Its multi-granularity schema linking (MSL) module operates at the column, table, and database levels, employing a column-level triplet loss, a table-level cross-encoder, and database-level reasoning with a LoRA-fine-tuned LLM to filter relevant schemas before SQL generation. Empirical results on Spider and BIRD show that reducing schema redundancy and enriching semantic signals improves schema linking accuracy as well as SQL execution and exact-match (EM) metrics, with PSM-SQL variants outperforming prompting-based and fine-tuning-based baselines. The approach offers a practical path to robust text-to-SQL across heterogeneous schemas and domains, highlighting the value of progressive schema optimization in tandem with LLM reasoning.
Abstract
Converting natural language (NL) questions into executable structured query language (SQL) queries is challenging for two reasons: the large number of redundant database schemas interferes with semantic learning, and there is a domain shift between NL and SQL. Existing schema linking methods focus on the table level and perform linking only once, ignoring the multi-granularity semantics and chainable cyclicity of schemas. In this paper, we propose PSM-SQL, a progressive schema linking framework with multi-granularity semantics that reduces redundant database schemas for text-to-SQL. Using a multi-granularity schema linking (MSL) module, PSM-SQL learns schema semantics at the column, table, and database levels. Specifically, at the column level a triplet loss is used to learn embeddings; at the table level MSL uses classifier and similarity scores to model schema interactions; and at the database level LLMs are fine-tuned for schema reasoning. In addition, PSM-SQL adopts a chain loop strategy that lowers the difficulty of schema linking by continuously reducing the number of redundant schemas. Experiments on text-to-SQL datasets show that PSM-SQL outperforms existing methods by 1 to 3 percentage points.
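
To make two of the ideas above concrete, the following is a minimal sketch, not the authors' implementation. It shows a margin-based triplet loss over column embeddings and a chain-loop style pruning pass that repeatedly shrinks the candidate schema. The cosine distance, the margin, the `keep_ratio` and `rounds` parameters, the function names, and the toy 2-D embeddings are all assumptions made for illustration; in the framework described here the scores would come from trained models rather than a fixed distance.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: the relevant column (positive) should be
    closer to the question (anchor) than an irrelevant one (negative),
    by at least `margin` (a hypothetical value)."""
    gap = cosine_distance(anchor, positive) - cosine_distance(anchor, negative)
    return max(0.0, gap + margin)


def chain_loop_prune(question_vec, columns, keep_ratio=0.5, rounds=2,
                     distance=cosine_distance):
    """Illustrative chain loop: each round re-ranks the surviving columns
    and keeps the closest fraction, so later rounds see a smaller schema.

    `columns` maps a column name to its embedding. `distance` is a
    stand-in scorer; a learned linker could be passed in instead.
    """
    kept = dict(columns)
    for _ in range(rounds):
        ranked = sorted(kept, key=lambda name: distance(question_vec, kept[name]))
        n_keep = max(1, math.ceil(len(ranked) * keep_ratio))
        kept = {name: kept[name] for name in ranked[:n_keep]}
    return list(kept)


# Toy data: hypothetical 2-D embeddings for a question and four columns.
question = [1.0, 0.1]
schema = {
    "singer.name": [1.0, 0.0],
    "singer.age": [0.8, 0.6],
    "concert.year": [0.0, 1.0],
    "stadium.capacity": [-1.0, 0.0],
}

if __name__ == "__main__":
    print(triplet_loss(question, schema["singer.name"], schema["concert.year"]))
    print(chain_loop_prune(question, schema, rounds=1))
    print(chain_loop_prune(question, schema, rounds=2))
```

Because the toy scorer never changes between rounds, the loop here only tightens the cut step by step; the benefit claimed for the chain loop comes from re-scoring on a progressively less redundant schema, which this sketch leaves to the `distance` argument.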
