Infusing Prompts with Syntax and Semantics

Anton Bulle Labate, Fabio Gagliardi Cozman

TL;DR

The paper tackles NL2SQL for low-resource languages by infusing prompts with linguistic structure rather than altering model internals. It demonstrates that concatenating dependency-tree syntax and AMR semantics into prompts can significantly boost translation accuracy and even surpass the prior state of the art on the Portuguese and French Spider variants, while also improving training efficiency. Using multilingual Spider data (French, Portuguese, Spanish, Chinese) and models such as BART, T5, and mT5 within the RESDSQL framework, the study shows that AMR-based prompts often yield the strongest gains across languages. These results highlight the practical potential of linguistic prompting to reduce data and compute requirements while maintaining or improving performance, and they point to broader applications beyond NL2SQL.
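As a rough illustration of what "concatenating" linguistic structure into a prompt could look like, here is a minimal Python sketch. It is not the authors' implementation: the function names, the `" | "` separator, the triple format, and the hand-written parse and AMR graph are all assumptions made for this example. In practice the structures would come from a dependency parser and an AMR parser.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical separator; the delimiter used in the paper is not stated here.
SEP = " | "

# (head, relation, dependent) -- an assumed triple layout for this sketch.
Triple = Tuple[str, str, str]


def linearize_dependencies(triples: Iterable[Triple]) -> str:
    """Flatten dependency triples into a single bracketed string."""
    return " ".join(f"({rel} {head} {dep})" for head, rel, dep in triples)


def infuse_prompt(
    question: str,
    dependencies: Optional[Iterable[Triple]] = None,
    amr: Optional[str] = None,
    sep: str = SEP,
) -> str:
    """Concatenate the question with optional syntax and semantics strings."""
    parts = [question]
    if dependencies:
        parts.append(linearize_dependencies(dependencies))
    if amr:
        # Collapse whitespace so a multi-line PENMAN-style graph fits on one line.
        parts.append(" ".join(amr.split()))
    return sep.join(parts)


if __name__ == "__main__":
    # Hand-written, illustrative structures (not produced by a real parser).
    question = "Quantos cantores existem?"
    deps = [("existem", "nsubj", "cantores"), ("cantores", "det", "Quantos")]
    amr = """(e / existir-01
        :ARG1 (c / cantor
            :quant (a / amr-unknown)))"""
    print(infuse_prompt(question, deps, amr))
```

The resulting string would then be passed to the sequence-to-sequence model in place of the bare question; either structure can be omitted to compare syntax-only and semantics-only prompts.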

Abstract

Despite impressive success, language models often generate outputs with flawed linguistic structure. We analyze the effect of directly infusing various kinds of syntactic and semantic information into large language models. To demonstrate the value of our proposals, we focus on the translation of natural language queries to SQL, in particular dealing with languages with fewer resources than English, to better investigate how much help we can get from low-cost syntactic and semantic information. We show that linguistic analysis can significantly boost language models, to the point that we have surpassed previous best systems.

Paper Structure

This paper contains 8 sections, 1 figure, 5 tables.

Figures (1)

  • Figure 1: The pipeline, from input to output.