
Leveraging Code to Improve In-context Learning for Semantic Parsing

Ben Bogin, Shivanshu Gupta, Peter Clark, Ashish Sabharwal

TL;DR

This work shows how pre-existing coding abilities of LLMs can be leveraged for semantic parsing by using general-purpose programming languages such as Python instead of DSLs and augmenting prompts with a structured domain description that includes, e.g., the available classes and functions.
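A structured domain description can be produced directly from typed Python stubs. The sketch below is illustrative, not the paper's implementation. The GeoQuery-style classes `State` and `River`, the functions `states` and `rivers_in`, and the helpers `type_name` and `describe_domain` are all hypothetical. It renders each class's fields and each function's type signature as code-like text, similar in spirit to the kind of description Figure 2 shows (object and operator names plus type signatures).

```python
import inspect
from typing import get_args, get_origin


# Hypothetical GeoQuery-style domain stubs (illustrative assumptions only,
# not the paper's actual domain definition).
class State:
    name: str
    population: int


class River:
    name: str
    length: int


def states() -> list[State]:
    """Return all states in the database."""
    raise NotImplementedError


def rivers_in(state: State) -> list[River]:
    """Return the rivers that traverse the given state."""
    raise NotImplementedError


def type_name(t) -> str:
    """Render a (possibly generic) annotation compactly, e.g. list[River]."""
    origin = get_origin(t)
    if origin is None:
        return getattr(t, "__name__", str(t))
    args = ", ".join(type_name(a) for a in get_args(t))
    return f"{origin.__name__}[{args}]"


def describe_domain(classes, functions) -> str:
    """Build a code-like domain description: class fields plus function signatures."""
    lines = []
    for cls in classes:
        lines.append(f"class {cls.__name__}:")
        for field_name, typ in cls.__annotations__.items():
            lines.append(f"    {field_name}: {type_name(typ)}")
    for fn in functions:
        sig = inspect.signature(fn)
        params = ", ".join(
            f"{p.name}: {type_name(p.annotation)}" for p in sig.parameters.values()
        )
        ret = type_name(sig.return_annotation)
        lines.append(f"def {fn.__name__}({params}) -> {ret}: ...")
    return "\n".join(lines)


print(describe_domain([State, River], [states, rivers_in]))
```

Because the description is plain Python, it reads to the model like ordinary library code rather than the grammar of an unfamiliar DSL.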

Abstract

In-context learning (ICL) is an appealing approach for semantic parsing due to its few-shot nature and improved generalization. However, learning to parse to rare domain-specific languages (DSLs) from just a few demonstrations is challenging, limiting the performance of even the most capable LLMs. In this work, we improve the effectiveness of ICL for semantic parsing by (1) using general-purpose programming languages such as Python instead of DSLs, and (2) augmenting prompts with a structured domain description that includes, e.g., the available classes and functions. We show that both these changes significantly improve accuracy across three popular datasets. Combined, they lead to dramatic improvements (e.g., from 7.9% to 66.5% on the SMCalFlow compositional split), nearly closing the performance gap between the easier i.i.d. and the harder compositional splits when used with a strong model, and reducing the need for a large number of demonstrations. We find that the resemblance of the target parse language to general-purpose code is a more important factor than the language's popularity in pre-training corpora. Our findings provide an improved methodology for building semantic parsers in the modern context of ICL with LLMs.
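The two changes combine at prompt-construction time. A minimal sketch, assuming a simple layout (domain description first, then demonstrations, then the test question): `Demo`, `approx_tokens`, and `build_prompt` are hypothetical names, token counts are approximated by whitespace splitting rather than the model's real tokenizer, and the demonstrations are invented GeoQuery-style examples whose target programs are written in Python instead of a DSL. Demonstrations are added greedily until a token budget would be exceeded, which mirrors the context-length trade-off between domain description size and number of demonstrations noted in the figure captions.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    question: str
    program: str  # target parse written in Python rather than a DSL


def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace split); a real system would use the model's tokenizer."""
    return len(text.split())


def build_prompt(domain_description: str, demos: list[Demo], query: str, budget: int) -> str:
    """Assemble the domain description, as many demos as fit in `budget`, and the test query."""
    header = "# Domain description\n" + domain_description + "\n\n"
    footer = f"# Question: {query}\n"
    used = approx_tokens(header) + approx_tokens(footer)
    blocks = []
    for demo in demos:
        block = f"# Question: {demo.question}\n{demo.program}\n\n"
        cost = approx_tokens(block)
        if used + cost > budget:
            break  # the next demonstration would overflow the context budget
        blocks.append(block)
        used += cost
    return header + "".join(blocks) + footer


# Hypothetical usage with an invented domain description and demonstrations.
dd = "def states() -> list[State]: ...\ndef rivers_in(state: State) -> list[River]: ..."
demos = [
    Demo("How many states are there?", "answer = len(states())"),
    Demo("Which state has the most rivers?",
         "answer = max(states(), key=lambda s: len(rivers_in(s))).name"),
]
q = "Which state has the largest population?"
print(build_prompt(dd, demos, q, budget=200))
```

Stopping at the first demonstration that does not fit (rather than skipping it) preserves the order in which demonstrations were ranked, e.g. by similarity to the test question.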

Paper Structure (66 sections, 30 figures, 11 tables, 1 algorithm)

Figures (30)

  • Figure 1: An example illustrating how moving the problem space from a DSL to a general-purpose programming language such as Python can improve output accuracy. When prompted with the DSL, the model does not use the operator "most", resulting in an incorrect program. When prompted with Python, the model leverages its pre-existing knowledge of coding to produce the correct program and answer.
  • Figure 2: A partial example of a domain description containing the names of all objects and operators (in green) and type signatures (in orange).
  • Figure 3: Execution accuracy for varying numbers of demonstrations. In almost all cases, Python outperforms the DSL, both with and without a domain description, across different numbers of demonstrations (the SMCalFlow prompt with the DSL and the Full DD could not fit more than 15 examples given the model's context-length limit).
  • Figure 4: Python-based prompts, both with and without DD, consistently outperform DSL-based prompts, even with better demonstrations, for every split of GeoQuery.
  • Figure 5: Execution accuracy for varying numbers of demonstrations, presenting the same data as Figure 3 but plotted against the number of prompt tokens. The effect of DDs varies greatly between the datasets. For both GeoQuery and SMCalFlow, the Full DD is preferred whenever it can fit.
  • ...and 25 more figures
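To make the contrast in Figure 1 concrete, here is a hypothetical GeoQuery-style question ("Which state borders the most states?") over a made-up toy map; neither the data nor the question is taken from the paper. In Python, the superlative maps onto the familiar built-in `max(..., key=...)` idiom, which the model has seen extensively in code, whereas a DSL expresses the same meaning through a dedicated operator such as "most", which Figure 1 shows the model failing to use.

```python
from dataclasses import dataclass, field


# Toy, made-up domain in the spirit of Figure 1 (not GeoQuery data).
@dataclass
class State:
    name: str
    neighbors: list[str] = field(default_factory=list)


STATES = [
    State("A", ["B", "C"]),
    State("B", ["A", "C", "D"]),
    State("C", ["A", "B"]),
    State("D", ["B"]),
]


def states() -> list[State]:
    """Return all states in the toy map."""
    return STATES


# "Which state borders the most states?"
# The superlative becomes max() with a key function over neighbor counts.
answer = max(states(), key=lambda s: len(s.neighbors)).name
print(answer)  # B
```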