Natural Language Decomposition and Interpretation of Complex Utterances

Harsh Jhamtani, Hao Fang, Patrick Xia, Eran Levy, Jacob Andreas, Ben Van Durme

TL;DR

This work tackles the challenge of converting complex natural language utterances into executable programs without requiring extensive labeled data for complex utterances. It introduces DecInt, a framework that decomposes a user request into a sequence of simpler NL steps and interprets each step with a language-to-program interpreter, guided by in-context learning and BM25-based exemplar retrieval. The DeCU benchmark provides a focused dataset of calendar and email tasks, with both elementary and complex utterances, for evaluating NL decomposition and program generation. Empirical results show that DecInt outperforms standard few-shot prompting on both automated and human-judged correctness metrics, though a substantial portion of its outputs is still not well-formed, which leaves room for constraint-based improvements. The approach shows practical potential for building NL interfaces that handle long-horizon or multi-step goals with limited supervision on complex utterances, while also highlighting ethical considerations in deploying LLM-based NL-to-program systems.

Abstract

Designing natural language interfaces has historically required collecting supervised data to translate user requests into carefully designed intent representations. This requires enumerating and labeling a long tail of user requests, which is challenging. At the same time, large language models (LLMs) encode knowledge about goals and plans that can help conversational assistants interpret user requests requiring numerous steps to complete. We introduce an approach to handle complex-intent-bearing utterances from a user via a process of hierarchical natural language decomposition and interpretation. Our approach uses a pre-trained language model to decompose a complex utterance into a sequence of simpler natural language steps and interprets each step using the language-to-program model designed for the interface. To test our approach, we collect and release DeCU -- a new NL-to-program benchmark to evaluate Decomposition of Complex Utterances. Experiments show that the proposed approach enables the interpretation of complex utterances with almost no complex training data, while outperforming standard few-shot prompting approaches.
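The BM25-based exemplar retrieval mentioned in the TL;DR can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the class name `BM25Index`, the whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the example utterances are all assumptions chosen for illustration. It ranks annotated elementary utterances against a query step and returns the top `m` matches.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; a real system may differ.
    return text.lower().split()


class BM25Index:
    """Minimal Okapi BM25 ranking over a small list of utterances (hypothetical helper)."""

    def __init__(self, documents, k1=1.5, b=0.75):
        self.raw = list(documents)
        self.docs = [tokenize(d) for d in self.raw]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        # Document frequency: number of documents containing each term.
        df = Counter()
        for d in self.docs:
            df.update(set(d))
        n = len(self.docs)
        # Smoothed IDF variant that stays non-negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query_tokens, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            norm = tf[t] + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * tf[t] * (self.k1 + 1) / norm
        return total

    def top_m(self, query, m):
        q = tokenize(query)
        scored = [(self.score(q, i), i) for i in range(len(self.docs))]
        scored = [(s, i) for s, i in scored if s > 0]
        # Highest score first; ties broken by original position.
        scored.sort(key=lambda p: (-p[0], p[1]))
        return [self.raw[i] for _, i in scored[:m]]


# Hypothetical elementary utterances (not taken from DeCU).
EXEMPLARS = [
    "create a meeting with Alice tomorrow",
    "send an email to Bob",
    "find events on Friday",
    "delete the meeting with Alice",
]

index = BM25Index(EXEMPLARS)
top = index.top_m("send an email to Alice", m=2)
print(top)
```

In this toy setup the best match is the email exemplar, because it shares four rare terms with the query, while the exemplars that only share "Alice" score much lower.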

Paper Structure (33 sections, 12 figures, 5 tables)

Figures (12)

  • Figure 1: Parsing NL user utterances into programs. We study a scenario in which a large number of elementary utterances have been annotated with programs (top block), and we wish to build a model that can generalize to complex utterances (bottom blocks) requiring more elaborate programs. We introduce a method called DecInt that uses an LLM to decompose a complex utterance by predicting simpler NL steps, each of which is parsed to a program according to the annotated elementary utterances.
  • Figure 2: Examples of complex utterances in DeCU. Each utterance is accompanied by decompositions consisting of a sequence of NL steps and associated program fragments, annotated by domain experts.
  • Figure 3: DecInt maps complex utterances into elementary steps, each of which is parsed in sequence to arrive at a final program. NL decomposition and program generation steps are interleaved. While parsing a step, up to $M$ similar examples of elementary utterances are retrieved.
  • Figure 4: Outputs from DecInt and baselines on sample complex utterances in DeCU.
  • Figure A1: The interface used by domain experts to annotate the decompositions. The interface flags syntax, type, and missing-function errors, enabling the experts to author well-formed annotations. These "express" annotations are subsequently normalized by stripping comments, removing type annotations, formatting the code, and selecting canonical function names when there is ambiguity (concat vs. +). The resulting programs conform to Scala syntax.
  • ...and 7 more figures
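
The interleaved loop described in the Figure 3 caption (predict an NL step, retrieve up to $M$ similar elementary examples, parse the step, repeat) might be sketched as follows. This is a rough illustration under stated assumptions, not the paper's code: the function `decompose_and_interpret`, the three callables it takes, the stub implementations, and the program strings are all hypothetical, and the stubs stand in for LLM calls.

```python
from typing import Callable, List, Optional, Tuple

Exemplar = Tuple[str, str]  # (elementary utterance, annotated program)
Step = Tuple[str, str]      # (NL step, program fragment)


def decompose_and_interpret(
    utterance: str,
    next_step: Callable[[str, List[Step]], Optional[str]],
    retrieve: Callable[[str, int], List[Exemplar]],
    parse_step: Callable[[str, List[Exemplar], List[Step]], str],
    m: int = 3,
    max_steps: int = 10,
) -> List[Step]:
    """Interleave NL decomposition and program generation (hypothetical sketch)."""
    steps: List[Step] = []
    for _ in range(max_steps):
        # 1. Predict the next NL step given the utterance and the history.
        step = next_step(utterance, steps)
        if step is None:  # the decomposer signals that it is done
            break
        # 2. Retrieve up to m similar elementary exemplars for this step.
        exemplars = retrieve(step, m)
        # 3. Parse the step into a program fragment using those exemplars.
        fragment = parse_step(step, exemplars, steps)
        steps.append((step, fragment))
    return steps


# --- Stubs standing in for the LLM and retriever (illustration only) ---
# The program strings below are invented and do not reflect the DeCU API.
EXEMPLARS: List[Exemplar] = [
    ("find events tomorrow", "val events = findEvents(tomorrow)"),
    ("delete those events", "events.foreach(deleteEvent)"),
]
SCRIPTED_STEPS = ["find events tomorrow", "delete those events"]


def scripted_next_step(utterance: str, history: List[Step]) -> Optional[str]:
    # Replays a fixed decomposition instead of querying a language model.
    return SCRIPTED_STEPS[len(history)] if len(history) < len(SCRIPTED_STEPS) else None


def overlap_retrieve(step: str, m: int) -> List[Exemplar]:
    # Ranks exemplars by shared tokens; a BM25 ranker could be swapped in here.
    words = set(step.split())
    ranked = sorted(EXEMPLARS, key=lambda ex: -len(words & set(ex[0].split())))
    return ranked[:m]


def copy_top_exemplar(step: str, exemplars: List[Exemplar], history: List[Step]) -> str:
    # Copies the best exemplar's program instead of prompting a parser.
    return exemplars[0][1] if exemplars else ""


result = decompose_and_interpret(
    "clear my calendar for tomorrow",
    scripted_next_step,
    overlap_retrieve,
    copy_top_exemplar,
    m=1,
)
for nl, program in result:
    print(f"{nl!r} -> {program}")
```

Passing the decomposer, retriever, and parser as callables keeps the control flow separate from any particular model, so each stub can be replaced independently.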