Natural Language Decomposition and Interpretation of Complex Utterances
Harsh Jhamtani, Hao Fang, Patrick Xia, Eran Levy, Jacob Andreas, Ben Van Durme
TL;DR
This work addresses the problem of converting complex natural language (NL) utterances into executable programs without requiring large amounts of labeled data for complex utterances. It introduces DecInt, a framework that decomposes a user request into a sequence of simpler NL steps and interprets each step with a language-to-program interpreter, using in-context learning with exemplars retrieved via BM25. The accompanying DeCU benchmark is a focused dataset of calendar and email tasks, containing both elementary and complex utterances, for evaluating NL decomposition and program generation. Empirically, DecInt outperforms standard few-shot prompting on both automated and human-judged correctness metrics, though a substantial portion of its outputs are still not well-formed, which suggests that constraint-based improvements are a promising direction. The approach shows practical potential for building NL interfaces that handle long-horizon or multi-step goals with limited supervision on complex utterances, and the work also highlights ethical considerations in deploying LLM-based NL-to-program systems.
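To make the exemplar-retrieval idea concrete, the following is a minimal sketch of BM25 ranking over a pool of candidate exemplar utterances. It is an illustration only, not the paper's implementation: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the function names, and the toy exemplar pool are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return document indices sorted from most to least relevant to the query."""
    if not documents:
        return []
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturated by k1 and normalized by document length.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(tokens) / avg_len))
            score += idf * norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical exemplar pool (not taken from DeCU).
exemplars = [
    "create a meeting with my team tomorrow",
    "delete all emails from last week",
    "find meetings on friday",
]
ranking = bm25_rank("schedule a meeting with my manager tomorrow", exemplars)
best_exemplar = exemplars[ranking[0]]
```

In a prompting pipeline of this kind, the top-ranked exemplars would typically be placed in the prompt ahead of the test utterance; how many to include and in what order are design choices this sketch leaves open.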
Abstract
Designing natural language interfaces has historically required collecting supervised data to translate user requests into carefully designed intent representations. This requires enumerating and labeling a long tail of user requests, which is challenging. At the same time, large language models (LLMs) encode knowledge about goals and plans that can help conversational assistants interpret user requests requiring numerous steps to complete. We introduce an approach to handle complex-intent-bearing utterances from a user via a process of hierarchical natural language decomposition and interpretation. Our approach uses a pre-trained language model to decompose a complex utterance into a sequence of simpler natural language steps and interprets each step using the language-to-program model designed for the interface. To test our approach, we collect and release DeCU, a new NL-to-program benchmark to evaluate Decomposition of Complex Utterances. Experiments show that the proposed approach enables the interpretation of complex utterances with almost no complex training data, while outperforming standard few-shot prompting approaches.
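The decompose-then-interpret process described in the abstract can be sketched as a simple loop. This is a hedged illustration under stated assumptions, not the authors' code: `next_step` stands in for the language model that proposes the next NL step, `interpret` stands in for the interface's language-to-program model, and the canned steps and program strings below are a made-up toy notation rather than DeCU's actual program representation.

```python
from typing import Callable, List, Optional, Tuple

# One decomposition step: (natural-language step, program fragment).
Step = Tuple[str, str]


def decompose_and_interpret(
    utterance: str,
    next_step: Callable[[str, List[Step]], Optional[str]],
    interpret: Callable[[str, List[Step]], str],
    max_steps: int = 10,
) -> List[Step]:
    """Alternate between proposing an NL step and interpreting it as a program fragment."""
    steps: List[Step] = []
    for _ in range(max_steps):
        # Hypothetical decomposer: returns None when no further step is needed.
        nl_step = next_step(utterance, steps)
        if nl_step is None:
            break
        # Hypothetical interpreter: may condition on the steps produced so far.
        fragment = interpret(nl_step, steps)
        steps.append((nl_step, fragment))
    return steps


# Toy stand-ins for the two models, returning canned outputs.
CANNED_STEPS = ["find meetings tomorrow", "cancel those meetings"]
CANNED_PROGRAMS = {
    "find meetings tomorrow": "events = find_events(date='tomorrow')",
    "cancel those meetings": "for e in events: cancel(e)",
}


def toy_next_step(utterance: str, history: List[Step]) -> Optional[str]:
    return CANNED_STEPS[len(history)] if len(history) < len(CANNED_STEPS) else None


def toy_interpret(step: str, history: List[Step]) -> str:
    return CANNED_PROGRAMS[step]


result = decompose_and_interpret(
    "Cancel all my meetings tomorrow", toy_next_step, toy_interpret
)
```

Passing the two models in as callables keeps the loop independent of any particular LLM API; in a real system each callable would wrap a prompted model call, and `max_steps` guards against a decomposer that never signals completion.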
