Table of Contents
Fetching ...

Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs

William English, Dominic Simon, Sumit Kumar Jha, Rickard Ewetz

TL;DR

GraFT tackles NL-to-TL translation by imposing grammar-based output restrictions to reduce learning complexity. It introduces AP lifting via masked language modeling and grammar-constrained translation via a Seq2Seq model, backed by theoretical analysis showing lower cross-entropy and better gradient alignment. Empirically, GraFT yields average end-to-end accuracy gains of 5.49% and 14.06% in out-of-domain tests on CW, GLTL, and Navi benchmarks, demonstrating stronger generalization and data efficiency. The work advances NL-to-TL translation for robotics and autonomous systems by leveraging problem-specific grammar to guide learning.

Abstract

Translating natural language (NL) into a formal language such as temporal logic (TL) is integral for human communication with robots and autonomous systems. State-of-the-art approaches decompose the task into a lifting of atomic propositions (APs) phase and a translation phase. However, existing methods struggle with accurate lifting, the existence of co-references, and learning from limited data. In this paper, we propose a framework for NL to TL translation called Grammar Forced Translation (GraFT). The framework is based on the observation that previous work solves both the lifting and translation steps by letting a language model iteratively predict tokens from its full vocabulary. In contrast, GraFT reduces the complexity of both tasks by restricting the set of valid output tokens from the full vocabulary to only a handful in each step. The solution space reduction is obtained by exploiting the unique properties of each problem. We also provide a theoretical justification for why the solution space reduction leads to more efficient learning. We evaluate the effectiveness of GraFT using the CW, GLTL, and Navi benchmarks. Compared with state-of-the-art translation approaches, it can be observed that GraFT the end-to-end translation accuracy by 5.49% and out-of-domain translation accuracy by 14.06% on average.

Grammar-Forced Translation of Natural Language to Temporal Logic using LLMs

TL;DR

GraFT tackles NL-to-TL translation by imposing grammar-based output restrictions to reduce learning complexity. It introduces AP lifting via masked language modeling and grammar-constrained translation via a Seq2Seq model, backed by theoretical analysis showing lower cross-entropy and better gradient alignment. Empirically, GraFT yields average end-to-end accuracy gains of 5.49% and 14.06% in out-of-domain tests on CW, GLTL, and Navi benchmarks, demonstrating stronger generalization and data efficiency. The work advances NL-to-TL translation for robotics and autonomous systems by leveraging problem-specific grammar to guide learning.

Abstract

Translating natural language (NL) into a formal language such as temporal logic (TL) is integral for human communication with robots and autonomous systems. State-of-the-art approaches decompose the task into a lifting of atomic propositions (APs) phase and a translation phase. However, existing methods struggle with accurate lifting, the existence of co-references, and learning from limited data. In this paper, we propose a framework for NL to TL translation called Grammar Forced Translation (GraFT). The framework is based on the observation that previous work solves both the lifting and translation steps by letting a language model iteratively predict tokens from its full vocabulary. In contrast, GraFT reduces the complexity of both tasks by restricting the set of valid output tokens from the full vocabulary to only a handful in each step. The solution space reduction is obtained by exploiting the unique properties of each problem. We also provide a theoretical justification for why the solution space reduction leads to more efficient learning. We evaluate the effectiveness of GraFT using the CW, GLTL, and Navi benchmarks. Compared with state-of-the-art translation approaches, it can be observed that GraFT the end-to-end translation accuracy by 5.49% and out-of-domain translation accuracy by 14.06% on average.

Paper Structure

This paper contains 30 sections, 27 equations, 5 figures, 7 tables, 1 algorithm.

Figures (5)

  • Figure 1: An overview of the GraFT framework of end-to-end translation of NL to TL.
  • Figure 2: AP lifting using LLMs trained using different types of language modeling.
  • Figure 3: At each training step, prior to performing SoftMax on the output logits in preparation for computing $L$, we observe the label at position t in the target sequence L. Given this token and the current state of the TL parsing stack, we can obtain a set of valid token that may proceed that label. For each output logit Zt, we set the score for all tokens outside of Vt to -$\infty.$
  • Figure 4: A comparison of training loss for T5 with vs without grammar-forcing during training.
  • Figure 5: Accuracy results of three T5-based translation models. The T5 framework is a baseline evaluated with ground-truth lifted NL specifications as input. GraFT and NL2TL are evaluated using their respective lifting approaches. Each translation model is trained with LR= 2e-5 for 3 training epochs.