Table of Contents
Fetching ...

GinSign: Grounding Natural Language Into System Signatures for Temporal Logic Translation

William English, Chase Walker, Dominic Simon, Rickard Ewetz

TL;DR

GinSign tackles the critical gap in NL-to-TL translation by grounding all atomic propositions to a formal system signature, enabling executable TL and downstream model checking. It introduces a hierarchical grounding framework that first selects a predicate and then grounds typed arguments using a prefix-based BERT classifier, dramatically reducing label space and avoiding reliance on large LLMs. Across VLTL-Bench domains, GinSign achieves perfect predicate grounding, strong argument grounding, and high end-to-end grounded translation accuracy, outperforming prompting baselines and enabling semantic verification. The approach demonstrates robust generalization to unseen domains and terminology, highlighting the practical potential for grounded NL-to-TL in trustworthy autonomous systems.

Abstract

Natural language (NL) to temporal logic (TL) translation enables engineers to specify, verify, and enforce system behaviors without manually crafting formal specifications-an essential capability for building trustworthy autonomous systems. While existing NL-to-TL translation frameworks have demonstrated encouraging initial results, these systems either explicitly assume access to accurate atom grounding or suffer from low grounded translation accuracy. In this paper, we propose a framework for Grounding Natural Language Into System Signatures for Temporal Logic translation called GinSign. The framework introduces a grounding model that learns the abstract task of mapping NL spans onto a given system signature: given a lifted NL specification and a system signature $\mathcal{S}$, the classifier must assign each lifted atomic proposition to an element of the set of signature-defined atoms $\mathcal{P}$. We decompose the grounding task hierarchically- first predicting predicate labels, then selecting the appropriately typed constant arguments. Decomposing this task from a free-form generation problem into a structured classification problem permits the use of smaller masked language models and eliminates the reliance on expensive LLMs. Experiments across multiple domains show that frameworks which omit grounding tend to produce syntactically correct lifted LTL that is semantically nonequivalent to grounded target expressions, whereas our framework supports downstream model checking and achieves grounded logical-equivalence scores of $95.5\%$, a $1.4\times$ improvement over SOTA.

GinSign: Grounding Natural Language Into System Signatures for Temporal Logic Translation

TL;DR

GinSign tackles the critical gap in NL-to-TL translation by grounding all atomic propositions to a formal system signature, enabling executable TL and downstream model checking. It introduces a hierarchical grounding framework that first selects a predicate and then grounds typed arguments using a prefix-based BERT classifier, dramatically reducing label space and avoiding reliance on large LLMs. Across VLTL-Bench domains, GinSign achieves perfect predicate grounding, strong argument grounding, and high end-to-end grounded translation accuracy, outperforming prompting baselines and enabling semantic verification. The approach demonstrates robust generalization to unseen domains and terminology, highlighting the practical potential for grounded NL-to-TL in trustworthy autonomous systems.

Abstract

Natural language (NL) to temporal logic (TL) translation enables engineers to specify, verify, and enforce system behaviors without manually crafting formal specifications-an essential capability for building trustworthy autonomous systems. While existing NL-to-TL translation frameworks have demonstrated encouraging initial results, these systems either explicitly assume access to accurate atom grounding or suffer from low grounded translation accuracy. In this paper, we propose a framework for Grounding Natural Language Into System Signatures for Temporal Logic translation called GinSign. The framework introduces a grounding model that learns the abstract task of mapping NL spans onto a given system signature: given a lifted NL specification and a system signature , the classifier must assign each lifted atomic proposition to an element of the set of signature-defined atoms . We decompose the grounding task hierarchically- first predicting predicate labels, then selecting the appropriately typed constant arguments. Decomposing this task from a free-form generation problem into a structured classification problem permits the use of smaller masked language models and eliminates the reliance on expensive LLMs. Experiments across multiple domains show that frameworks which omit grounding tend to produce syntactically correct lifted LTL that is semantically nonequivalent to grounded target expressions, whereas our framework supports downstream model checking and achieves grounded logical-equivalence scores of , a improvement over SOTA.

Paper Structure

This paper contains 32 sections, 11 equations, 2 figures, 6 tables, 1 algorithm.

Figures (2)

  • Figure 1: Overview of the GinSign Framework.
  • Figure 2: An overview of our grounding components. Given $n$ lifted NP APs, we convert the system signature into a prefix using Algorithm \ref{['alg:prefix']}. The lifted NL is first combined with the predicates prefix to ground the predicate to a known action (a). Since each predicate requires an argument, we filter out non-candidate arguments by type (b). We then combine the lifted NL with the arguments prefix to classify the correct argument (c). Both predicate and argument grounding use the same token classification BERT model, which processes any prefix and lifted NL.