Guiding LLM Temporal Logic Generation with Explicit Separation of Data and Control
William Murphy, Nikolaus Holzer, Nathan Koenig, Leyi Cui, Raven Rothkopf, Feitong Qiao, Mark Santolucito
TL;DR
The paper addresses the difficulty of generating rigorous temporal logic specifications for reactive systems by leveraging LLMs to translate natural language into Temporal Stream Logic (TSL). It proposes a prompt-generation pipeline that provides three inputs to the LLM: a high-level natural language summary, a detailed NL description of assumptions and guarantees, and an explicit separation of data and control through function and predicate interfaces. The core contributions include a benchmark suite for evaluating LLM-based TSL specification generation, ablation studies showing that separating data from control improves specification accuracy (with caveats when term definitions are misunderstood), and a practical workflow that grounds LLM outputs in a formal, verifiable TSL framework. The work also formalizes the TSL realizability setting with universal quantification over data-term interpretations, illustrating the potential for scalable, safer LLM-assisted temporal-specification workflows and providing a benchmark for future research.
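As a rough illustration of the three prompt inputs described above, the following minimal sketch assembles a summary, assumptions and guarantees, and function and predicate interfaces into a single prompt string. It is not the authors' pipeline: the names `Signature`, `SpecRequest`, and `build_prompt`, the prompt wording, and the counter example are all hypothetical, chosen only to show how the data-level interface can be kept separate from the control-level description.

```python
from dataclasses import dataclass, field


@dataclass
class Signature:
    """Hypothetical description of one data-level function or predicate."""
    name: str
    args: list[str]
    description: str


@dataclass
class SpecRequest:
    """The three prompt inputs: summary, assumptions/guarantees, interfaces."""
    summary: str
    assumptions: list[str]
    guarantees: list[str]
    functions: list[Signature] = field(default_factory=list)
    predicates: list[Signature] = field(default_factory=list)


def _render_interface(title: str, sigs: list[Signature]) -> str:
    """Render a list of signatures as a titled bullet list."""
    lines = [f"{title}:"]
    for s in sigs:
        lines.append(f"- {s.name}({', '.join(s.args)}): {s.description}")
    return "\n".join(lines)


def build_prompt(req: SpecRequest) -> str:
    """Assemble the prompt; the wording here is an assumption, not the paper's."""
    sections = [
        "Translate the following description into a TSL specification.",
        f"Summary:\n{req.summary}",
        "Assumptions:\n" + "\n".join(f"- {a}" for a in req.assumptions),
        "Guarantees:\n" + "\n".join(f"- {g}" for g in req.guarantees),
        # Data is kept apart from control: the model sees only names and
        # informal meanings, never the implementations.
        _render_interface("Functions (treat as opaque)", req.functions),
        _render_interface("Predicates (treat as opaque)", req.predicates),
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    example = SpecRequest(
        summary="A counter that increments while a button is held.",
        assumptions=["The button is eventually released."],
        guarantees=["While the button is pressed, the counter is incremented."],
        functions=[Signature("inc", ["x"], "returns x plus one")],
        predicates=[Signature("pressed", ["button"], "true while the button is held")],
    )
    print(build_prompt(example))
```

Keeping the interfaces in their own sections mirrors the idea that the specification should mention only the names of functions and predicates, leaving their behavior as an implementation detail.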
Abstract
Temporal logics are powerful tools that are widely used for the synthesis and verification of reactive systems. The recent progress on Large Language Models (LLMs) has the potential to make the process of writing such specifications more accessible. However, writing specifications in temporal logics remains challenging for all but the most expert users. A key question in using LLMs for temporal logic specification engineering is what kind of guidance best helps the LLM and its users produce specifications easily. Looking specifically at the problem of reactive program synthesis, we explore the impact of providing an LLM with guidance on the separation of control and data: making explicit for the LLM what functionality is relevant for the specification, and treating the remaining functionality as an implementation detail for a series of pre-defined functions and predicates. We present a benchmark set and find that this separation of concerns improves specification generation. Our benchmark provides a test set against which to evaluate future work in LLM generation of temporal logic specifications.
