Table of Contents
Fetching ...

NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models

Yongchao Chen, Rujul Gandhi, Yang Zhang, Chuchu Fan

TL;DR

The paper tackles cross-domain NL-to-TL translation by introducing a lifted NL-STL framework trained with GPT-3–assisted data generation and finetuned T5 models. It constructs a 28K lifted NL-STL dataset and demonstrates that lifted pre-training plus AP recognition yields high accuracy (>95%) with minimal domain-specific data across five domains, outperforming end-to-end GPT-3 baselines. Ablation studies confirm the value of human annotations, data augmentation, and model capacity, while transfer-learning experiments show strong data efficiency. The work provides a practical, generalizable approach to NL-to-TL translation and releases a large lifted NL-STL dataset as a benchmark for future research; future work suggests leveraging GPT-4 to further improve quality and coverage.

Abstract

Temporal Logic (TL) can be used to rigorously specify complex high-level specification for systems in many engineering applications. The translation between natural language (NL) and TL has been under-explored due to the lack of dataset and generalizable model across different application domains. In this paper, we propose an accurate and generalizable transformation framework of English instructions from NL to TL, exploring the use of Large Language Models (LLMs) at multiple stages. Our contributions are twofold. First, we develop a framework to create a dataset of NL-TL pairs combining LLMs and human annotation. We publish a dataset with 28K NL-TL pairs. Then, we finetune T5 models on the lifted versions (i.e., the specific Atomic Propositions (AP) are hidden) of the NL and TL. The enhanced generalizability originates from two aspects: 1) Usage of lifted NL-TL characterizes common logical structures, without constraints of specific domains. 2) Application of LLMs in dataset creation largely enhances corpus richness. We test the generalization of trained models on five varied domains. To achieve full NL-TL transformation, we either combine the lifted model with AP recognition task or do the further finetuning on each specific domain. During the further finetuning, our model achieves higher accuracy (>95%) using only <10% training data, compared with the baseline sequence to sequence (Seq2Seq) model.

NL2TL: Transforming Natural Languages to Temporal Logics using Large Language Models

TL;DR

The paper tackles cross-domain NL-to-TL translation by introducing a lifted NL-STL framework trained with GPT-3–assisted data generation and finetuned T5 models. It constructs a 28K lifted NL-STL dataset and demonstrates that lifted pre-training plus AP recognition yields high accuracy (>95%) with minimal domain-specific data across five domains, outperforming end-to-end GPT-3 baselines. Ablation studies confirm the value of human annotations, data augmentation, and model capacity, while transfer-learning experiments show strong data efficiency. The work provides a practical, generalizable approach to NL-to-TL translation and releases a large lifted NL-STL dataset as a benchmark for future research; future work suggests leveraging GPT-4 to further improve quality and coverage.

Abstract

Temporal Logic (TL) can be used to rigorously specify complex high-level specification for systems in many engineering applications. The translation between natural language (NL) and TL has been under-explored due to the lack of dataset and generalizable model across different application domains. In this paper, we propose an accurate and generalizable transformation framework of English instructions from NL to TL, exploring the use of Large Language Models (LLMs) at multiple stages. Our contributions are twofold. First, we develop a framework to create a dataset of NL-TL pairs combining LLMs and human annotation. We publish a dataset with 28K NL-TL pairs. Then, we finetune T5 models on the lifted versions (i.e., the specific Atomic Propositions (AP) are hidden) of the NL and TL. The enhanced generalizability originates from two aspects: 1) Usage of lifted NL-TL characterizes common logical structures, without constraints of specific domains. 2) Application of LLMs in dataset creation largely enhances corpus richness. We test the generalization of trained models on five varied domains. To achieve full NL-TL transformation, we either combine the lifted model with AP recognition task or do the further finetuning on each specific domain. During the further finetuning, our model achieves higher accuracy (>95%) using only <10% training data, compared with the baseline sequence to sequence (Seq2Seq) model.
Paper Structure (36 sections, 1 equation, 15 figures, 10 tables, 2 algorithms)

This paper contains 36 sections, 1 equation, 15 figures, 10 tables, 2 algorithms.

Figures (15)

  • Figure 1: Illustration of lifted NL and lifted STL.
  • Figure 2: Illustration of different formats of STL expressions. (a) Different expression formats of the same STL. (b) The binary tree representation of STL.
  • Figure 3: Framework1 to generate NL-STL pairs.
  • Figure 4: Framework2 to generate NL-STL pairs. One extra loop between NL and STL is added.
  • Figure 5: Testing accuracy vs. Number of created NL-STL pairs. The data collected from Navigation and Circuit work are all used during training. The GPT-3-assisted data refers to the data generated with the help of GPT-3, and the Manual data refers to the instructions collected from volunteers. The figure shows the necessity of the created data.
  • ...and 10 more figures