Table of Contents
Fetching ...

Formal Specifications from Natural Language

Christopher Hahn, Frederik Schmitt, Julia J. Tillman, Niklas Metzger, Julian Siber, Bernd Finkbeiner

TL;DR

This work investigates whether pre-trained transformer language models can generalize from natural language to formal specifications across regex, FOL, and LTL. By fine-tuning T5 on six NL-to-formal datasets, the authors demonstrate strong generalization to unseen nouns, variable names, and operator descriptions, and achieve a new state-of-the-art for regex translation without domain-specific reasoning. They also contribute two new FOL datasets (FOL-mnli, FOL-codesc) and two LTL datasets (LTL-pattern, LTL-synthesis), enabling cross-domain evaluation and analysis of generalization. The results suggest that open-source LMs offer a practical, data-efficient path to making formal specification tasks more accessible and trustworthy for broader users.

Abstract

We study the generalization abilities of language models when translating natural language into formal specifications with complex semantics. In particular, we fine-tune language models on three datasets consisting of English sentences and their corresponding formal representation: 1) regular expressions (regex), frequently used in programming and search; 2) First-order logic (FOL), commonly used in software verification and theorem proving; and 3) linear-time temporal logic (LTL), which forms the basis for industrial hardware specification languages. Our experiments show that, in these diverse domains, the language models maintain their generalization capabilities from pre-trained knowledge of natural language to generalize, e.g., to new variable names or operator descriptions. Additionally, they achieve competitive performance, and even outperform the state-of-the-art for translating into regular expressions, with the benefits of being easy to access, efficient to fine-tune, and without a particular need for domain-specific reasoning.

Formal Specifications from Natural Language

TL;DR

This work investigates whether pre-trained transformer language models can generalize from natural language to formal specifications across regex, FOL, and LTL. By fine-tuning T5 on six NL-to-formal datasets, the authors demonstrate strong generalization to unseen nouns, variable names, and operator descriptions, and achieve a new state-of-the-art for regex translation without domain-specific reasoning. They also contribute two new FOL datasets (FOL-mnli, FOL-codesc) and two LTL datasets (LTL-pattern, LTL-synthesis), enabling cross-domain evaluation and analysis of generalization. The results suggest that open-source LMs offer a practical, data-efficient path to making formal specification tasks more accessible and trustworthy for broader users.

Abstract

We study the generalization abilities of language models when translating natural language into formal specifications with complex semantics. In particular, we fine-tune language models on three datasets consisting of English sentences and their corresponding formal representation: 1) regular expressions (regex), frequently used in programming and search; 2) First-order logic (FOL), commonly used in software verification and theorem proving; and 3) linear-time temporal logic (LTL), which forms the basis for industrial hardware specification languages. Our experiments show that, in these diverse domains, the language models maintain their generalization capabilities from pre-trained knowledge of natural language to generalize, e.g., to new variable names or operator descriptions. Additionally, they achieve competitive performance, and even outperform the state-of-the-art for translating into regular expressions, with the benefits of being easy to access, efficient to fine-tune, and without a particular need for domain-specific reasoning.
Paper Structure (15 sections, 6 equations, 5 figures, 2 tables)

This paper contains 15 sections, 6 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: An ID example of a regex model trained solely on the noun "dog", tested OOD on new nouns "eye" and "time"; and an ID example of an LTL model trained on variables $i_0$ to $i_4$ and $o_0$ to $o_4$, tested OOD on new variables and operator descriptions (bottom). OOD fragments are highlighted.
  • Figure 2: Syntactic accuracy of pre-trained T5 regex models (left) and baseline T5 regex models (right) trained on variations of Regex-synthetic with proper subsets of nouns.
  • Figure 3: Respective accuracy per sequence on validation sets during training of the best performing models reported in Table \ref{['table:results']}: Regex (top left), FOL (top right), LTL (bottom left); and the accuracy per sequence for the new nouns experiment (bottom right).
  • Figure 4: Sensitivity to learning rate schedule of baseline model on FOL-codesc dataset.
  • Figure 5: Regex syntax used in the considered datasets; taken from DBLP:conf/emnlp/LocascioNDKB16.