Table of Contents
Fetching ...

Improving the Diproche CNL through Autoformalization via Large Language Models

Merlin Carl

TL;DR

The paper investigates autoformalization of Diproche's controlled natural language using prompting of large language models (e.g., DaVinci-3, GPT-4-Turbo) to translate proofs written in natural language into Diproche's internal formal representation. It motivates this approach by the limitations of a purely grammar-based pipeline and demonstrates a prototype pipeline that integrates an LLM preprocessor with Diproche's Prolog core. Empirical results show high readiness of the approach, including 33/33 correct processing on typical Diproche texts and 98% success on 50 English prompts, indicating LLMs can effectively bootstrap and extend the Diproche CNL with greater flexibility and multilingual adaptability. The authors discuss practical constraints (latency, costs, model drift) and outline a path toward local, domain-specialized models and additional didactical tools such as hints and feedback, highlighting significant potential for improving proof education tools.

Abstract

The Diproche system is an automated proof checker for texts written in a controlled fragment of German, designed for didactical applications in classes introducing students to proofs for the first time. The first version of the system used a controlled natural language for which a Prolog formalization routine was written. In this paper, we explore the possibility of prompting large language models for autoformalization in the context of Diproche, with encouraging first results.

Improving the Diproche CNL through Autoformalization via Large Language Models

TL;DR

The paper investigates autoformalization of Diproche's controlled natural language using prompting of large language models (e.g., DaVinci-3, GPT-4-Turbo) to translate proofs written in natural language into Diproche's internal formal representation. It motivates this approach by the limitations of a purely grammar-based pipeline and demonstrates a prototype pipeline that integrates an LLM preprocessor with Diproche's Prolog core. Empirical results show high readiness of the approach, including 33/33 correct processing on typical Diproche texts and 98% success on 50 English prompts, indicating LLMs can effectively bootstrap and extend the Diproche CNL with greater flexibility and multilingual adaptability. The authors discuss practical constraints (latency, costs, model drift) and outline a path toward local, domain-specialized models and additional didactical tools such as hints and feedback, highlighting significant potential for improving proof education tools.

Abstract

The Diproche system is an automated proof checker for texts written in a controlled fragment of German, designed for didactical applications in classes introducing students to proofs for the first time. The first version of the system used a controlled natural language for which a Prolog formalization routine was written. In this paper, we explore the possibility of prompting large language models for autoformalization in the context of Diproche, with encouraging first results.
Paper Structure (9 sections, 1 figure)