Table of Contents
Fetching ...

Using Large Language Models for (De-)Formalization and Natural Argumentation Exercises for Beginner's Students

Merlin Carl

TL;DR

The paper surveys two LLM-based educational systems: autoformalization, translating natural language to formal logic, and deformalization, providing automated correction for backward translation tasks, plus an extension to natural-language argumentation via the Diproche framework. It demonstrates strong performance of GPT-4-Turbo on propositional and first-order formalizations, while highlighting limitations such as content hallucination, higher-complexity cases, and reliance on cloud-based models. The work introduces a correction pipeline with a PyProver-based verification loop and a simplicity grader, and discusses practical considerations, including stability, pricing, and the potential for local models. Collectively, the results suggest promising didactic tools for logic education and natural-language argumentation, with clear avenues for further refinement and domain expansion.

Abstract

We describe two systems currently being developed that use large language models for the automatized correction of (i) exercises in translating back and forth between natural language and the languages of propositional logic and first-order predicate logic and (ii) exercises in writing simple arguments in natural language in non-mathematical scenarios.

Using Large Language Models for (De-)Formalization and Natural Argumentation Exercises for Beginner's Students

TL;DR

The paper surveys two LLM-based educational systems: autoformalization, translating natural language to formal logic, and deformalization, providing automated correction for backward translation tasks, plus an extension to natural-language argumentation via the Diproche framework. It demonstrates strong performance of GPT-4-Turbo on propositional and first-order formalizations, while highlighting limitations such as content hallucination, higher-complexity cases, and reliance on cloud-based models. The work introduces a correction pipeline with a PyProver-based verification loop and a simplicity grader, and discusses practical considerations, including stability, pricing, and the potential for local models. Collectively, the results suggest promising didactic tools for logic education and natural-language argumentation, with clear avenues for further refinement and domain expansion.

Abstract

We describe two systems currently being developed that use large language models for the automatized correction of (i) exercises in translating back and forth between natural language and the languages of propositional logic and first-order predicate logic and (ii) exercises in writing simple arguments in natural language in non-mathematical scenarios.
Paper Structure (9 sections, 1 equation, 1 figure)

This paper contains 9 sections, 1 equation, 1 figure.

Figures (1)

  • Figure 1: Flowchart for the correction routine for deformalization exercises. (Generated with the help of GPT-3.5.)