Table of Contents
Fetching ...

Can Large Language Models Autoformalize Kinematics?

Aditi Kabra, Jonathan Laurent, Sagar Bharadwaj, Ruben Martins, Stefan Mitsch, André Platzer

TL;DR

The paper investigates whether large language models can autoformalize physical problems into differential game logic for hybrid systems, addressing the bottleneck of manually crafting formal models for CPS. It introduces a 20-problem kinematics benchmark and a two-stage prompt-revision workflow, validated by a syntax-checker and a semantic checker that compares against the known symbolic solution. With few-shot prompting, the best model achieves a top-5 accuracy of about 70%, while remaining failures point to solver limitations on complex dynamics and the need for better abstractions. The work provides the first quantitative baseline for LLM-based autoformalization from natural language to hybrid dynamical models, highlighting actionable directions for improving checkers, solvers, and prompting strategies. Overall, it suggests a promising path toward scalable, machine-assisted formalization of physical models for autonomous CPS in both offline design and potential runtime deployment.

Abstract

Autonomous cyber-physical systems like robots and self-driving cars could greatly benefit from using formal methods to reason reliably about their control decisions. However, before a problem can be solved it needs to be stated. This requires writing a formal physics model of the cyber-physical system, which is a complex task that traditionally requires human expertise and becomes a bottleneck. This paper experimentally studies whether Large Language Models (LLMs) can automate the formalization process. A 20 problem benchmark suite is designed drawing from undergraduate level physics kinematics problems. In each problem, the LLM is provided with a natural language description of the objects' motion and must produce a model in differential game logic (dGL). The model is (1) syntax checked and iteratively refined based on parser feedback, and (2) semantically evaluated by checking whether symbolically executing the dGL formula recovers the solution to the original physics problem. A success rate of 70% (best over 5 samples) is achieved. We analyze failing cases, identifying directions for future improvement. This provides a first quantitative baseline for LLM-based autoformalization from natural language to a hybrid games logic with continuous dynamics.

Can Large Language Models Autoformalize Kinematics?

TL;DR

The paper investigates whether large language models can autoformalize physical problems into differential game logic for hybrid systems, addressing the bottleneck of manually crafting formal models for CPS. It introduces a 20-problem kinematics benchmark and a two-stage prompt-revision workflow, validated by a syntax-checker and a semantic checker that compares against the known symbolic solution. With few-shot prompting, the best model achieves a top-5 accuracy of about 70%, while remaining failures point to solver limitations on complex dynamics and the need for better abstractions. The work provides the first quantitative baseline for LLM-based autoformalization from natural language to hybrid dynamical models, highlighting actionable directions for improving checkers, solvers, and prompting strategies. Overall, it suggests a promising path toward scalable, machine-assisted formalization of physical models for autonomous CPS in both offline design and potential runtime deployment.

Abstract

Autonomous cyber-physical systems like robots and self-driving cars could greatly benefit from using formal methods to reason reliably about their control decisions. However, before a problem can be solved it needs to be stated. This requires writing a formal physics model of the cyber-physical system, which is a complex task that traditionally requires human expertise and becomes a bottleneck. This paper experimentally studies whether Large Language Models (LLMs) can automate the formalization process. A 20 problem benchmark suite is designed drawing from undergraduate level physics kinematics problems. In each problem, the LLM is provided with a natural language description of the objects' motion and must produce a model in differential game logic (dGL). The model is (1) syntax checked and iteratively refined based on parser feedback, and (2) semantically evaluated by checking whether symbolically executing the dGL formula recovers the solution to the original physics problem. A success rate of 70% (best over 5 samples) is achieved. We analyze failing cases, identifying directions for future improvement. This provides a first quantitative baseline for LLM-based autoformalization from natural language to a hybrid games logic with continuous dynamics.

Paper Structure

This paper contains 7 sections, 2 equations, 3 figures.

Figures (3)

  • Figure 1: Autoformalization pipeline
  • Figure 2: Outcomes of autoformalization by GPT 4o, GPT-4.1 and o3, with zero-shot and multi-shot prompting.
  • Figure 3: Problem outcome by model.