Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions

Jordan Meadows; Tamsin James; Andre Freitas

TL;DR

The models' mathematical reasoning is not physics-informed in this setting: physical context is predominantly ignored in favour of reverse-engineering solutions.

Abstract

Language models (LMs) can hallucinate when performing complex mathematical reasoning. Physics provides a rich domain for assessing their mathematical capabilities, where physical context requires that any symbolic manipulation satisfies complex semantics (e.g., units, tensorial order). In this work, we systematically remove crucial context from prompts to force instances where model inference may be algebraically coherent, yet unphysical. We assess LM capabilities in this domain using a curated dataset encompassing multiple notations and Physics subdomains. Further, we improve zero-shot scores using synthetic in-context examples, and demonstrate non-linear degradation of derivation quality with perturbation strength via the progressive omission of supporting premises. We find that the models' mathematical reasoning is not physics-informed in this setting, where physical context is predominantly ignored in favour of reverse-engineering solutions.
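
The progressive omission of supporting premises can be pictured with a minimal sketch. The function names, the prompt wording, and the toy premises below are illustrative assumptions, not the authors' implementation or dataset; the sketch only shows how perturbation strength could be varied by dropping k premises from a prompt.

```python
from itertools import combinations


def remove_premises(premises, k):
    """Return every way of omitting exactly k premises, preserving order."""
    if not 0 <= k <= len(premises):
        raise ValueError("k must be between 0 and the number of premises")
    kept_index_sets = combinations(range(len(premises)), len(premises) - k)
    return [[premises[i] for i in kept] for kept in kept_index_sets]


def build_prompt(premises, goal):
    """Format a hypothetical derivation prompt from premises and a goal equation."""
    lines = ["Premises:"]
    lines += [f"  {p}" for p in premises] or ["  (none)"]
    lines.append(f"Derive: {goal}")
    return "\n".join(lines)


# Toy example (assumed, not taken from the paper's dataset).
premises = ["F = m a", "a = dv/dt", "p = m v"]
goal = "F = dp/dt"

# Perturbation strength k = number of premises withheld from the model.
for k in range(len(premises) + 1):
    variants = remove_premises(premises, k)
    print(f"k={k}: {len(variants)} prompt variant(s)")
    print(build_prompt(variants[0], goal))
```

With k = 0 the prompt is unperturbed; at k equal to the number of premises the model sees only the goal equation, which is the regime where reverse-engineering from the target would be the only available strategy.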

Paper Structure (39 sections, 79 equations, 5 figures, 6 tables, 1 algorithm)

Introduction
Related work
Physics Dataset Construction
Data Analysis.
Derivation Generation and Generalisation Capabilities
The Derivation Generation task
Prompting LMs.
Controlled Premise Removal
Evaluation
Derivation Generation
Use of Mathematics which violates Physics.
Premise Removal
The non-linear degradation in derivation quality reported by text generation metrics is supported by manual evaluation.
Substitution errors.
Language models derive equations by reverse-engineering.
...and 24 more sections

Figures (5)

Figure 1: An incorrect derivation generated by few-shot GPT-4 that scores high ROUGE (81), BLEU (71), and GLEU (71). Erroneous equations are denoted in red.
Figure 2: The difference between the Wikipedia proof (left) and our equational interpretation (right) of a reasoning chain related to the Uncertainty Principle in quantum mechanics. The (red) values represent the number of intermediate equations between equivalent equations in each representation, and highlight the detail gap.
Figure 3: $P(L)$ is the probability that a given derivation contains $L$ equations.
Figure 4: $P(N)$ is the probability that a given derivation contains $N$ premise equations.
Figure 5: Excerpts from few-shot GPT-4 derivations that violate well-documented Physics.
