Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks?

Thomas Greatrix; Roger Whitaker; Liam Turner; Walter Colombo

TL;DR

The paper probes whether Large Language Models can generate genuinely new knowledge for spatial reasoning tasks by testing three models on two challenging problems: a decidable placement game and a family of 24-sided polygons with right-angle constraints. It analyzes not only correctness but the novelty and usefulness of the models' insights, finding that Claude 3 can provide meaningful, nontrivial contributions (e.g., a dominant strategy for odd $n$ and a new property for the polygon family) despite some incorrect or incomplete answers. The results illustrate emergent reasoning capabilities and potential for AI-assisted hypothesis generation, while underscoring significant limitations, including partial correctness, evaluation difficulty, and possible training-data leakage concerns. Overall, the work suggests a promising role for LLMs as collaborators in research ideation, contingent on rigorous validation and broader experimentation.

Abstract

The ability of Large Language Models (LLMs) to generate new information offers a potential step change for research and innovation. This is challenging to assert because it can be difficult to determine what an LLM has previously seen during training, making "newness" difficult to substantiate. In this paper we observe that LLMs are able to perform sophisticated reasoning on problems with a spatial dimension that they are unlikely to have directly encountered before. While not perfect, this points to a significant level of understanding that state-of-the-art LLMs can now achieve, supporting the proposition that LLMs are able to yield significant emergent properties. In particular, Claude 3 is found to perform well in this regard.

Paper Structure (18 sections, 15 figures)

Introduction
Background
Approach
Finding the winner of a decidable game
LLM Responses
Polygons with special properties
Polygons with special properties results.
Limitations
Conclusion
Appendix
An analysis of the responses finding the winner of a decidable game
Claude 3's response to the decidable game problem.
Bing Copilot's response to the decidable game problem.
ChatGPT-3.5-Turbo's response to the decidable game problem.
An analysis of the responses to the polygons question
...and 3 more sections

Figures (15)

Figure 1: All 7 polygons satisfying the prompt.
Figure 2: Claude 3's attempt to find the winner.
Figure 3: The game when n = 2
Figure 4: Bing Copilot's attempt to find the winner
Figure 5: The game when n = 3
...and 10 more figures
