Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks?
Thomas Greatrix, Roger Whitaker, Liam Turner, Walter Colombo
TL;DR
The paper probes whether Large Language Models can generate genuinely new knowledge for spatial reasoning tasks by testing three models on two challenging problems: a decidable placement game and a family of 24-sided polygons with right-angle constraints. It analyzes not only the correctness of the models' insights but also their novelty and usefulness, finding that Claude 3 can provide meaningful, nontrivial contributions (e.g., a dominant strategy for odd $n$ and a new property for the polygon family) despite some incorrect or incomplete answers. The results illustrate emergent reasoning capabilities and potential for AI-assisted hypothesis generation, while underscoring significant limitations, including partial correctness, evaluation difficulty, and possible training-data leakage. Overall, the work suggests a promising role for LLMs as collaborators in research ideation, contingent on rigorous validation and broader experimentation.
Abstract
The capacity of Large Language Models (LLMs) to generate new information offers a potential step change for research and innovation. This capacity is challenging to establish, because it can be difficult to determine what an LLM has previously seen during training, which makes "newness" difficult to substantiate. In this paper we observe that LLMs are able to perform sophisticated reasoning on problems with a spatial dimension that they are unlikely to have directly encountered before. While not perfect, this points to a significant level of understanding that state-of-the-art LLMs can now achieve, supporting the proposition that LLMs are able to yield significant emergent properties. In particular, Claude 3 is found to perform well in this regard.
