Beyond Lines and Circles: Unveiling the Geometric Reasoning Gap in Large Language Models
Spyridon Mouselinos, Henryk Michalewski, Mateusz Malinowski
TL;DR
This work investigates the geometric reasoning gap in Large Language Models by analyzing constructive geometry tasks and identifying biases in target variable naming and deficiencies in spatial reasoning. It introduces a simulacra-based multi-agent framework with adaptive prompting, a Visual Relations Prompt, and target-name renaming to enhance spatial reasoning and tool usage. In extensive experiments on Euclidea and in cross-dataset tests, the approach yields substantial gains over single-agent baselines, with notable improvements for open models and strong generalization to other math datasets. The study suggests a promising direction for designing multi-agent, role-specialized LLM systems that integrate planning, symbol manipulation, and spatial understanding, while acknowledging cost and generalization limitations.
Abstract
Large Language Models (LLMs) demonstrate ever-increasing abilities in mathematical and algorithmic tasks, yet their geometric reasoning skills are underexplored. We investigate LLMs' abilities in constructive geometric problem-solving, one of the most fundamental steps in the development of human mathematical reasoning. Our work reveals notable challenges that state-of-the-art LLMs face in this domain despite many successes in similar areas. LLMs exhibit biases in target variable selection and struggle with 2D spatial relationships, often misrepresenting and hallucinating objects and their placements. To this end, we introduce a framework that formulates an LLM-based multi-agent system, which enhances the models' existing reasoning potential by conducting an internal dialogue. This work underscores LLMs' current limitations in geometric reasoning and improves their geometric reasoning capabilities through self-correction, collaboration, and diverse role specializations.
