Can Large Language Models Reason about the Region Connection Calculus?
Anthony G Cohn, Robert E Blackwell
TL;DR
This study evaluates whether large language models (LLMs) can perform qualitative spatial reasoning in the Region Connection Calculus RCC-8. It uses three pairs of experiments, which target composition table reconstruction, alignment with human composition preferences, and conceptual neighbourhood (CN) reconstruction. The experiments cover a diverse set of models and use both eponymous and anonymised relation names. The results show that LLMs struggle to reliably reconstruct the composition table and to fully align with human cognitive preferences, although some models approach human-like behaviour on specific tasks, and the CN task proves relatively easier. The work highlights substantial stochasticity and biases in LLM spatial reasoning. It emphasises the need for dedicated qualitative spatial reasoning (QSR) benchmarks, prompting strategies, or multimodal approaches to achieve robust symbolic spatial reasoning. Overall, the findings suggest that LLMs are not yet reliable symbolic reasoners for RCC-8. The work nevertheless offers a framework and dataset for rigorous future evaluation, and it points to follow-up studies on more tractable calculi and prompting strategies.
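The CN task can be made concrete with a minimal Python sketch of the commonly cited RCC-8 conceptual neighbourhood graph. In this graph, two relations are neighbours if one can be transformed continuously into the other without passing through any third relation. The edge set below follows the standard presentation for continuous deformation, and other variants exist. The `score_edges` helper and its precision/recall scoring are illustrative assumptions, not the evaluation used in this study.

```python
# Minimal sketch of the commonly cited RCC-8 conceptual neighbourhood (CN) graph.
# Assumption: the standard continuous-deformation variant; other variants exist.
RCC8 = ["DC", "EC", "PO", "TPP", "NTPP", "TPPi", "NTPPi", "EQ"]

# Undirected edges, stored as frozensets so that (a, b) and (b, a) compare equal.
CN_EDGES = {
    frozenset(pair)
    for pair in [
        ("DC", "EC"), ("EC", "PO"),
        ("PO", "TPP"), ("PO", "TPPi"),
        ("TPP", "NTPP"), ("TPPi", "NTPPi"),
        ("TPP", "EQ"), ("TPPi", "EQ"),
    ]
}


def neighbours(rel):
    """Return the sorted conceptual neighbours of an RCC-8 relation."""
    return sorted(
        other for edge in CN_EDGES if rel in edge for other in edge if other != rel
    )


def score_edges(predicted_pairs):
    """Hypothetical scorer: precision and recall of a predicted edge list
    (e.g. parsed from an LLM answer) against the gold CN graph."""
    predicted = {frozenset(p) for p in predicted_pairs}
    true_positives = len(predicted & CN_EDGES)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(CN_EDGES)
    return precision, recall
```

For example, `neighbours("PO")` gives `["EC", "TPP", "TPPi"]`. Treating edges as unordered sets means that a model listing the same neighbour pair in both directions is not rewarded twice.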
Abstract
Qualitative Spatial Reasoning is a well-explored area of Knowledge Representation and Reasoning, with applications ranging from Geographical Information Systems to Robotics and Computer Vision. Recently, many claims have been made about the reasoning capabilities of Large Language Models (LLMs). Here, we investigate the extent to which a set of representative LLMs can perform classical qualitative spatial reasoning tasks on the mereotopological Region Connection Calculus, RCC-8. We conduct three pairs of experiments using state-of-the-art LLMs: reconstruction of composition tables, alignment to human composition preferences, and conceptual neighbourhood reconstruction. In each pair, one experiment uses eponymous relations and the other uses anonymous relations, to test the extent to which the LLMs rely on knowledge about the relation names obtained during training. All instances are repeated 30 times to measure the stochasticity of the LLMs.
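The following minimal Python sketch illustrates the ingredients of the composition experiments. It includes a handful of well-known RCC-8 composition table cells, a relabelling of relation names for the anonymised variant, and scoring of repeated answers for a single cell. The anonymous labels (`R1` to `R8`), the helper names, and the use of exact-match rate and mean Jaccard similarity are hypothetical choices made for illustration. They are not the relabelling or metrics used in this study.

```python
from statistics import mean

RCC8 = ["DC", "EC", "PO", "TPP", "NTPP", "TPPi", "NTPPi", "EQ"]

# Hypothetical anonymised labels; the study's own relabelling is not given here.
ANON = {name: f"R{i}" for i, name in enumerate(RCC8, start=1)}
DEANON = {label: name for name, label in ANON.items()}

# A few well-known composition table cells: if x R1 y and y R2 z, the set
# lists every RCC-8 relation that may hold between x and z.
GOLD = {
    ("NTPP", "NTPP"): {"NTPP"},
    ("TPP", "NTPP"): {"NTPP"},
    ("TPP", "TPP"): {"TPP", "NTPP"},
    ("NTPP", "DC"): {"DC"},
    ("DC", "NTPPi"): {"DC"},
    ("DC", "DC"): set(RCC8),  # no information: all eight relations are possible
}
# EQ is the identity for composition.
for r in RCC8:
    GOLD[("EQ", r)] = {r}


def anonymise(rels):
    """Map eponymous relation names to (hypothetical) anonymous labels."""
    return {ANON[r] for r in rels}


def deanonymise(labels):
    """Map anonymous labels in a model answer back to RCC-8 names."""
    return {DEANON[label] for label in labels}


def jaccard(a, b):
    """Jaccard similarity of two relation sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def score_runs(cell, runs):
    """Score repeated answers (e.g. 30 runs) for one composition cell.

    Returns (fraction of runs exactly matching gold, mean Jaccard similarity).
    """
    gold = GOLD[cell]
    exact = sum(set(run) == gold for run in runs) / len(runs)
    return exact, mean(jaccard(set(run), gold) for run in runs)
```

For example, suppose two runs answer {TPP, NTPP} and two answer {NTPP} for the cell ("TPP", "TPP"). Then `score_runs` returns `(0.5, 0.75)`. Reporting a partial-credit score alongside the exact-match rate distinguishes answers that are nearly right from those that are wholly wrong across repetitions.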
