Plane Geometry Problem Solving with Multi-modal Reasoning: A Survey
Seunghyuk Cho, Zhenyue Qin, Yang Liu, Youngbin Choi, Seungbeom Lee, Dongwoo Kim
TL;DR
This survey consolidates recent advances in plane geometry problem solving (PGPS) by organizing approaches around an encoder-decoder framework and detailing how inputs (diagrams and text) are transformed into intermediate representations and final outputs. It divides encoder outputs into formal-language descriptions and embedding vectors, and decoder outputs into theorems, logic programs, and natural language, covering both rule-based and neural strategies across these pipelines. The paper also discusses two critical challenges, diagram-perception hallucinations and data leakage in benchmarks, and outlines future directions for more robust perception, better benchmark design, and standardized evaluation. Collectively, the work clarifies existing methodological patterns and data-collection practices, guiding future PGPS research toward scalable, reliable multi-modal geometric reasoning with practical impact on tutoring and automated proof systems.
Abstract
Plane geometry problem solving (PGPS) has recently gained significant attention as a benchmark for assessing the multi-modal reasoning capabilities of large vision-language models. Despite this growing interest, the research community still lacks a comprehensive overview that systematically synthesizes recent work. To fill this gap, we present a survey of existing PGPS studies. We first organize PGPS methods within an encoder-decoder framework and summarize the output formats used by their encoders and decoders. We then classify and analyze these encoders and decoders according to their architectural designs. Finally, we outline major challenges and promising directions for future research. In particular, we discuss the hallucination issues that arise during the encoding phase of encoder-decoder architectures, as well as the problem of data leakage in current PGPS benchmarks.
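As a rough illustration of the taxonomy summarized above, the encoder and decoder output formats can be written down as a small data model. This is only a sketch under stated assumptions: the names `EncoderOutput`, `DecoderOutput`, `PGPSMethod`, and `group_by_format` are hypothetical and introduced here for illustration, and the method labels in the usage note are placeholders rather than systems covered by the survey.

```python
from dataclasses import dataclass
from enum import Enum


class EncoderOutput(Enum):
    """Intermediate representation the encoder passes to the decoder."""
    FORMAL_LANGUAGE = "formal-language description"
    EMBEDDING = "embedding vectors"


class DecoderOutput(Enum):
    """Final output format produced by the decoder."""
    THEOREMS = "theorems"
    LOGIC_PROGRAM = "logic program"
    NATURAL_LANGUAGE = "natural language"


@dataclass(frozen=True)
class PGPSMethod:
    """One method placed in the encoder-decoder taxonomy (hypothetical type)."""
    name: str  # placeholder label, not a real system name
    encoder_output: EncoderOutput
    decoder_output: DecoderOutput


def group_by_format(methods):
    """Group method names by their (encoder output, decoder output) pair."""
    groups = {}
    for method in methods:
        key = (method.encoder_output, method.decoder_output)
        groups.setdefault(key, []).append(method.name)
    return groups
```

For example, calling `group_by_format` on a list of `PGPSMethod` entries with placeholder names returns a dictionary keyed by format pair, which is one plausible way to tabulate which combinations of encoder and decoder outputs a set of methods uses.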
