LANS: A Layout-Aware Neural Solver for Plane Geometry Problem
Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu
TL;DR
This paper tackles geometry problem solving by emphasizing the need to preserve and exploit layout information in geometry diagrams. It proposes LANS, a layout-aware neural solver built around MLA-PLM for multimodal pre-training with SSP and PMP, and LA-FA for point-guided cross-modal fusion. The model is validated on Geometry3K and PGPS9K, outperforming state-of-the-art symbolic and neural solvers and several multimodal LLM baselines. The results highlight the value of explicit layout modeling for accurate reasoning in plane geometry tasks.
Abstract
Geometry problem solving (GPS) is a challenging mathematical reasoning task requiring multimodal understanding, fusion, and reasoning. Existing neural solvers treat GPS as a vision-language task but fall short in representing geometry diagrams, which carry rich and complex layout information. In this paper, we propose a layout-aware neural solver named LANS, which integrates two new modules: a multimodal layout-aware pre-trained language module (MLA-PLM) and layout-aware fusion attention (LA-FA). MLA-PLM adopts structural-semantic pre-training (SSP) to model global relationships and point-match pre-training (PMP) to align visual points with textual points. LA-FA employs a layout-aware attention mask to realize point-guided cross-modal fusion, further boosting the layout awareness of LANS. Extensive experiments on the Geometry3K and PGPS9K datasets validate the effectiveness of the layout-aware modules and the superior problem-solving performance of LANS over existing symbolic and neural solvers. The code will be made publicly available soon.
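To make the idea of a layout-aware attention mask concrete, the following is a minimal sketch, not the LA-FA implementation. The abstract does not state the masking rule, so the rule used here is an assumption: positions within the same modality always attend to each other, while a diagram patch and a text token attend to each other only when they share a geometric point. The function name `build_layout_mask` and the point-label inputs are hypothetical.

```python
from typing import List, Set


def build_layout_mask(patch_points: List[Set[str]],
                      token_points: List[Set[str]]) -> List[List[bool]]:
    """Return a square boolean mask over the sequence [patches..., tokens...].

    mask[i][j] is True when position i may attend to position j.
    Assumed rule (not taken from the paper): positions in the same modality
    always see each other; a patch and a token see each other only if they
    share at least one point label.
    """
    n_patch = len(patch_points)
    points = list(patch_points) + list(token_points)
    size = len(points)
    mask = [[False] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            same_modality = (i < n_patch) == (j < n_patch)
            mask[i][j] = same_modality or bool(points[i] & points[j])
    return mask


# Toy example: two diagram patches and three text tokens.
patch_points = [{"A", "B"}, {"C"}]      # points visible in each patch
token_points = [{"A"}, set(), {"C"}]    # points mentioned by each token
mask = build_layout_mask(patch_points, token_points)
```

In this toy example the first patch (containing points A and B) may attend to the token mentioning A but not to the token mentioning C. In an attention layer, a mask of this kind would typically be applied by setting the disallowed attention scores to a large negative value before the softmax.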
