LANS: A Layout-Aware Neural Solver for Plane Geometry Problem

Zhong-Zhi Li, Ming-Liang Zhang, Fei Yin, Cheng-Lin Liu

TL;DR

This paper tackles geometry problem solving by emphasizing the need to preserve and exploit layout information in geometry diagrams. It proposes LANS, a layout-aware neural solver built around MLA-PLM for multimodal pre-training with SSP and PMP, and LA-FA for point-guided cross-modal fusion. The model is validated on Geometry3K and PGPS9K, outperforming state-of-the-art symbolic and neural solvers and several multimodal LLM baselines. The results highlight the value of explicit layout modeling for accurate reasoning in plane geometry tasks.

Abstract

Geometry problem solving (GPS) is a challenging mathematical reasoning task requiring multimodal understanding, fusion, and reasoning. Existing neural solvers treat GPS as a vision-language task but fall short in representing geometry diagrams, which carry rich and complex layout information. In this paper, we propose a layout-aware neural solver named LANS, which integrates two new modules: a multimodal layout-aware pre-trained language module (MLA-PLM) and layout-aware fusion attention (LA-FA). MLA-PLM adopts structural-semantic pre-training (SSP) to model global relationships and point-match pre-training (PMP) to align visual points with textual points. LA-FA employs a layout-aware attention mask to realize point-guided cross-modal fusion, further boosting the layout awareness of LANS. Extensive experiments on the Geometry3K and PGPS9K datasets validate the effectiveness of the layout-aware modules and the superior problem-solving performance of LANS over existing symbolic and neural solvers. The code will be made publicly available soon.

Paper Structure (31 sections, 4 equations, 6 figures, 9 tables)

Figures (6)

  • Figure 1: Examples of plane geometry problems. The geometry diagrams (a) and (b) share the same textual problem, structural clauses, and semantic clauses but have different solutions, where structural clauses and semantic clauses are parsed from diagrams. Layout information plays a crucial role in this situation.
  • Figure 2: Overview of the LANS model. The red dotted boxes mark the modules newly proposed relative to PGPSNet (Zhang et al., 2023).
  • Figure 3: Pipeline of multimodal layout-aware pre-training. The geometry problem is the same as that in Figure 2. [M] denotes mask tokens. Class tags and section tags are the same as in Zhang et al. (2023).
  • Figure 4: Schematic of Layout-Aware Fusion Attention.
  • Figure 5: Case analysis on PGPS9K. Solving the problems above requires layout awareness of the geometry diagram. Problems (a), (b), and (c) were answered correctly by LANS; problem (d) was answered incorrectly.
  • ...and 1 more figure