How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning

Xiangxiang Zhang; Caijun Jia; Siyuan Li; Dingyu He; Xiya Xiong; Zheng Sun; Honghao He; Yuchen Wu; Bihui Yu; Linzhuang Sun; Cheng Tan; Jingxuan Wei

TL;DR

This paper proposes Faire (Functional alignment for interleaved reasoning), a reinforcement learning framework that enforces three causal constraints to move beyond superficial imitation toward functional alignment. Faire induces a qualitative shift in model behavior in which plotting is effectively internalized, yielding competitive performance on challenging geometric reasoning benchmarks.

Abstract

Solving complex geometric problems inherently requires interleaved reasoning: a tight alternation between constructing diagrams and performing logical deductions. Although recent Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities in visual generation and plotting, we identify a counter-intuitive and underexplored phenomenon: naively applying Supervised Fine-Tuning (SFT) on interleaved plot-solution data leads to a substantial degradation in reasoning performance compared to text-only baselines. We argue that this failure stems from a fundamental limitation of SFT, which primarily induces distributional alignment: the model learns to reproduce the surface format of interleaved plotting but fails to internalize the causal dependency between the generated plot and the reasoning steps. To overcome this limitation, we propose Faire (Functional alignment for interleaved reasoning), a reinforcement learning framework that enforces three causal constraints to move beyond superficial imitation toward functional alignment. Extensive experiments show that Faire induces a qualitative shift in model behavior in which plotting is effectively internalized, yielding competitive performance on challenging geometric reasoning benchmarks.

Paper Structure (39 sections, 13 equations, 17 figures, 7 tables)

Introduction
Related Work
Multimodal Geometry Reasoning
Interleaved Reasoning and Unified Generation
Method
Preliminaries
Distributional Alignment Leads to Failure
Functional Alignment for Causal Dependency
Necessary conditions for causal mediation
Completeness of Verification
Optimization
Dataset
Dataset Construction
Dataset analysis
Experiment
...and 24 more sections

Figures (17)

Figure 1: Challenges of geometric interleaved reasoning on several sub-tasks, comparing SFT-Text only with SFT-Interleaved; these challenges can only be tackled by RL post-training.
Figure 2: Illustration of the Faire framework. The model generates a reasoning trace and GeoGebra code from a geometry problem. In the reward design, a gated reward first enforces answer correctness $C_{ans}$ and code executability $C_{exe}$, then aggregates the perceptual ($C_{perc}$), semantic ($C_{sem}$), and formal verification ($C_{geo}$) signals.
Figure 3: The data construction pipeline employing a tri-perspective verification mechanism—Visual Alignment, Semantic Consistency, and Geometric Assertion—to curate rigorous interleaved geometric reasoning samples.
Figure 4: Stage and category distribution. Inner ring shows educational stages; outer ring shows category shares within each stage.
Figure 5: Distribution of top-100 entropy-increased tokens after RL. Tokens are grouped by semantic function.
...and 12 more figures
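
The gated reward described in the Figure 2 caption can be sketched roughly as follows. This is only an illustration under stated assumptions: the caption says that answer correctness $C_{ans}$ and code executability $C_{exe}$ act as a gate and that $C_{perc}$, $C_{sem}$, and $C_{geo}$ are then aggregated, but it does not give the aggregation rule. The weighted sum, the equal default weights, the zero reward on a failed gate, and the names `RewardSignals` and `gated_reward` are all hypothetical and may differ from the paper's actual formulation.

```python
from dataclasses import dataclass


@dataclass
class RewardSignals:
    """Per-sample signals named in the Figure 2 caption (field names are hypothetical)."""
    c_ans: bool    # final answer is correct
    c_exe: bool    # generated GeoGebra code executes
    c_perc: float  # perceptual signal, assumed to lie in [0, 1]
    c_sem: float   # semantic signal, assumed to lie in [0, 1]
    c_geo: float   # formal (geometric) verification signal, assumed to lie in [0, 1]


def gated_reward(s: RewardSignals, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Return 0 unless both gates pass, else a weighted sum of the three signals.

    ASSUMPTION: the weighted-sum aggregation and equal weights are illustrative;
    the source only states that the three signals are aggregated after the gate.
    """
    if not (s.c_ans and s.c_exe):
        return 0.0  # gate closed: a wrong answer or non-executable code earns nothing
    w_perc, w_sem, w_geo = weights
    return w_perc * s.c_perc + w_sem * s.c_sem + w_geo * s.c_geo
```

Under this sketch, a sample whose plot code fails to run receives no reward regardless of how plausible its reasoning trace looks, which is one way a gate could discourage plots that merely imitate the interleaved format.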
