COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation
Sannyuya Liu, Jintian Feng, Zongkai Yang, Yawei Luo, Qian Wan, Xiaoxuan Shen, Jianwen Sun
TL;DR
COMET introduces a large multimodal model guided by the "Cone of Experience" for mathematical problem generation. It unifies stem generation and problem solving within a single model. A three-stage fine-tuning framework is built around symbolic, iconic, and direct experiences. Each stage has its own data construction and injection methods, and the direct-experience stage is based on DPO. Experiments on GSM8K, TAL-SCQ5K-CN, and the new CMM12K dataset show gains in controllable generation, analogy generation, and fine-grained problem solving. The model achieves state-of-the-art or near-state-of-the-art results with a relatively small parameter count. The work contributes a large Chinese multimodal math dataset. It also demonstrates practical benefits for smart education by advancing end-to-end math problem generation within a single, experience-guided LMM framework.
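The direct-experience stage is described as DPO-based (Direct Preference Optimization). The sketch below shows the standard DPO objective for one preference pair in plain Python, to make that stage concrete. It is not COMET's implementation. The function names and the default `beta = 0.1` are our own illustrative choices. The sketch also assumes that the sequence log-probabilities under the trained policy and the frozen reference model have already been computed.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Hypothetical helper: inputs are summed log-probabilities of each full
    response under the policy being trained and the frozen reference model.
    """
    chosen_ratio = policy_chosen_logp - ref_chosen_logp        # log pi/pi_ref for the preferred answer
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi/pi_ref for the dispreferred answer
    margin = beta * (chosen_ratio - rejected_ratio)
    return -log_sigmoid(margin)


def batch_dpo_loss(pairs, beta: float = 0.1) -> float:
    """Mean DPO loss over a list of 4-tuples of log-probabilities."""
    losses = [dpo_loss(*p, beta=beta) for p in pairs]
    return sum(losses) / len(losses)
```

When the policy still equals the reference model, the margin is zero and the loss is log 2. The loss falls below log 2 as the policy raises the preferred answer's likelihood relative to the rejected one, compared with the reference.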
Abstract
The automatic generation of high-quality mathematical problems is practically valuable in many educational scenarios. Large multimodal models offer a novel technical approach to mathematical problem generation because of their wide success in cross-modal data scenarios. However, two factors limit their application to this task. The first is the traditional practice of separating problem solving from problem generation. The second is that mainstream fine-tuning frameworks rely on a monotonous data structure with homogeneous training objectives. To address these challenges, this paper proposes COMET, a "Cone of Experience" enhanced large multimodal model for mathematical problem generation. First, from the perspectives of mutual ability promotion and application logic, we unify stem generation and problem solving under mathematical problem generation. Second, we propose a three-stage fine-tuning framework guided by the "Cone of Experience". The framework divides the fine-tuning data into symbolic experience, iconic experience, and direct experience, drawing a parallel with the experiences that shape a teacher's career growth. Within this framework, we design several fine-grained data construction and injection methods. Finally, we construct a Chinese multimodal mathematical problem dataset to fill the gap in Chinese multimodal data in this field. Experiments on multiple datasets, evaluated with both objective and subjective metrics, verify the effectiveness of the proposed framework and model.
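One way to picture the three-stage framework is as a curriculum that splits a labelled fine-tuning set into consecutive stages. The sketch below illustrates that idea under several assumptions, and none of them comes from the paper's code. The stages are assumed to run in the order symbolic, then iconic, then direct. The first two stages are assumed to use ordinary supervised fine-tuning. Direct-experience samples are assumed to be preference pairs for the DPO stage. The names `Experience`, `Sample`, `staged_schedule`, and `objective_for` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional, Tuple


class Experience(IntEnum):
    """Hypothetical stage labels; the numeric order is the ASSUMED training order."""
    SYMBOLIC = 1
    ICONIC = 2
    DIRECT = 3


@dataclass
class Sample:
    experience: Experience
    prompt: str
    response: str                   # target (or preferred) response
    rejected: Optional[str] = None  # only used for DIRECT preference pairs (assumption)


def staged_schedule(samples: List[Sample]) -> List[Tuple[Experience, List[Sample]]]:
    """Group samples into consecutive stages, keeping the original order within each stage."""
    buckets = defaultdict(list)
    for s in samples:
        if s.experience is Experience.DIRECT and s.rejected is None:
            raise ValueError("DIRECT samples are assumed to be preference pairs")
        buckets[s.experience].append(s)
    # Iterate stages in enum order and skip any stage that has no data.
    return [(stage, buckets[stage]) for stage in sorted(Experience) if buckets[stage]]


def objective_for(stage: Experience) -> str:
    """Assumed objective per stage: SFT for symbolic/iconic, DPO for direct."""
    return "dpo" if stage is Experience.DIRECT else "sft"
```

For example, a shuffled list with one sample per stage comes back ordered as symbolic (`"sft"`), iconic (`"sft"`), direct (`"dpo"`). A trainer could then run each stage to completion before starting the next.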
