COMET: "Cone of experience" enhanced large multimodal model for mathematical problem generation

Sannyuya Liu, Jintian Feng, Zongkai Yang, Yawei Luo, Qian Wan, Xiaoxuan Shen, Jianwen Sun

TL;DR

COMET introduces a Cone of Experience–guided large multimodal model for mathematical problem generation, unifying stem generation and problem solving within a single model. A three‑stage fine‑tuning framework is designed around symbolic, iconic, and direct experiences, with distinct data construction and injection methods, and a DPO‑based direct experience stage. The approach improves controllable generation, analogy generation, and fine‑grained solving across Chinese multimodal math tasks, validated on GSM8K, TAL‑SCQ5K‑CN, and the new CMM12K dataset, achieving state‑of‑the‑art or near‑state‑of‑the‑art results with a relatively small parameter count. The work contributes a large Chinese multimodal math dataset and demonstrates practical benefits for smart education by advancing end‑to‑end math problem generation within a single, memory‑guided LMM framework.

Abstract

The automatic generation of high-quality mathematical problems is practically valuable in many educational scenarios. Large multimodal models offer a novel technical approach to mathematical problem generation because of their wide success in cross-modal data scenarios. However, two factors limit their application to this task: the traditional practice of separating problem solving from problem generation, and mainstream fine-tuning frameworks that rely on monotonous data structures and homogeneous training objectives. To address these challenges, this paper proposes COMET, a "Cone of Experience" enhanced large multimodal model for mathematical problem generation. First, from the perspective of mutual ability promotion and application logic, we unify stem generation and problem solving into mathematical problem generation. Second, we propose a three-stage fine-tuning framework guided by the "Cone of Experience". The framework divides the fine-tuning data into symbolic experience, iconic experience, and direct experience, drawing a parallel with the experiences teachers accumulate over their career growth. Several fine-grained data construction and injection methods are designed within this framework. Finally, we construct a Chinese multimodal mathematical problem dataset to fill the gap in Chinese multimodal data in this field. Experiments on multiple datasets, evaluated with both objective and subjective indicators, verify the effectiveness of the proposed framework and model.

Paper Structure (22 sections, 7 equations, 7 figures, 4 tables, 1 algorithm)

Figures (7)

  • Figure 1: The diagram of mathematical problem generation and the "Cone of Experience" guided model fine-tuning.
  • Figure 2: The diagram of the three-stage fine-tuning framework.
  • Figure 3: Prompt of the three tasks.
  • Figure 4: Performance of our model and baselines on a broad range of fine-grained problem-solving and generation indicators. (a) The average score on 15 indicators across three tasks (CG, AG, and FS); (b) the rank of each model's score on each evaluation indicator. Each evaluation indicator is written as 'task-dimension', and the numbers in the radar chart are the models' average scores. For readability, only the scores of our model and Yi-VL-34B are shown. For instance, the label 'CG-DA' at the bottom of the radar chart denotes the difficulty appropriateness (DA) measure for the controllable generation (CG) task, where our model attains a score of 8.57 (rank 1) and Yi-VL-34B obtains 7.89 (rank 2).
  • Figure 5: The statistics of ELO ratings over $3,600$ rounds and the win rates between models. Subfigures (a), (b), and (c) correspond to tasks FS, CG, and AG, respectively. In each subfigure, the top section shows the ELO ratings, with models sorted by their median ELO rating, and the bottom section shows the win rates. Yi-VL-6B/34B, LLaVA1.6, and Qwen-VL-Chat are abbreviated as Yi-6/34, LLaVA, and Qwen, respectively.
  • ...and 2 more figures
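
Figure 5 reports ELO ratings and pairwise win rates. As a rough illustration of how such statistics are commonly computed from head-to-head comparisons, the following is a minimal sketch of the standard Elo update and a simple win-rate count. The K-factor of 32, the initial rating of 1000, and the function names are assumptions made for illustration; the summary above does not state the settings the paper used.

```python
def expected_score(r_a, r_b):
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(r_a, r_b, score_a, k=32.0):
    """Return updated ratings after one round.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the paper.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


def run_rounds(matches, models, init=1000.0, k=32.0):
    """Apply Elo updates over a sequence of (model_a, model_b, score_a) rounds.

    init=1000 is an assumed starting rating.
    """
    ratings = {m: init for m in models}
    for a, b, score_a in matches:
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a, k)
    return ratings


def win_rate(matches, a, b):
    """Fraction of rounds between a and b that a won (ties count as half)."""
    total, wins = 0, 0.0
    for x, y, score_x in matches:
        if (x, y) == (a, b):
            total += 1
            wins += score_x
        elif (x, y) == (b, a):
            total += 1
            wins += 1.0 - score_x
    return wins / total if total else None


# Hypothetical rounds between two placeholder model names.
matches = [("model_a", "model_b", 1.0), ("model_b", "model_a", 1.0)]
ratings = run_rounds(matches, ["model_a", "model_b"])
```

Because Elo ratings depend on the order in which rounds are processed, a median over many rounds or orderings, as in the figure, is a more stable summary than a single final rating.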