Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems

Junyi Ye, Jingyi Gu, Xinyun Zhao, Wenpeng Yin, Guiling Wang

TL;DR

This study explores the creative potential of Large Language Models in mathematical reasoning, an aspect that has received limited attention in prior research, and introduces a novel framework and benchmark, CreativeMath, designed to assess LLMs' ability to propose innovative solutions after some known solutions have been provided.

Abstract

The mathematical capabilities of AI systems are complex and multifaceted. Most existing research has focused on the correctness of AI-generated solutions to mathematical problems. In this work, we argue that beyond producing correct answers, AI systems should also be capable of developing, or of assisting humans in developing, novel solutions to mathematical challenges. This study explores the creative potential of Large Language Models (LLMs) in mathematical reasoning, an aspect that has received limited attention in prior research. We introduce a novel framework and benchmark, CreativeMath, which encompasses problems ranging from middle school curricula to Olympiad-level competitions and is designed to assess LLMs' ability to propose innovative solutions after some known solutions have been provided. Our experiments demonstrate that, while LLMs perform well on standard mathematical tasks, their capacity for creative problem-solving varies considerably. Notably, the Gemini-1.5-Pro model outperformed other LLMs in generating novel solutions. This research opens a new frontier in evaluating AI creativity, shedding light on both the strengths and limitations of LLMs in fostering mathematical innovation, and setting the stage for future developments in AI-assisted mathematical discovery.

Paper Structure

This paper contains 25 sections, 6 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Distribution of problems across different math categories and competitions in the CreativeMath dataset.
  • Figure 2: Distribution of the number of solutions per problem across different competitions.
  • Figure 3: The framework includes solution generation (left) and the evaluation pipeline (middle). The flowchart of the detailed evaluation pipeline is illustrated on the right.
  • Figure 4: The prompt template for generating a novel solution.
  • Figure 5: The prompt templates for evaluating the correctness (top) and novelty (bottom) of the generated solution. The criteria for evaluating the novelty are rephrased from the same criteria applied during the novel solution generation process to ensure alignment.
  • ...and 1 more figure