Mining Math Conjectures from LLMs: A Pruning Approach
Jake Chuharski, Elias Rojas Collins, Mark Meringolo
TL;DR
The paper demonstrates that large language models can be harnessed to generate and prune mathematical conjectures within the solubilizer area of finite group theory, using a loop that tests conjectures with GAP code and accepts counterexamples to refine the search. It compares three LLMs (ChatGPT, Claude, Gemini) and finds that while many outputs are duplicates or non-executable, a substantial subset yields testable conjectures, with ChatGPT often outperforming in executable conjecture generation and Claude showing strength in counterexample identification. The approach provides a systematic, though still limited, pathway for AI-assisted conjecture formulation, and it highlights the need for improved integration with theorem provers and alternative CAS tools to broaden domain coverage and reliability. Overall, the work offers practical insight into the strengths and constraints of current LLMs for autonomous mathematical idea generation and suggests concrete directions for enhancing future creativity-automation pipelines in mathematics.
Abstract
We present a novel approach to generating mathematical conjectures using Large Language Models (LLMs). Focusing on the solubilizer, a relatively recent construct in group theory, we demonstrate how LLMs such as ChatGPT, Gemini, and Claude can be leveraged to generate conjectures. These conjectures are pruned by allowing the LLMs to generate counterexamples. Our results indicate that LLMs are capable of producing original conjectures that, while not groundbreaking, are either plausible or falsifiable via counterexamples, though they exhibit limitations in code execution.
