Table of Contents
Fetching ...

Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent

Mahdi Buali, Robert Hoehndorf

TL;DR

The paper tackles the challenge of automated theorem proving for functional equations by introducing FEAS, an in-context learning agent that extends COPRA within the Lean environment. FEAS enhances prompt engineering, multi-block parsing, and domain-specific heuristics, evaluated on the FunEq dataset, which spans simple, intermediate, and hard (IMO-linked) problems. Empirical results show FEAS improves proof success over baselines on simple and intermediate tiers, with notable demonstrations of high-level strategy generation and error recovery, though hard problems remain largely unsolved. Overall, the work demonstrates that domain-tailored prompting and robust parsing can substantially advance specialized automated proof search and provides a benchmark and agent design to guide further developments in domain-specific ATP.

Abstract

Automated Theorem Proving (ATP) faces challenges due to its complexity and computational demands. Recent work has explored using Large Language Models (LLMs) for ATP action selection, but these methods can be resource-intensive. This study introduces FEAS, an agent that enhances the COPRA in-context learning framework within Lean. FEAS refines prompt generation, response parsing, and incorporates domain-specific heuristics for functional equations. It introduces FunEq, a curated dataset of functional equation problems with varying difficulty. FEAS outperforms baselines on FunEq, particularly with the integration of domain-specific heuristics. The results demonstrate FEAS's effectiveness in generating and formalizing high-level proof strategies into Lean proofs, showcasing the potential of tailored approaches for specific ATP challenges.

Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent

TL;DR

The paper tackles the challenge of automated theorem proving for functional equations by introducing FEAS, an in-context learning agent that extends COPRA within the Lean environment. FEAS enhances prompt engineering, multi-block parsing, and domain-specific heuristics, evaluated on the FunEq dataset, which spans simple, intermediate, and hard (IMO-linked) problems. Empirical results show FEAS improves proof success over baselines on simple and intermediate tiers, with notable demonstrations of high-level strategy generation and error recovery, though hard problems remain largely unsolved. Overall, the work demonstrates that domain-tailored prompting and robust parsing can substantially advance specialized automated proof search and provides a benchmark and agent design to guide further developments in domain-specific ATP.

Abstract

Automated Theorem Proving (ATP) faces challenges due to its complexity and computational demands. Recent work has explored using Large Language Models (LLMs) for ATP action selection, but these methods can be resource-intensive. This study introduces FEAS, an agent that enhances the COPRA in-context learning framework within Lean. FEAS refines prompt generation, response parsing, and incorporates domain-specific heuristics for functional equations. It introduces FunEq, a curated dataset of functional equation problems with varying difficulty. FEAS outperforms baselines on FunEq, particularly with the integration of domain-specific heuristics. The results demonstrate FEAS's effectiveness in generating and formalizing high-level proof strategies into Lean proofs, showcasing the potential of tailored approaches for specific ATP challenges.
Paper Structure (13 sections, 1 figure, 1 table, 1 algorithm)

This paper contains 13 sections, 1 figure, 1 table, 1 algorithm.

Figures (1)

  • Figure 1: Example of FEAS's Prompting and Response Generation for a Functional Equation Problem. (a) An example of domain-specific heuristic included in the system prompt. (b) Natural language proof steps generated by the LLM in response to the current proof state. (c) The corresponding Lean proof generated by the LLM, segmented into blocks for error handling and parsing.