Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent

Mahdi Buali; Robert Hoehndorf

Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent

Mahdi Buali, Robert Hoehndorf

TL;DR

The paper tackles the challenge of automated theorem proving for functional equations by introducing FEAS, an in-context learning agent that extends COPRA within the Lean environment. FEAS enhances prompt engineering, multi-block parsing, and domain-specific heuristics, evaluated on the FunEq dataset, which spans simple, intermediate, and hard (IMO-linked) problems. Empirical results show FEAS improves proof success over baselines on simple and intermediate tiers, with notable demonstrations of high-level strategy generation and error recovery, though hard problems remain largely unsolved. Overall, the work demonstrates that domain-tailored prompting and robust parsing can substantially advance specialized automated proof search and provides a benchmark and agent design to guide further developments in domain-specific ATP.

Abstract

Automated Theorem Proving (ATP) faces challenges due to its complexity and computational demands. Recent work has explored using Large Language Models (LLMs) for ATP action selection, but these methods can be resource-intensive. This study introduces FEAS, an agent that enhances the COPRA in-context learning framework within Lean. FEAS refines prompt generation, response parsing, and incorporates domain-specific heuristics for functional equations. It introduces FunEq, a curated dataset of functional equation problems with varying difficulty. FEAS outperforms baselines on FunEq, particularly with the integration of domain-specific heuristics. The results demonstrate FEAS's effectiveness in generating and formalizing high-level proof strategies into Lean proofs, showcasing the potential of tailored approaches for specific ATP challenges.

Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent

TL;DR

Abstract

Paper Structure (13 sections, 1 figure, 1 table, 1 algorithm)

This paper contains 13 sections, 1 figure, 1 table, 1 algorithm.

Introduction
Related Work
The FunEq Dataset
The FEAS Agent
Prompt Engineering.
Multi-Block Parsing and Error Handling.
Automatic Tactic Application.
Domain-Specific Heuristics.
Evaluation
Results
Conclusion
Appendix
Qualitative Analysis

Figures (1)

Figure 1: Example of FEAS's Prompting and Response Generation for a Functional Equation Problem. (a) An example of domain-specific heuristic included in the system prompt. (b) Natural language proof steps generated by the LLM in response to the current proof state. (c) The corresponding Lean proof generated by the LLM, segmented into blocks for error handling and parsing.

Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent

TL;DR

Abstract

Towards Automated Functional Equation Proving: A Benchmark Dataset and A Domain-Specific In-Context Agent

Authors

TL;DR

Abstract

Table of Contents

Figures (1)