Table of Contents
Fetching ...

Solving Hard Mizar Problems with Instantiation and Strategy Invention

Jan Jakubův, Mikoláš Janota, Josef Urban

TL;DR

This work tackles extending automated proof coverage for Mizar's large-theory problems by leveraging instantiation-based methods within the cvc5 SMT/ATP framework. It introduces Grackle, an automated strategy-invention system that crafts diverse cvc5 strategies, and augments them with alternative clausification and premise-selection techniques to maximize problem-solving on challenging Mizar instances. The results show a substantial gain: 3021 additional ATP-proved problems (21.3% of the ATP-unproved set), boosting the overall ATP-proved Mizar fraction to 80.7% and demonstrating notable improvements over prior cvc5 configurations. The findings highlight the effectiveness of combining instantiation-based SMT methods with automated strategy discovery and problem transformations, with implications for cross-benchmark applications and future AI-assisted theorem-proving research.

Abstract

In this work, we prove over 3000 previously ATP-unproved Mizar/MPTP problems by using several ATP and AI methods, raising the number of ATP-solved Mizar problems from 75\% to above 80\%. First, we start to experiment with the cvc5 SMT solver which uses several instantiation-based heuristics that differ from the superposition-based systems, that were previously applied to Mizar,and add many new solutions. Then we use automated strategy invention to develop cvc5 strategies that largely improve cvc5's performance on the hard problems. In particular, the best invented strategy solves over 14\% more problems than the best previously available cvc5 strategy. We also show that different clausification methods have a high impact on such instantiation-based methods, again producing many new solutions. In total, the methods solve 3021 (21.3\%) of the 14163 previously unsolved hard Mizar problems. This is a new milestone over the Mizar large-theory benchmark and a large strengthening of the hammer methods for Mizar.

Solving Hard Mizar Problems with Instantiation and Strategy Invention

TL;DR

This work tackles extending automated proof coverage for Mizar's large-theory problems by leveraging instantiation-based methods within the cvc5 SMT/ATP framework. It introduces Grackle, an automated strategy-invention system that crafts diverse cvc5 strategies, and augments them with alternative clausification and premise-selection techniques to maximize problem-solving on challenging Mizar instances. The results show a substantial gain: 3021 additional ATP-proved problems (21.3% of the ATP-unproved set), boosting the overall ATP-proved Mizar fraction to 80.7% and demonstrating notable improvements over prior cvc5 configurations. The findings highlight the effectiveness of combining instantiation-based SMT methods with automated strategy discovery and problem transformations, with implications for cross-benchmark applications and future AI-assisted theorem-proving research.

Abstract

In this work, we prove over 3000 previously ATP-unproved Mizar/MPTP problems by using several ATP and AI methods, raising the number of ATP-solved Mizar problems from 75\% to above 80\%. First, we start to experiment with the cvc5 SMT solver which uses several instantiation-based heuristics that differ from the superposition-based systems, that were previously applied to Mizar,and add many new solutions. Then we use automated strategy invention to develop cvc5 strategies that largely improve cvc5's performance on the hard problems. In particular, the best invented strategy solves over 14\% more problems than the best previously available cvc5 strategy. We also show that different clausification methods have a high impact on such instantiation-based methods, again producing many new solutions. In total, the methods solve 3021 (21.3\%) of the 14163 previously unsolved hard Mizar problems. This is a new milestone over the Mizar large-theory benchmark and a large strengthening of the hammer methods for Mizar.
Paper Structure (15 sections, 2 figures, 6 tables)