Table of Contents
Fetching ...

Regularization in Spider-Style Strategy Discovery and Schedule Construction

Filip Bártek, Karel Chvalovský, Martin Suda

TL;DR

The paper investigates Spider-style strategy discovery and schedule construction for Vampire on the FOF subset of the TPTP library, demonstrating that a large, diverse pool of strategies can be leveraged to build effective schedules. It shows that a polynomial-time greedy schedule Construction (SCP) can rival or exceed optimal schedules in practice, especially when augmented with regularization techniques that improve generalization to unseen problems. The study provides detailed experimental results across multiple budgets, reveals the trade-offs between training and test performance, and highlights temporal reward and slice-extension regularizations as particularly effective. The findings suggest that robust, fast, and reusable scheduling methods can greatly improve practical theorem proving performance and generalization without resorting to expensive optimal solvers. The work also contributes a dataset and methodology to facilitate further research in strategy discovery and scheduling for ATPs.

Abstract

To achieve the best performance, automatic theorem provers often rely on schedules of diverse proving strategies to be tried out (either sequentially or in parallel) on a given problem. In this paper, we report on a large-scale experiment with discovering strategies for the Vampire prover, targeting the FOF fragment of the TPTP library and constructing a schedule for it, based on the ideas of Andrei Voronkov's system Spider. We examine the process from various angles, discuss the difficulty (or ease) of obtaining a strong Vampire schedule for the CASC competition, and establish how well a schedule can be expected to generalize to unseen problems and what factors influence this property.

Regularization in Spider-Style Strategy Discovery and Schedule Construction

TL;DR

The paper investigates Spider-style strategy discovery and schedule construction for Vampire on the FOF subset of the TPTP library, demonstrating that a large, diverse pool of strategies can be leveraged to build effective schedules. It shows that a polynomial-time greedy schedule Construction (SCP) can rival or exceed optimal schedules in practice, especially when augmented with regularization techniques that improve generalization to unseen problems. The study provides detailed experimental results across multiple budgets, reveals the trade-offs between training and test performance, and highlights temporal reward and slice-extension regularizations as particularly effective. The findings suggest that robust, fast, and reusable scheduling methods can greatly improve practical theorem proving performance and generalization without resorting to expensive optimal solvers. The work also contributes a dataset and methodology to facilitate further research in strategy discovery and scheduling for ATPs.

Abstract

To achieve the best performance, automatic theorem provers often rely on schedules of diverse proving strategies to be tried out (either sequentially or in parallel) on a given problem. In this paper, we report on a large-scale experiment with discovering strategies for the Vampire prover, targeting the FOF fragment of the TPTP library and constructing a schedule for it, based on the ideas of Andrei Voronkov's system Spider. We examine the process from various angles, discuss the difficulty (or ease) of obtaining a strong Vampire schedule for the CASC competition, and establish how well a schedule can be expected to generalize to unseen problems and what factors influence this property.
Paper Structure (33 sections, 1 equation, 8 figures, 2 tables, 1 algorithm)

This paper contains 33 sections, 1 equation, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Strategy discovery. Left: problem coverage growth in time (uniform strategy sampling distribution vs. an updated one). Right: collected strategies ordered by limit (2000.0, 4000.0, …, 256000mi) and, secondarily, by how many problems can each solve.
  • Figure 2: Strategy optimization scatter plots. Left: time needed to solve strategy's witness problem (a log--log plot). Right: the total number of problems (in thousands) solved.
  • Figure 3: Cumulative performance of several greedy schedules, each using a subset of the discovered strategies as gathered in time, compared with Vampire's default strategy
  • Figure 4: Train and test performance of various regularizations of the greedy schedule construction algorithm for the budget 64000mi. Performance is the mean number of problems solved out of 7866.0 across 50 splits. The label of each point denotes the value of the respective regularization parameter.
  • Figure 5: Train and test performance of various regularizations of the greedy schedule construction algorithm for the budgets 256000mi (top), 64000mi (middle), and 16000mi (bottom). Performance is mean number of problems solved out of 7866 across 50 splits. The label of each point denotes the value of the respective regularization parameter.
  • ...and 3 more figures