Symbolic Snapshot Ensembles

Mingyue Liu, Andrew Cropper

TL;DR

Symbolic Snapshot Ensembles address the ILP limitation of learning a single hypothesis by harvesting intermediate hypotheses from one anytime ILP run. These hypotheses are pooled and weighted using a minimum description length (MDL) framework to balance fit and complexity, producing a weighted ensemble that predicts with a 0.5 threshold. Across 111 tasks covering game playing and visual reasoning, the approach yields about a 4% accuracy gain with under 1% additional cost, and often matches or surpasses traditional bagging while being far more computationally efficient. The work demonstrates that diverse, structurally distinct rule sets generated during a single search can be effectively aggregated to boost generalisation in symbolic learning.

Abstract

Inductive logic programming (ILP) is a form of logical machine learning. Most ILP algorithms learn a single hypothesis from a single training run. Ensemble methods train an ILP algorithm multiple times to learn multiple hypotheses. In this paper, we train an ILP algorithm only once and save intermediate hypotheses. We then combine the hypotheses using a minimum description length weighting scheme. Our experiments on multiple benchmarks, including game playing and visual reasoning, show that our approach improves predictive accuracy by 4% with less than 1% computational overhead.
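
To make the weighting idea concrete, the following is a minimal sketch rather than the paper's implementation. It assumes that each snapshot's MDL cost is its hypothesis size plus its number of misclassified training examples, and that weights are proportional to 2^(-cost) and normalised to sum to 1; the exact cost functions and weighting used in the paper may differ. The names `Snapshot`, `mdl_cost`, `mdl_weights`, and `ensemble_predict` are hypothetical, and treating a score of exactly 0.5 as positive is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Snapshot:
    """One intermediate hypothesis saved during a single ILP run (hypothetical type)."""
    predict: Callable[[object], bool]  # True if the hypothesis labels the example positive
    size: int                          # assumed complexity measure, e.g. number of literals
    train_errors: int                  # misclassified training examples


def mdl_cost(s: Snapshot) -> int:
    # Assumed MDL-style cost: complexity of the hypothesis plus its training errors.
    return s.size + s.train_errors


def mdl_weights(snapshots: Sequence[Snapshot]) -> List[float]:
    # Weight each snapshot by 2^(-cost), shifted by the best cost for numerical
    # stability, then normalise so the weights sum to 1.
    costs = [mdl_cost(s) for s in snapshots]
    best = min(costs)
    raw = [2.0 ** -(c - best) for c in costs]
    total = sum(raw)
    return [r / total for r in raw]


def ensemble_predict(snapshots: Sequence[Snapshot], weights: Sequence[float],
                     example: object, threshold: float = 0.5) -> bool:
    # Weighted vote: sum the weights of snapshots that predict positive
    # and compare the total against the threshold.
    score = sum(w for s, w in zip(snapshots, weights) if s.predict(example))
    return score >= threshold


# Toy usage with three hand-written "hypotheses" over integers.
snaps = [
    Snapshot(lambda x: x > 0, size=3, train_errors=1),
    Snapshot(lambda x: x > 5, size=2, train_errors=4),
    Snapshot(lambda x: x % 2 == 0, size=5, train_errors=0),
]
w = mdl_weights(snaps)
label = ensemble_predict(snaps, w, 3)
```

In this toy setup the lowest-cost snapshot dominates the vote, which illustrates how an MDL-style weighting favours hypotheses that are both compact and accurate on the training data.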

Paper Structure

This paper contains 29 sections, 8 equations, 4 figures, 2 tables, 2 algorithms.

Figures (4)

  • Figure 1: Task-level change in accuracy (our symbolic ensemble's predictive accuracy minus the baseline's) across datasets.
  • Figure 2: Per-task differences in accuracy between (1) the best individual snapshot (test-optimal) and the snapshot ensemble, and (2) the worst individual hypothesis and the snapshot ensemble, under three cost functions.
  • Figure 3: Snapshot improvement vs. overfit gap with linear regression line.
  • Figure 4: Task-level difference in accuracy (our symbolic ensemble's predictive accuracy minus bagging's) across datasets.

Theorems & Definitions (6)

  • Definition 1: ILP input
  • Definition 2: Cost function
  • Definition 3: ILP problem
  • Definition 4: Ensemble ILP input
  • Definition 5: Ensemble cost function
  • Definition 6: Ensemble ILP problem