An Empirical Comparison of Cost Functions in Inductive Logic Programming

Céline Hocquette, Andrew Cropper

TL;DR

This work investigates how the choice of cost function influences the generalisation of ILP hypotheses. The authors extend the constraint-based system Popper to learn globally optimal hypotheses under seven standard lexico-linear cost functions. In a large-scale empirical study spanning over 20 domains and 1000 tasks, they show that no single cost function dominates. Minimising total error (Error/ErrorSize) or description length (MDL) generally yields the strongest performance, while minimising hypothesis size yields mixed results depending on the domain. The effects are domain-dependent: FnFp-type costs, which emphasise recall, excel with very limited data, whereas MDL, which emphasises compressibility, excels with abundant data and can struggle in data-sparse settings. The results offer practical guidance for selecting cost functions in ILP, highlight the role of cost-induced bias, and point toward domain-aware strategies and future work on weighted or novel cost functions. Overall, the study provides a direct comparison of cost functions in optimal ILP and indicates when to prefer error-focused, compression-based, or size-aware objectives.

Abstract

Recent inductive logic programming (ILP) approaches learn optimal hypotheses. An optimal hypothesis minimises a given cost function on the training data. There are many cost functions, such as minimising training error, textual complexity, or the description length of hypotheses. However, selecting an appropriate cost function remains a key question. To address this gap, we extend a constraint-based ILP system to learn optimal hypotheses for seven standard cost functions. We then empirically compare the generalisation error of optimal hypotheses induced under these standard cost functions. Our results on over 20 domains and 1000 tasks, including game playing, program synthesis, and image reasoning, show that, while no cost function consistently outperforms the others, minimising training error or description length has the best overall performance. Notably, our results indicate that minimising the size of hypotheses does not always reduce generalisation error.
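To make the idea of an optimal hypothesis under different cost functions concrete, the following is a minimal sketch, not the authors' implementation. It assumes each candidate hypothesis is summarised by its false positives, false negatives, and size on the training data, and that a lexico-linear cost can be modelled as a tuple of linear terms compared lexicographically. The names Error, ErrorSize, MDL, and FnFp come from the summary above; the formulas attached to them here, the `Stats` type, and the candidate hypotheses are illustrative assumptions.

```python
from typing import Callable, Dict, NamedTuple, Tuple


class Stats(NamedTuple):
    """Hypothetical training-data summary of one candidate hypothesis."""
    fp: int    # false positives on the training examples
    fn: int    # false negatives on the training examples
    size: int  # hypothesis size, e.g. number of literals


Cost = Callable[[Stats], Tuple[int, ...]]

# Assumed definitions: each cost is a tuple, and Python compares tuples
# lexicographically, so earlier components take priority over later ones.
def error(s: Stats) -> Tuple[int, ...]:
    return (s.fp + s.fn,)

def error_size(s: Stats) -> Tuple[int, ...]:
    return (s.fp + s.fn, s.size)   # break error ties by size

def mdl(s: Stats) -> Tuple[int, ...]:
    return (s.fp + s.fn + s.size,)  # errors and size traded off equally

def fn_fp(s: Stats) -> Tuple[int, ...]:
    return (s.fn, s.fp)             # false negatives first, then false positives


# Made-up candidates, purely for illustration.
CANDIDATES: Dict[str, Stats] = {
    "h1": Stats(fp=0, fn=3, size=2),
    "h2": Stats(fp=1, fn=0, size=9),
    "h3": Stats(fp=2, fn=0, size=5),
    "h4": Stats(fp=1, fn=0, size=6),
}


def optimal(cost: Cost, candidates: Dict[str, Stats] = CANDIDATES) -> str:
    """Return the name of a candidate with minimal cost."""
    return min(candidates, key=lambda name: cost(candidates[name]))


if __name__ == "__main__":
    for cost in (error, error_size, mdl, fn_fp):
        print(cost.__name__, "->", optimal(cost))
```

In this toy setting, the size-aware tie-break of `error_size` selects `h4` over the equally accurate but larger `h2`, while `mdl` selects the small but less accurate `h1`. This mirrors, in miniature, how the choice of cost function biases which hypothesis counts as optimal.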

Paper Structure

This paper contains 49 sections, 13 equations, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: Ranking of cost functions based on average predictive accuracy across domains. Ties occur when multiple cost functions achieve the same average performance within a domain.
  • Figure 2: Evaluation over all domains. Each dot represents the average performance over all tasks within one domain. The horizontal bar represents the overall mean across all domains.
  • Figure 3: Correlation matrix of the predictive accuracies across cost functions.
  • Figure 4: Predictive accuracies versus the number of positive training examples.
  • Figure 5: Evaluation over noisy domains. Each dot represents the average performance over all tasks within one domain. The horizontal bar represents the overall mean across all noisy domains.
  • ...and 4 more figures

Theorems & Definitions (4)

  • Definition 1: ILP input
  • Definition 2: Cost function
  • Definition 3: Optimal hypothesis
  • Definition 4