An Empirical Comparison of Cost Functions in Inductive Logic Programming
Céline Hocquette, Andrew Cropper
TL;DR
This work investigates how the choice of cost function affects the generalisation of ILP hypotheses. The authors extend the constraint-based system Popper to learn globally optimal hypotheses under seven standard lexico-linear cost functions. In a large-scale empirical study spanning over 20 domains and 1000 tasks, no single cost function dominates; however, minimising total error (Error, ErrorSize) or description length (MDL) generally yields the strongest performance, while minimising hypothesis size yields mixed results depending on the domain. The findings reveal domain-dependent effects: FnFp-type costs, which emphasise recall, excel with very limited data, whereas MDL, which emphasises compressibility, excels with abundant data but can struggle in data-sparse settings. The results offer practical guidance for selecting cost functions in ILP, highlight the role of cost-induced bias, and point toward domain-aware strategies and future work on weighted or novel cost functions. Overall, the study provides a direct comparison of cost functions in optimal ILP and indicates when to prefer error-focused, compression-based, or size-aware objectives.
Abstract
Recent inductive logic programming (ILP) approaches learn optimal hypotheses. An optimal hypothesis minimises a given cost function on the training data. There are many cost functions, such as minimising training error, textual complexity, or the description length of hypotheses. However, selecting an appropriate cost function remains a key question. To address this gap, we extend a constraint-based ILP system to learn optimal hypotheses for seven standard cost functions. We then empirically compare the generalisation error of optimal hypotheses induced under these standard cost functions. Our results on over 20 domains and 1000 tasks, including game playing, program synthesis, and image reasoning, show that, while no cost function consistently outperforms the others, minimising training error or description length has the best overall performance. Notably, our results indicate that minimising the size of hypotheses does not always reduce generalisation error.
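To make the notion of an optimal hypothesis under different cost functions concrete, the following is a minimal Python sketch rather than the authors' implementation. It assumes each candidate hypothesis is summarised by its false positives, false negatives, and size on the training data, and it encodes each cost as a tuple compared lexicographically. The candidate names and counts are invented for illustration, and the definitions used here (Error as fp + fn, ErrorSize as error with size as tie-breaker, MDL as fp + fn + size, FnFp as false negatives before false positives) are assumptions about what the cost names denote, not definitions taken from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of a hypothesis on the training data."""
    name: str
    fp: int    # false positives: negative examples wrongly covered
    fn: int    # false negatives: positive examples not covered
    size: int  # textual size, e.g. number of literals


# Each cost returns a tuple; Python compares tuples lexicographically,
# so earlier components take priority over later ones.
# All definitions below are illustrative assumptions.
COSTS = {
    "Error": lambda h: (h.fp + h.fn,),
    "ErrorSize": lambda h: (h.fp + h.fn, h.size),
    "MDL": lambda h: (h.fp + h.fn + h.size,),
    "FnFp": lambda h: (h.fn, h.fp),
    "Size": lambda h: (h.size,),
}


def optimal(candidates, cost_name):
    """Return the candidate that minimises the named cost."""
    return min(candidates, key=COSTS[cost_name])


# Invented candidates, chosen so that the costs disagree.
candidates = [
    Candidate("general", fp=3, fn=0, size=2),
    Candidate("specific", fp=0, fn=2, size=4),
    Candidate("compact", fp=1, fn=1, size=1),
    Candidate("overfit", fp=0, fn=1, size=12),
]

winners = {name: optimal(candidates, name).name for name in COSTS}

if __name__ == "__main__":
    for cost_name, winner in winners.items():
        print(f"{cost_name}: {winner}")
```

On this toy pool the costs select different hypotheses: the error-based costs pick the large, low-error candidate, the MDL-style and size costs pick the smallest one, and the FnFp-style cost picks the candidate with no false negatives. This only illustrates how the choice of cost function changes which hypothesis counts as optimal; it says nothing about which choice generalises better.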
