CON-FOLD -- Explainable Machine Learning with Confidence
Lachlan McGinness, Peter Baumgartner
TL;DR
CON-FOLD extends the FOLD-RM framework by attaching probability-based confidence scores, computed with the Wilson score interval, to learned rules, so that rule sets can be interpreted and pruned reliably. It introduces two pruning strategies (Improvement Threshold and Confidence Threshold) to reduce overfitting and rule-set complexity, and supports incorporating background or initial domain knowledge into the logic-programming model. The paper formalizes the learning framework, defines the Inverse Brier Score (IBS) as a proper probabilistic performance metric, and demonstrates improvements on UCI benchmarks and on a physics marking task that benefits from domain knowledge. The work targets rule-based classifiers that are more trustworthy, more compact, and more widely applicable, particularly in data-scarce scenarios. The results indicate that CON-FOLD can outperform baselines such as XGBoost in certain settings and offers a practical pathway to interpretable, knowledge-augmented ML in real-world tasks.
Abstract
FOLD-RM is an explainable machine learning classification algorithm that uses training data to create a set of classification rules. In this paper we introduce CON-FOLD, which extends FOLD-RM in several ways. CON-FOLD assigns probability-based confidence scores to the rules learned for a classification task, so that users know how confident they should be in a prediction made by the model. We present a confidence-based pruning algorithm that uses the unique structure of FOLD-RM rules to prune rules efficiently and prevent overfitting. Furthermore, CON-FOLD enables the user to provide pre-existing knowledge in the form of logic program rules that are either (fixed) background knowledge or (modifiable) initial rule candidates. The paper describes our method in detail and reports on practical experiments. We demonstrate the performance of the algorithm on benchmark datasets from the UCI Machine Learning Repository. To evaluate the accuracy of the produced confidence scores, we introduce a new metric, the Inverse Brier Score. Finally, we apply this extension to a real-world example that requires explainability: marking student responses to a short-answer question from the Australian Physics Olympiad.
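As an illustration of how a probability-based confidence score for a single rule might be derived, the following minimal sketch computes the standard Wilson score interval from the number of training examples a rule covers and the number it classifies correctly. The function name `wilson_interval`, its parameters, and the default `z = 1.96` (roughly a 95% interval) are assumptions made for this example. The summary above does not state which point of the interval CON-FOLD reports as the confidence, so the sketch simply returns both bounds.

```python
import math


def wilson_interval(correct: int, covered: int, z: float = 1.96) -> tuple[float, float]:
    """Return the (lower, upper) Wilson score interval for a proportion.

    Hypothetical helper for illustration only:
    correct -- number of covered examples the rule classifies correctly
    covered -- total number of examples the rule covers (must be > 0)
    z       -- standard normal quantile (1.96 is roughly a 95% interval)
    """
    if covered <= 0:
        raise ValueError("covered must be positive")
    if not 0 <= correct <= covered:
        raise ValueError("correct must lie between 0 and covered")

    p_hat = correct / covered                  # observed accuracy of the rule
    denom = 1 + z * z / covered                # shared normalising term
    centre = (p_hat + z * z / (2 * covered)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / covered + z * z / (4 * covered * covered)
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # A rule that is right on 9 of 10 examples versus 90 of 100 examples:
    # the observed accuracy is the same, but the interval is narrower
    # when more examples support the rule.
    print(wilson_interval(9, 10))
    print(wilson_interval(90, 100))
```

With the same observed accuracy, a rule supported by few examples receives a wider interval than one supported by many, which is the behaviour that makes such an interval useful for deciding which rules to trust or prune.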
