Probabilistic Truly Unordered Rule Sets

Lincen Yang, Matthijs van Leeuwen

TL;DR

This work addresses interpretable multiclass classification by proposing Probabilistic Truly Unordered Rule Sets (TURS), which avoid implicit rule orders and handle overlaps through probabilistic union predictions. The authors formalize TURS as a probabilistic model, apply the Minimum Description Length (MDL) principle for model selection, and develop a novel dual-beam, diverse-patience heuristic with MDL-based local testing to learn rule sets efficiently. Empirical results on 31 datasets show TURS achieves competitive ROC-AUC, produces simpler models, and exhibits overlaps formed by similar probabilistic outputs, supporting the truly unordered premise. The approach yields trustworthy per-rule probability estimates that generalize well to unseen data, offering a practical framework for interpretable multiclass rule-based learning with principled uncertainty quantification.

Abstract

Rule set learning has recently been revisited frequently because of its interpretability. Existing methods, however, have several shortcomings. First, most impose orders among rules, either explicitly or implicitly, which makes the models less comprehensible. Second, due to the difficulty of handling conflicts caused by overlaps (i.e., instances covered by multiple rules), existing methods often do not consider probabilistic rules. Third, learning classification rules for multi-class targets is understudied, as most existing methods focus on binary classification or handle multi-class classification via the "one-versus-rest" approach. To address these shortcomings, we propose TURS, for Truly Unordered Rule Sets. To resolve conflicts caused by overlapping rules, we propose a novel model that exploits the probabilistic properties of our rule sets, with the intuition of only allowing rules to overlap if they have similar probabilistic outputs. We next formalize the problem of learning a TURS model based on the MDL principle and develop a carefully designed heuristic algorithm. We benchmark against a wide range of rule-based methods and demonstrate that our method learns rule sets that have lower model complexity and highly competitive predictive performance. In addition, we empirically show that rules in our model are "independent" and hence truly unordered.
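
The idea of resolving overlaps through the union of the rules involved can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `union_class_probs`, the representation of rules as Python predicates, the toy data, and the fallbacks for instances covered by no rule are all hypothetical choices made for the example.

```python
import numpy as np

def union_class_probs(X_train, y_train, rules, x, n_classes):
    """Estimate P(Y | X = x) from the training instances covered by the
    union of all rules that cover x (hypothetical helper, for illustration)."""
    n = len(X_train)
    # cover[i, j] is True when rule i covers training instance j.
    cover = np.array([[bool(r(row)) for row in X_train] for r in rules],
                     dtype=bool).reshape(len(rules), n)
    firing = [i for i, r in enumerate(rules) if r(x)]
    if firing:
        # Pool the instances covered by ANY rule that fires on x.
        mask = cover[firing].any(axis=0)
    else:
        # Assumption: uncovered instances share one default estimate.
        mask = ~cover.any(axis=0)
    if not mask.any():
        # Assumption: fall back to the full training set if the pool is empty.
        mask = np.ones(n, dtype=bool)
    counts = np.bincount(y_train[mask], minlength=n_classes)
    return counts / counts.sum()

# Toy data: two features, two classes, two overlapping rules.
X_train = np.array([[0.2, 0.2], [0.4, 0.9], [0.8, 0.3], [0.9, 0.9],
                    [1.5, 0.5], [1.6, 0.1], [1.5, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1, 1])
rules = [lambda r: r[0] < 1.0,   # rule 1: X_1 < 1
         lambda r: r[1] < 0.6]   # rule 2: X_2 < 0.6

p_both = union_class_probs(X_train, y_train, rules, np.array([0.5, 0.5]), 2)
p_one = union_class_probs(X_train, y_train, rules, np.array([0.5, 0.9]), 2)
p_none = union_class_probs(X_train, y_train, rules, np.array([1.5, 0.9]), 2)
```

In this toy example the two rules have different class distributions on their own, so the pooled estimate for the overlap differs from both. That is the situation the model is designed to discourage: overlaps are only meant to arise between rules whose probabilistic outputs are similar, in which case the pooled estimate stays close to each rule's own estimate.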

Paper Structure

This paper contains 31 sections, 4 theorems, 28 equations, 7 figures, 6 tables, 2 algorithms.

Key Result

Proposition 0

Let $M$ be a rule set in which $S_i \cap S_j = \emptyset$ for any two distinct rules $S_i, S_j \in M$. Then $P^{NML}_{M}(Y^n=y^n|X^n=x^n) = P^{apprNML}_{M}(Y^n=y^n|X^n=x^n)$.

Figures (7)

  • Figure 1: (Left) Simulated data with a rule set containing two rules (black outlines). (Right) Growing a rule to describe the bottom-right instances will create conflicts with existing rules. E.g., adding either $X_1 > 1$ (vertical purple line) or $X_2 < 0.8$ (horizontal purple line) would create a huge overlap that deteriorates the likelihood.
  • Figure 2: For each algorithm and each individual dataset, we calculate the difference between the algorithm's ROC-AUC score and the best ROC-AUC score on that dataset. The differences for each algorithm are shown as a box plot.
  • Figure 3: The weighted average of the differences between the class probability estimates of every individual rule for training and test sets, shown as the empirical cumulative distribution function, in which the weight is defined as the coverage of each rule for the training set.
  • Figure 4: Empirical cumulative distribution function for the comparative score for model complexity. Curves towards the bottom-right indicate larger comparative scores and simpler models.
  • Figure 5: The differences between the ROC-AUC scores on the test sets with and without the diverse patience.
  • ...and 2 more figures

Theorems & Definitions (4)

  • Proposition 0
  • Proposition 0
  • Proposition 0
  • Proposition 0