Probabilistic Truly Unordered Rule Sets

Lincen Yang; Matthijs van Leeuwen

TL;DR

This work addresses interpretable multiclass classification by proposing Probabilistic Truly Unordered Rule Sets (TURS), which avoid implicit rule orders and handle overlaps through probabilistic union predictions. The authors formalize TURS as a probabilistic model, apply the Minimum Description Length (MDL) principle for model selection, and develop a novel dual-beam, diverse-patience heuristic with MDL-based local testing to learn rule sets efficiently. Empirical results on 31 datasets show TURS achieves competitive ROC-AUC, produces simpler models, and exhibits overlaps formed by similar probabilistic outputs, supporting the truly unordered premise. The approach yields trustworthy per-rule probability estimates that generalize well to unseen data, offering a practical framework for interpretable multiclass rule-based learning with principled uncertainty quantification.

Abstract

Rule set learning has been frequently revisited in recent years because of its interpretability. Existing methods have several shortcomings, though. First, most existing methods impose orders among rules, either explicitly or implicitly, which makes the models less comprehensible. Second, due to the difficulty of handling conflicts caused by overlaps (i.e., instances covered by multiple rules), existing methods often do not consider probabilistic rules. Third, learning classification rules for multi-class targets is understudied, as most existing methods focus on binary classification or on multi-class classification via the "one-versus-rest" approach. To address these shortcomings, we propose TURS, for Truly Unordered Rule Sets. To resolve conflicts caused by overlapping rules, we propose a novel model that exploits the probabilistic properties of our rule sets, with the intuition of only allowing rules to overlap if they have similar probabilistic outputs. We next formalize the problem of learning a TURS model based on the MDL principle and develop a carefully designed heuristic algorithm. We benchmark against a wide range of rule-based methods and demonstrate that our method learns rule sets that have lower model complexity and highly competitive predictive performance. In addition, we show empirically that rules in our model are "independent" and hence truly unordered.
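
One way to read the union-based handling of overlaps mentioned in the TL;DR is sketched below in Python. It is a minimal illustration under stated assumptions, not the authors' implementation: rules are represented as hypothetical boolean cover functions, class probabilities are plain relative frequencies on the training data, an instance covered by several rules is scored on the union of their covers, and an instance covered by no rule falls back to the training instances that no rule covers. The names `class_probs`, `predict_proba`, `rule_a`, `rule_b` and the toy data are invented for the example.

```python
import numpy as np

def class_probs(labels, mask, n_classes):
    """Relative class frequencies of the training labels selected by mask.

    Assumes mask selects at least one training instance.
    """
    counts = np.bincount(labels[mask], minlength=n_classes)
    return counts / counts.sum()

def predict_proba(x, rules, X_train, y_train, n_classes):
    """Class probabilities for a single instance x (assumed scheme, see lead-in).

    rules: non-empty list of callables mapping an (n, d) array to a boolean mask.
    """
    covering = [r for r in rules if r(x[None, :])[0]]
    if covering:
        # Covered by one or more rules: estimate on the union of their covers.
        mask = np.any([r(X_train) for r in covering], axis=0)
    else:
        # Covered by no rule: estimate on the uncovered training instances.
        mask = ~np.any([r(X_train) for r in rules], axis=0)
    return class_probs(y_train, mask, n_classes)

# Hypothetical toy data: two features, two classes.
X_train = np.array([[0.2, 0.1], [0.4, 0.3], [0.6, 0.2], [0.8, 0.7],
                    [0.9, 0.9], [0.1, 0.9], [0.8, 0.2]])
y_train = np.array([0, 0, 0, 1, 1, 1, 1])

def rule_a(X):
    """Hypothetical rule: first feature below 0.7."""
    return X[:, 0] < 0.7

def rule_b(X):
    """Hypothetical rule: second feature below 0.5."""
    return X[:, 1] < 0.5

rules = [rule_a, rule_b]
p_overlap = predict_proba(np.array([0.5, 0.2]), rules, X_train, y_train, 2)
```

In this toy data each rule on its own estimates class 0 at 0.75, while an instance covered by both rules receives the estimate computed on the union of the two covers, 0.6. No rule takes precedence over the other, which is the sense in which the set is unordered.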

Paper Structure (31 sections, 4 theorems, 28 equations, 7 figures, 6 tables, 2 algorithms)


Introduction
Related Work
Truly Unordered Rule Sets
Probabilistic rules
The TURS model
Predicting for a new instance
Rule Set Learning as Probabilistic Model Selection
Normalized Maximum Likelihood Distributions for Rule Sets
Approximating the NML Distribution
Code length of model
MDL-based model selection
Learning Truly Unordered Rules from Data
Learning a rule set
Iteratively learning a rule set
Heuristic score for a single rule
...and 16 more sections

Key Result

Proposition 0

Let $M$ be a rule set in which $S_i \cap S_j = \emptyset$ for any two distinct rules $S_i, S_j \in M$. Then $P^{NML}_{M}(Y^n=y^n|X^n=x^n) = P^{apprNML}_{M}(Y^n=y^n|X^n=x^n)$.
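
The intuition behind this proposition can be checked by brute force on a tiny case. The Python sketch below rests on two assumptions that this summary does not spell out: that the approximate NML distribution differs from the exact one only in its normalizing term, and that this approximate term is the product of per-rule normalizers. It also assumes that every instance is covered by exactly one rule, so the disjoint covers partition the data. All function names are hypothetical.

```python
from itertools import product

def max_likelihood(labels, n_classes):
    """Maximum categorical likelihood of a label sequence: prod_k (c_k / n) ** c_k."""
    n = len(labels)
    lik = 1.0
    for k in range(n_classes):
        c = labels.count(k)
        if c:
            lik *= (c / n) ** c
    return lik

def normalizer(n, n_classes):
    """Sum of maximum likelihoods over all label sequences of length n."""
    return sum(max_likelihood(z, n_classes)
               for z in product(range(n_classes), repeat=n))

def joint_normalizer(sizes, n_classes):
    """Exact normalizer for disjoint rules whose covers have the given sizes.

    Each rule has its own parameters, so the maximum likelihood of a full
    label sequence is the product of the per-rule maximum likelihoods.
    """
    total = 0.0
    for z in product(range(n_classes), repeat=sum(sizes)):
        lik, start = 1.0, 0
        for size in sizes:
            lik *= max_likelihood(z[start:start + size], n_classes)
            start += size
        total += lik
    return total

sizes = (2, 3)        # cover sizes of two disjoint rules (toy values)
n_classes = 2
exact = joint_normalizer(sizes, n_classes)
approx = normalizer(sizes[0], n_classes) * normalizer(sizes[1], n_classes)
```

Because the covers are disjoint, the sum over all label sequences factorizes into one sum per rule, so `exact` and `approx` agree up to floating-point error. With overlapping rules an instance contributes to more than one rule and this factorization no longer holds in general, which is why the product form is only an approximation there.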

Figures (7)

Figure 1: (Left) Simulated data with a rule set containing two rules (black outlines). (Right) Growing a rule to describe the bottom-right instances will create conflicts with existing rules. E.g., adding either $X_1 > 1$ (vertical purple line) or $X_2 < 0.8$ (horizontal purple line) would create a huge overlap that deteriorates the likelihood.
Figure 2: For each algorithm and each dataset, we calculate the difference between the algorithm's ROC-AUC score and the best ROC-AUC score on that dataset. The differences for each algorithm are shown as a box plot.
Figure 3: The weighted average of the differences between the class probability estimates of every individual rule for training and test sets, shown as the empirical cumulative distribution function, in which the weight is defined as the coverage of each rule for the training set.
Figure 4: Empirical cumulative distribution function for the comparative score for model complexity. Curves towards the bottom-right indicate larger comparative scores and simpler models.
Figure 5: The differences between the ROC-AUC scores on the test sets with and without the diverse patience.
...and 2 more figures

Theorems & Definitions (4)

Proposition 0
Proposition 0
Proposition 0
Proposition 0
