REALITrees: Rashomon Ensemble Active Learning for Interpretable Trees

Simon D. Nguyen, Hayden McTavish, Kentaro Hoffman, Cynthia Rudin, Tyler H. McCormick

Abstract

Active learning reduces labeling costs by selecting samples that maximize information gain. A dominant framework, Query-by-Committee (QBC), typically relies on perturbation-based diversity, inducing model disagreement through random feature subsetting or data blinding. While this approximates one notion of epistemic uncertainty, it sacrifices direct characterization of the plausible hypothesis space. We propose a complementary approach, Rashomon Ensembled Active Learning (REAL), which constructs a committee by exhaustively enumerating the Rashomon set of all near-optimal models. To address functional redundancy within this set, we adopt a PAC-Bayesian framework that uses a Gibbs posterior to weight committee members by their empirical risk. Leveraging recent algorithmic advances, we exactly enumerate this set for the class of sparse decision trees. Across synthetic and established active learning benchmarks, REAL outperforms randomized ensembles, particularly in moderately noisy environments, where it leverages expanded model multiplicity to achieve faster convergence.
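
The query step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the Rashomon set has already been enumerated by some external procedure and is supplied as a matrix of committee predictions, it assumes Gibbs weights of the form $w_i \propto \exp(-R_i / T)$ with a hypothetical temperature parameter $T$, and it uses weighted vote entropy as the disagreement score. The function names are illustrative only.

```python
import numpy as np

def gibbs_weights(risks, temperature=1.0):
    """Weights proportional to exp(-risk / temperature), normalized to sum to 1.

    The temperature is an assumed tuning parameter, not taken from the paper.
    """
    risks = np.asarray(risks, dtype=float)
    # Shift by the minimum risk for numerical stability; the shift cancels on normalization.
    unnormalized = np.exp(-(risks - risks.min()) / temperature)
    return unnormalized / unnormalized.sum()

def weighted_vote_entropy(predictions, weights, n_classes):
    """Entropy of the weighted committee vote for each candidate.

    predictions: integer array of shape (n_models, n_candidates) holding class labels.
    weights: array of shape (n_models,) that sums to 1.
    """
    predictions = np.asarray(predictions)
    weights = np.asarray(weights, dtype=float)
    probs = np.zeros((predictions.shape[1], n_classes))
    for c in range(n_classes):
        # Total committee weight voting for class c on each candidate.
        probs[:, c] = ((predictions == c) * weights[:, None]).sum(axis=0)
    # Replace zeros by 1 inside the log so that 0 * log(0) contributes 0.
    safe = np.where(probs > 0, probs, 1.0)
    return -(probs * np.log(safe)).sum(axis=1)

def select_query(predictions, risks, n_classes=2, temperature=1.0):
    """Index of the unlabeled candidate with the highest weighted vote entropy."""
    weights = gibbs_weights(risks, temperature)
    return int(np.argmax(weighted_vote_entropy(predictions, weights, n_classes)))

# Toy committee: 3 near-optimal models, 4 unlabeled candidates (made-up values).
toy_predictions = np.array([[0, 1, 0, 1],
                            [0, 1, 1, 1],
                            [0, 0, 1, 1]])
toy_risks = [0.10, 0.10, 0.12]
chosen = select_query(toy_predictions, toy_risks)
```

In this toy example the committee agrees on candidates 0 and 3, so only candidates 1 and 2 carry any disagreement. The risk-based weights break the tie between them, which is the role the abstract assigns to the Gibbs posterior.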

Paper Structure (16 sections, 9 equations, 9 figures, 3 tables, 1 algorithm)

Figures (9)

  • Figure 1: (Top) REAL leverages noise-induced version-space diversity to maintain superior performance, with its advantage widening until extreme noise floors ($\phi=0.45$) equalize all strategies. (Bottom) Under total misspecification ($\alpha=1.0$), all tree-based QBC strategies underperform relative to Passive Sampling. However, BREAL and UNREAL demonstrate unique resilience, outperforming the other QBC methods and maintaining early-stage efficiency despite the structural misalignment.
  • Figure 2: Active Learning Efficiency Heatmap ($K=0.7$). This figure displays the Efficiency Ratio ($\rho$) of standard baselines relative to REAL, with values below $1.0$ (Green) indicating superior sample efficiency for our method. The results demonstrate that REAL reaches peak predictive performance significantly faster than randomized ensembles.
  • Figure 3: Relative Label Efficiency ($N_{rel}$) across 20 datasets. Each boxplot summarizes the labeling budget required to achieve $70\%$, $80\%$, and $90\%$ of the total possible accuracy gain. Values to the left of the dashed line ($<1.0$) indicate superior efficiency relative to Random Sampling. Both UNREAL and BREAL maintain a significantly tighter distribution and lower median than standard baselines, demonstrating that characterizing the Rashomon set provides a more reliable and robust path to cost reduction than randomized ensembles or greedy heuristics.
  • Figure 4: Active Learning Benchmarks. Accuracy traces across 20 real-world and structured datasets. UNREAL (Weighted and Uniform) consistently match or exceed baseline performance, demonstrating superior sample efficiency particularly in the early stages of the labeling process.
  • Figure 5: Committee Size History. Evolution of the Rashomon Effective Committee Size over active learning iterations across 20 benchmark datasets. The ECS serves as a metric of model certainty, quantifying the diversity of plausible theories within the version space as the learner explores the data manifold. A high ECS reflects a broad set of competing hypotheses, while values approaching $1.0$ indicate the emergence of a single dominant structural explanation.
  • ...and 4 more figures
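
The Effective Committee Size (ECS) in the Figure 5 caption is not defined in this section. As a sketch only, one common quantity that matches the described behavior (large for many comparably weighted hypotheses, approaching $1.0$ when one model dominates) is the inverse Simpson index of the committee weights, $1 / \sum_i w_i^2$. The choice of this formula and the function name below are assumptions, not the paper's definition.

```python
import numpy as np

def effective_committee_size(weights):
    """Inverse Simpson index of the normalized committee weights (assumed ECS proxy)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights form a distribution
    return 1.0 / np.sum(w ** 2)

# Four equally weighted models behave like a committee of four.
uniform_ecs = effective_committee_size([0.25, 0.25, 0.25, 0.25])
# One model holding nearly all the weight behaves like a committee of one.
dominant_ecs = effective_committee_size([0.97, 0.01, 0.01, 0.01])
```

Under this assumed definition, a uniformly weighted committee of $m$ models has an ECS of exactly $m$, and the value shrinks toward $1.0$ as the weight concentrates on a single model.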