Decomposing Observational Multiplicity in Decision Trees: Leaf and Structural Regret

Mustafa Cavus

Abstract

Many machine learning tasks admit multiple models that perform almost equally well, a phenomenon known as predictive multiplicity. A fundamental source of this multiplicity is observational multiplicity, which arises from the stochastic nature of label collection: observed training labels represent only a single realization of the underlying ground-truth probabilities. While theoretical frameworks for observational multiplicity have been established for logistic regression, their implications for non-smooth, partition-based models like decision trees remain underexplored. In this paper, we introduce two complementary notions of observational multiplicity for decision tree classifiers: leaf regret and structural regret. Leaf regret quantifies the intrinsic variability of predictions within a fixed leaf due to finite-sample noise, while structural regret captures variability induced by the instability of the learned tree structure itself. We provide a formal decomposition of observational multiplicity into these two components and establish statistical guarantees. Our experimental evaluation across diverse credit risk scoring datasets confirms the near-perfect alignment between our theoretical decomposition and the empirically observed variance. Notably, we find that structural regret is the primary driver of observational multiplicity, accounting for over 15 times the variability of leaf regret in some datasets. Furthermore, we demonstrate that utilizing these regret measures as an abstention mechanism in selective prediction can effectively identify arbitrary regions and improve model safety, elevating recall from 92% to 100% on the most stable sub-populations. These results establish a rigorous framework for quantifying observational multiplicity, aligning with recent advances in algorithmic safety and interpretability.

Paper Structure (17 sections, 10 theorems, 6 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Related Work
Methodology
Setup and notation
Leaf regret
Estimation of leaf regret
Asymptotic behavior
Monte Carlo approximation of leaf regret
Structural regret
Experiments
Numerical Validation of Lemma 6
Empirical Validation of Lemma 2 and Theorem 1
Comparative Regret Analysis
Selective Prediction and Safety Promotion
Discussion
...and 2 more sections

Key Result

lemma 1

For any leaf $L$ with $n_L \ge 1$, the quantity $R_L^{\mathrm{leaf}} := \mathrm{Var}(\hat{p}_L \mid L)$ is finite and admits the closed-form expression $R_L^{\mathrm{leaf}} = \frac{p_L^\ast (1 - p_L^\ast)}{n_L}$.
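
The closed form in Lemma 1 can be checked numerically. The sketch below is illustrative and not the paper's code: it assumes that $\hat{p}_L$ is the fraction of positive labels among the $n_L$ training points in the leaf, and that those labels are i.i.d. Bernoulli draws with success probability $p_L^\ast$. Under that assumption the simulated variance of $\hat{p}_L$ should sit close to $p_L^\ast (1 - p_L^\ast) / n_L$. The function names and the chosen $(p_L^\ast, n_L)$ pairs are hypothetical.

```python
import numpy as np


def leaf_regret_closed_form(p_star: float, n_leaf: int) -> float:
    """Closed-form leaf regret from Lemma 1: p*(1 - p*) / n_L."""
    return p_star * (1.0 - p_star) / n_leaf


def leaf_regret_simulated(p_star: float, n_leaf: int,
                          n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo variance of the leaf estimate p_hat.

    Assumption (not taken from the paper): labels in the leaf are
    i.i.d. Bernoulli(p_star), and p_hat is their empirical mean.
    """
    rng = np.random.default_rng(seed)
    # Each draw is one re-collection of the n_leaf labels in this leaf.
    positives = rng.binomial(n_leaf, p_star, size=n_draws)
    p_hat = positives / n_leaf
    return float(p_hat.var())


if __name__ == "__main__":
    # Hypothetical (p_star, n_leaf) pairs, chosen only for illustration.
    for p_star, n_leaf in [(0.1, 20), (0.5, 50), (0.8, 200)]:
        exact = leaf_regret_closed_form(p_star, n_leaf)
        approx = leaf_regret_simulated(p_star, n_leaf)
        print(f"p*={p_star}, n_L={n_leaf}: "
              f"closed form={exact:.6f}, simulated={approx:.6f}")
```

Because $p(1 - p) \le 1/4$ for every $p \in [0, 1]$, the closed form never exceeds $1/(4 n_L)$, so the quantity shrinks as the leaf grows regardless of the true probability.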

Figures (3)

Figure 1: Theorized versus actual regret in decision trees across three datasets. The x-axis represents the sum of expected leaf regret and structural regret, while the y-axis shows the simulated true variance.
Figure 2: The impact of minimum leaf size ($n_L$) on predictive stability and model performance. The dual-axis plot illustrates the trade-off between Leaf Regret and Logistic Loss. These empirical results confirm that increasing partition size effectively mitigates observational multiplicity, as predicted in Lemma 2, while the increase in Logistic Loss reflects an empirical underfitting trade-off.
Figure 3: Selective Prediction: Recall vs. Coverage across six datasets. The x-axis is reversed, moving from full dataset utilization as 100% to selective prediction on the most stable individuals. Curves represent ranking strategies based on Leaf Regret, Structural Regret, and Total Regret.
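
The recall-versus-coverage curves described for Figure 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: individuals are ranked by a per-individual regret score (leaf, structural, or total), the most stable fraction is retained, and recall is computed on the retained positives only. The function name `recall_at_coverage`, the rounding rule for the retained count, and the toy arrays are all hypothetical.

```python
import numpy as np


def recall_at_coverage(y_true, y_pred, regret, coverages):
    """Recall on the lowest-regret fraction of individuals.

    Assumption: lower regret means a more stable prediction, so
    abstention removes the highest-regret individuals first.
    Returns a dict mapping each coverage level to recall
    (NaN if no positives remain in the retained subset).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    regret = np.asarray(regret, dtype=float)

    # Most stable individuals first; stable sort keeps ties in input order.
    order = np.argsort(regret, kind="stable")
    n = len(y_true)

    results = {}
    for coverage in coverages:
        k = max(1, int(round(coverage * n)))  # number of retained individuals
        keep = order[:k]
        positives = y_true[keep] == 1
        if positives.sum() == 0:
            results[coverage] = float("nan")
        else:
            results[coverage] = float((y_pred[keep][positives] == 1).mean())
    return results


if __name__ == "__main__":
    # Toy data, fabricated purely to exercise the function.
    y_true = [1, 1, 0, 1, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1]
    regret = [0.01, 0.02, 0.03, 0.50, 0.04, 0.05]
    print(recall_at_coverage(y_true, y_pred, regret, [1.0, 0.5]))
```

In the toy data the only missed positive is also the individual with the highest regret, so dropping it raises recall on the retained subset. Whether this pattern holds on real data depends on how well the regret score tracks prediction errors.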

Theorems & Definitions (11)

lemma 1: Well-definedness of leaf regret
lemma 2: Uniform upper bound
lemma 3: Consistency of the plug-in estimator
lemma 4: Deviation inequality
theorem 1: Asymptotic vanishing of leaf regret
corollary 1: Expected leaf regret bound
lemma 5: Consistency of the Monte Carlo estimator
theorem 2: Two-stage convergence of Monte Carlo leaf regret
definition 1: Structural regret
lemma 6: Decomposition of predictive variability
...and 1 more
