Decomposing Observational Multiplicity in Decision Trees: Leaf and Structural Regret

Mustafa Cavus

Abstract

Many machine learning tasks admit multiple models that perform almost equally well, a phenomenon known as predictive multiplicity. A fundamental source of this multiplicity is observational multiplicity, which arises from the stochastic nature of label collection: the observed training labels are only a single realization of the underlying ground-truth probabilities. While theoretical frameworks for observational multiplicity have been established for logistic regression, their implications for non-smooth, partition-based models such as decision trees remain underexplored. In this paper, we introduce two complementary notions of observational multiplicity for decision tree classifiers: leaf regret and structural regret. Leaf regret quantifies the intrinsic variability of predictions within a fixed leaf due to finite-sample noise, while structural regret captures the variability induced by instability of the learned tree structure itself. We provide a formal decomposition of observational multiplicity into these two components and establish statistical guarantees. Our experimental evaluation across diverse credit risk scoring datasets confirms the near-perfect alignment between our theoretical decomposition and the empirically observed variance. Notably, we find that structural regret is the primary driver of observational multiplicity, contributing over 15 times as much variability as leaf regret in some datasets. Furthermore, we demonstrate that using these regret measures as an abstention criterion in selective prediction identifies regions where predictions are arbitrary and improves model safety, raising recall from 92% to 100% on the most stable sub-populations. These results establish a rigorous framework for quantifying observational multiplicity, aligning with recent advances in algorithmic safety and interpretability.

Paper Structure

This paper contains 17 sections, 10 theorems, 6 equations, 3 figures, 1 table, 2 algorithms.

Key Result

lemma 1

For any leaf $L$ with $n_L \ge 1$, the quantity $R_L^{\mathrm{leaf}} := \mathrm{Var}(\hat{p}_L \mid L)$ is finite and admits the closed-form expression $R_L^{\mathrm{leaf}} = \frac{p_L^\ast (1 - p_L^\ast)}{n_L}$.
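The closed form in Lemma 1 is the variance of a sample proportion, so it can be checked numerically. The sketch below is illustrative only: it assumes the $n_L$ labels in a fixed leaf are independent Bernoulli draws with success probability $p_L^\ast$, and the function names `leaf_regret` and `simulated_leaf_regret` are hypothetical, not taken from the paper.

```python
import random
import statistics


def leaf_regret(p_star: float, n_leaf: int) -> float:
    """Closed form from Lemma 1: p*(1 - p*) / n_L."""
    if n_leaf < 1:
        raise ValueError("n_leaf must be at least 1")
    if not 0.0 <= p_star <= 1.0:
        raise ValueError("p_star must lie in [0, 1]")
    return p_star * (1.0 - p_star) / n_leaf


def simulated_leaf_regret(p_star: float, n_leaf: int,
                          n_draws: int = 20000, seed: int = 0) -> float:
    """Monte Carlo variance of the leaf estimate under resampled labels.

    Assumption (not from the paper): labels in the leaf are i.i.d.
    Bernoulli(p_star), and the leaf estimate is the fraction of positives.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_draws):
        positives = sum(rng.random() < p_star for _ in range(n_leaf))
        estimates.append(positives / n_leaf)
    return statistics.pvariance(estimates)


closed_form = leaf_regret(0.3, 50)          # 0.3 * 0.7 / 50
simulated = simulated_leaf_regret(0.3, 50)  # should be close to closed_form
print(f"closed form: {closed_form:.6f}, simulated: {simulated:.6f}")
```

Because $p(1-p) \le 1/4$ for every $p \in [0, 1]$, the closed form never exceeds $1/(4 n_L)$, which is consistent with the role of leaf size described in the caption of Figure 2.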

Figures (3)

  • Figure 1: Theorized versus actual regret in decision trees across three datasets. The x-axis represents the sum of expected leaf regret and structural regret, while the y-axis shows the simulated true variance.
  • Figure 2: The impact of minimum leaf size ($n_L$) on predictive stability and model performance. The dual-axis plot illustrates the trade-off between Leaf Regret and Logistic Loss. These empirical results confirm that increasing partition size effectively mitigates observational multiplicity, as predicted in Lemma 2, while the increase in Logistic Loss reflects an empirical underfitting trade-off.
  • Figure 3: Selective Prediction: Recall vs. Coverage across six datasets. The x-axis is reversed, running from 100% coverage (the full dataset) down to selective prediction on only the most stable individuals. Curves represent ranking strategies based on Leaf Regret, Structural Regret, and Total Regret.
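
The recall-versus-coverage curves in Figure 3 rest on a simple ranking step: sort individuals by a regret score, keep the most stable fraction, and measure recall on what remains. The following is a minimal sketch of that step, not the paper's implementation; the function name `recall_at_coverage` and the toy labels, predictions, and regret scores are made up for illustration.

```python
def recall_at_coverage(y_true, y_pred, regret, coverage):
    """Recall on the `coverage` fraction of points with the lowest regret.

    Points with higher regret are treated as abstentions and dropped.
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must lie in (0, 1]")
    n_keep = max(1, round(coverage * len(y_true)))
    # Rank indices from most stable (lowest regret) to least stable.
    order = sorted(range(len(y_true)), key=lambda i: regret[i])
    kept = order[:n_keep]
    true_pos = sum(1 for i in kept if y_true[i] == 1 and y_pred[i] == 1)
    actual_pos = sum(1 for i in kept if y_true[i] == 1)
    return true_pos / actual_pos if actual_pos else float("nan")


# Toy data (hypothetical): the one missed positive has the highest regret.
y_true = [1, 1, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0]
regret = [0.01, 0.02, 0.03, 0.04, 0.20, 0.05]

full = recall_at_coverage(y_true, y_pred, regret, coverage=1.0)
stable = recall_at_coverage(y_true, y_pred, regret, coverage=0.5)
print(full, stable)
```

In this toy case, abstaining on the high-regret half removes the only false negative, so recall rises as coverage falls. Whether real data behaves this way depends on how well the regret score tracks prediction errors.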

Theorems & Definitions (11)

  • lemma 1: Well-definedness of leaf regret
  • lemma 2: Uniform upper bound
  • lemma 3: Consistency of the plug-in estimator
  • lemma 4: Deviation inequality
  • theorem 1: Asymptotic vanishing of leaf regret
  • corollary 1: Expected leaf regret bound
  • lemma 5: Consistency of the Monte Carlo estimator
  • theorem 2: Two-stage convergence of Monte Carlo leaf regret
  • definition 1: Structural regret
  • lemma 6: Decomposition of predictive variability
  • ...and 1 more
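
This listing names Definition 1 (structural regret) and Lemma 6 (decomposition of predictive variability) without stating them. One standard identity with the same shape as the sum plotted in Figure 1 is the law of total variance, conditioning on the learned tree structure. The display below is offered only as a plausible reading, under the assumptions that $T$ denotes the tree structure learned from one realization of the labels and $\hat{p}(x)$ is the predicted probability at a point $x$. Both symbols are introduced here for illustration, and the paper's exact definitions may differ.

```latex
\operatorname{Var}\bigl(\hat{p}(x)\bigr)
  = \underbrace{\mathbb{E}_{T}\Bigl[\operatorname{Var}\bigl(\hat{p}(x) \mid T\bigr)\Bigr]}_{\text{expected within-structure (leaf) variability}}
  + \underbrace{\operatorname{Var}_{T}\Bigl(\mathbb{E}\bigl[\hat{p}(x) \mid T\bigr]\Bigr)}_{\text{between-structure variability}}
```

Under this reading, the first term averages a Lemma 1 style quantity, $p_L^\ast (1 - p_L^\ast) / n_L$, over the leaf that contains $x$ in each structure, and the second term measures how much the expected prediction moves when the partition itself changes.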