Table of Contents
Fetching ...

Classification Trees with Valid Inference via the Exponential Mechanism

Soham Bakshi, Snigdha Panigrahi

TL;DR

A novel tree-fitting method is introduced that replaces the greedy splitting of the predictor space in standard tree algorithms with a probabilistic approach that delivers powerful inference without sacrificing predictive accuracy, in contrast to data splitting methods.

Abstract

Decision trees are widely used for non-linear modeling, as they capture interactions between predictors while producing inherently interpretable models. Despite their popularity, performing inference on the non-linear fit remains largely unaddressed. This paper focuses on classification trees and makes two key contributions. First, we introduce a novel tree-fitting method that replaces the greedy splitting of the predictor space in standard tree algorithms with a probabilistic approach. Each split in our approach is selected according to sampling probabilities defined by an exponential mechanism, with a temperature parameter controlling its deviation from the deterministic choice given data. Second, while our approach can fit a tree that with high probability coincides with the fit produced by standard tree algorithms at low temperatures, it is not merely predictive; unlike standard algorithms, it enables inference by taking into account the highly adaptive tree structure. Our method produces pivots directly from the sampling probabilities in the exponential mechanism. In theory, our pivots allow asymptotically valid inference on the parameters in the predictive fit, and in practice, our method delivers powerful inference without sacrificing predictive accuracy, in contrast to data splitting methods.

Classification Trees with Valid Inference via the Exponential Mechanism

TL;DR

A novel tree-fitting method is introduced that replaces the greedy splitting of the predictor space in standard tree algorithms with a probabilistic approach that delivers powerful inference without sacrificing predictive accuracy, in contrast to data splitting methods.

Abstract

Decision trees are widely used for non-linear modeling, as they capture interactions between predictors while producing inherently interpretable models. Despite their popularity, performing inference on the non-linear fit remains largely unaddressed. This paper focuses on classification trees and makes two key contributions. First, we introduce a novel tree-fitting method that replaces the greedy splitting of the predictor space in standard tree algorithms with a probabilistic approach. Each split in our approach is selected according to sampling probabilities defined by an exponential mechanism, with a temperature parameter controlling its deviation from the deterministic choice given data. Second, while our approach can fit a tree that with high probability coincides with the fit produced by standard tree algorithms at low temperatures, it is not merely predictive; unlike standard algorithms, it enables inference by taking into account the highly adaptive tree structure. Our method produces pivots directly from the sampling probabilities in the exponential mechanism. In theory, our pivots allow asymptotically valid inference on the parameters in the predictive fit, and in practice, our method delivers powerful inference without sacrificing predictive accuracy, in contrast to data splitting methods.

Paper Structure

This paper contains 47 sections, 23 theorems, 171 equations, 5 figures, 3 tables, 2 algorithms.

Key Result

Proposition 3.1

For any split $s \in \mathcal{K}({\mathcal{P}})$ on an interior region ${\mathcal{P}}$, where $s\notin \underset{s'\in \mathcal{K}({\mathcal{P}})}{\text{argmax}} \; G_{{\mathcal{P}}}(s'; y)$, it holds that

Figures (5)

  • Figure 1: Illustration of inference for decision trees. Left: the algorithmic modeling approach described in breiman2001statistical, in which a decision tree is used as a predictive model to explain the black box linking predictors $X$ to outcome $y$; Middle: a classification tree generated by our tree-fitting algorithm using the proposed exponential mechanism, with leaf nodes reporting predictions based on a majority vote; Right: the same tree fit, now equipped with inference (in particular, confidence intervals) on the mean parameters of subpopulations identified by the rule-based partition of the classification tree.
  • Figure 2: Effect of overall temperature on coverage, interval length, and held-out log-loss. RCT(1), RCT(2), RCT(3) represent RCT with $\tau = 10,15,20$ respectively.
  • Figure 3: Comparison with data splitting at varying inference fraction s: coverage, interval length, and held-out log-loss
  • Figure 4: Effect of signal strength on CI quality and predictive performance.
  • Figure 5: Test-set accuracy of the ensemble RCT and the standard random forest across Monte Carlo replicates for varying ensemble sizes $B$. Red triangles denote Monte Carlo means.

Theorems & Definitions (53)

  • Proposition 3.1
  • Proposition 3.2
  • Remark 1
  • Remark 2
  • Definition 4.1
  • Definition 4.2
  • Definition 4.3
  • Lemma 4.1
  • Definition 4.4
  • Lemma 4.2
  • ...and 43 more