Policy-Oriented Binary Classification: Improving (KD-)CART Final Splits for Subpopulation Targeting

Lei Bill Wang, Zhenbang Jiao, Fangyi Wang

TL;DR

This paper defines Latent Probability Classification (LPC) and shows that standard CART and KD-CART splits are suboptimal for policy targeting tasks. It introduces Maximizing Distance Final Split (MDFS), along with Penalized Final Split (PFS) and weighted Empirical risk Final Split (wEFS), to produce splits that strictly dominate CART under appropriate assumptions, with MDFS providing a consistent estimator of the unique optimal split $s^*$ defined by $\eta(s^*)=c$. The methods are extended to knowledge-distillation settings and evaluated on synthetic and real-world datasets, where MDFS/PFS/wEFS consistently outperform CART and KD-CART, with RF-MDFS often delivering the strongest gains. The results have practical implications for targeting vulnerable subpopulations under fixed resource constraints, and the framework accommodates general thresholds $c$ as well as potential extensions to more complex, multi-node scenarios.

Abstract

Policymakers often use recursive binary split rules to partition populations based on binary outcomes and target subpopulations whose probability of the binary event exceeds a threshold. We call such problems Latent Probability Classification (LPC). Practitioners typically employ Classification and Regression Trees (CART) for LPC. We prove that, in the context of LPC, classic CART and the knowledge distillation method whose student model is a CART (referred to as KD-CART) are suboptimal. We propose Maximizing Distance Final Split (MDFS), which generates split rules that strictly dominate CART/KD-CART under the unique intersect assumption. MDFS identifies the unique best split rule, is consistent, and targets more vulnerable subpopulations than CART/KD-CART. To relax the unique intersect assumption, we additionally propose Penalized Final Split (PFS) and weighted Empirical risk Final Split (wEFS). Through extensive simulation studies, we demonstrate that the proposed methods predominantly outperform CART/KD-CART. When applied to real-world datasets, MDFS generates policies that target more vulnerable subpopulations than CART/KD-CART.

Paper Structure

This paper contains 30 sections, 10 theorems, 51 equations, 11 figures, 2 tables, 4 algorithms.

Key Result

Theorem 3.2

Suppose $c \in [c_{min}, c_{max}]$, where $c_{min} = \min(\mu_L(s^{CART}), \mu_R(s^{CART}))$ and $c_{max} = \max(\mu_L(s^{CART}), \mu_R(s^{CART}))$, and $\eta(s^{CART}) \neq c$. Then there exists $s^*$ such that $\eta(s^*) = c$. Further, every $s \in \left((s^* \land s^{CART}), (s^* \lor s^{CART})\right)$ strictly dominates $s^{CART}$.
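To make the objects in Theorem 3.2 concrete, here is a minimal numerical sketch for a single feature. Everything specific in it is an assumption for illustration and is not taken from the paper: $X \sim \mathrm{Uniform}(0,1)$, $\eta(x) = x^2$ read as $P(Y=1 \mid X=x)$, $\mu_L$ and $\mu_R$ read as the mean of $\eta$ in the left and right nodes, $s^{CART}$ read as the population split that minimizes weighted Gini impurity, and $c = 0.6$. The function names are hypothetical. The sketch only locates $s^{CART}$, $s^*$, and the interval between them; it does not check strict dominance, since Definition 3.1 is not reproduced here.

```python
import numpy as np

C = 0.6  # assumed policy threshold (illustrative)


def eta(x):
    """Assumed latent probability P(Y=1 | X=x); continuous and increasing on [0, 1]."""
    return np.asarray(x, dtype=float) ** 2


def population_cart_split(n=100_000):
    """Grid approximation of the Gini-minimizing split for X ~ Uniform(0, 1).

    Returns (s_cart, mu_left, mu_right), where the mu values are the
    average of eta in the left and right nodes at that split.
    """
    xs = (np.arange(n) + 0.5) / n          # midpoint grid on [0, 1]
    cum = np.cumsum(eta(xs))
    k = np.arange(1, n)                    # number of grid points in the left node
    mu_l = cum[k - 1] / k
    mu_r = (cum[-1] - cum[k - 1]) / (n - k)
    p_l = k / n
    # The overall mean is fixed, so minimizing the weighted Gini impurity
    # 2*mu*(1-mu) is equivalent to maximizing this weighted sum of squares.
    score = p_l * mu_l ** 2 + (1 - p_l) * mu_r ** 2
    best = int(np.argmax(score))
    return float(k[best] / n), float(mu_l[best]), float(mu_r[best])


def threshold_crossing(c, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for s with eta(s) = c, assuming eta is continuous and increasing."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eta(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


s_cart, mu_l, mu_r = population_cart_split()
s_star = threshold_crossing(C)
print(f"s_CART = {s_cart:.4f}, mu_L = {mu_l:.4f}, mu_R = {mu_r:.4f}")
print(f"s*     = {s_star:.4f}")
print(f"interval between them: ({min(s_cart, s_star):.4f}, {max(s_cart, s_star):.4f})")
```

Under these assumptions the first-order condition for the Gini-minimizing split is $\eta(s) = (\mu_L(s) + \mu_R(s))/2$, which reduces to $4s^2 - s - 1 = 0$, so $s^{CART} = (1 + \sqrt{17})/8 \approx 0.640$, while $s^* = \sqrt{0.6} \approx 0.775$. The CART split therefore places individuals with $\eta(x)$ between roughly $0.41$ and $0.6$ in the targeted node even though their event probability is below the threshold.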

Figures (11)

  • Figure 1: Illustrative examples comparing $s^{CART}$ with $s^*$
  • Figure 2: Boxplots of MR differences relative to baseline models. For each panel, the first three boxplots compare CART with MDFS, PFS and wEFS, and the last boxplot compares RF-CART with RF-MDFS. Negative values in the boxplots denote improvement in MR. Embedded tables list one-sided paired t-test and Wilcoxon signed-rank p-values for mean and median MR differences, respectively. See the boxplots for F1 scores in the appendix on simulation results.
  • Figure 3: The targeting policies generated by CART, MDFS, RF-CART, and RF-MDFS. The red groups are the targeted subpopulations predicted to have a higher than 60% probability of being diabetic. Due to the page limit, we present only the nodes whose targeting decisions differ; see the full trees in the appendix on empirical studies.
  • Figure 4: Visualization of the different loss components of PFS.
  • Figure 5: Grow_Tree
  • ...and 6 more figures

Theorems & Definitions (22)

  • Definition 3.1: Strict dominance
  • Theorem 3.2
  • Theorem 4.2
  • Theorem 4.3
  • Remark 4.4
  • Remark 4.5
  • Theorem 5.1
  • Remark 7.1
  • Corollary 7.2
  • Lemma B.1
  • ...and 12 more