Fitting sparse high-dimensional varying-coefficient models with Bayesian regression tree ensembles

Soham Ghosh, Saloni Bhogale, Sameer K. Deshpande

TL;DR

sparseVCBART is a fully Bayesian model that approximates each coefficient function in a varying-coefficient model (VCM) with a regression tree ensemble. It encourages sparsity with a global--local shrinkage prior on the regression tree leaf outputs and a hierarchical prior on the splitting probabilities of each tree.

Abstract

By allowing the effects of $p$ covariates in a linear regression model to vary as functions of $R$ additional effect modifiers, varying-coefficient models (VCMs) strike a compelling balance between interpretable-but-rigid parametric models popular in classical statistics and flexible-but-opaque methods popular in machine learning. But in high-dimensional settings where $p$ and/or $R$ exceed the number of observations, existing approaches to fitting VCMs fail to identify which covariates have a non-zero effect and which effect modifiers drive these effects. We propose sparseVCBART, a fully Bayesian model that approximates each coefficient function in a VCM with a regression tree ensemble and encourages sparsity with a global--local shrinkage prior on the regression tree leaf outputs and a hierarchical prior on the splitting probabilities of each tree. We show that the sparseVCBART posterior contracts at a near-minimax optimal rate, automatically adapting to the unknown sparsity structure and smoothness of the true coefficient functions. Compared to existing state-of-the-art methods, sparseVCBART achieved competitive predictive accuracy and substantially narrower and better-calibrated uncertainty intervals, especially for null covariate effects. We use sparseVCBART to investigate how the effects of interpersonal conversations on prejudice could vary according to the political and demographic characteristics of the respondents.
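
To make the model class concrete, the following is a minimal sketch of the data-generating form that a VCM assumes: $y = \beta_0(\bm z) + \sum_{j=1}^{p} \beta_j(\bm z)\, x_j + \varepsilon$, where most $\beta_j$ are identically zero. It is not the authors' implementation and it fits nothing. The function names (`beta0`, `beta1`, `zero`, `vcm_mean`, `simulate`), the split points, and the leaf values are all hypothetical, chosen only to mimic the piecewise-constant shape that a regression tree produces.

```python
import random

def beta0(z):
    # Hypothetical intercept function: a one-split step function of z[0].
    return 1.0 if z[0] < 0.5 else -1.0

def beta1(z):
    # Hypothetical coefficient function shaped like a small regression tree:
    # it splits on z[0] first and then on z[1], so only two modifiers matter.
    if z[0] < 0.5:
        return 2.0
    return 0.5 if z[1] < 0.3 else -0.5

def zero(z):
    # A null covariate effect: the coefficient function is identically zero.
    return 0.0

def vcm_mean(x, z, betas):
    # betas[0] is the intercept function; betas[j] multiplies x[j - 1].
    return betas[0](z) + sum(b(z) * xj for b, xj in zip(betas[1:], x))

def simulate(n, p, R, sigma, seed=0):
    # Draw n observations with p covariates and R >= 2 effect modifiers.
    # Only the intercept and the first covariate have non-zero effects.
    rng = random.Random(seed)
    betas = [beta0, beta1] + [zero] * (p - 1)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        z = [rng.random() for _ in range(R)]
        y = vcm_mean(x, z, betas) + rng.gauss(0.0, sigma)
        data.append((x, z, y))
    return betas, data

# With z[0] < 0.5 the mean is 1.0 + 2.0 * x[0]; the second covariate is null.
print(vcm_mean([1.0, 3.0], [0.2, 0.9], [beta0, beta1, zero]))  # 3.0
```

In this toy setup the sparsity is two-fold, matching the two questions posed in the abstract: only one of the $p$ covariates has a non-zero effect, and that effect depends on only two of the $R$ effect modifiers.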

Paper Structure

This paper contains 38 sections, 12 theorems, 257 equations, 5 figures, 1 table.

Key Result

Theorem 1

Under (A1)-(A6) and (P1)-(P3), there exists $C>0$ such that as $N \rightarrow \infty$,

Figures (5)

  • Figure 1: Experiment 1 (top row) and Experiment 2 (bottom row). (a,c) Average MSE for evaluations $\beta_j(\bm z)$; (b,d) average $95\%$ coverage for $\beta_j(\bm z)$.
  • Figure 2: Function recovery in Experiment 2 for $\beta_0$, $\beta_1$, $\beta_2$, and a zero function $(\beta_4)$.
  • Figure S3.1: Predictive performance of all the methods averaged over $25$ replications.
  • Figure S3.2: Posterior medians of local scales $\lambda_j$ from sparseVCBART. Treatment predictors (treat_pg, treat_pg_apt) show much larger medians than all 18 noise predictors, indicating strong separation between signal and noise.
  • Figure S3.3: Fit-the-fit CART for the posterior-mean surface $\hat{\beta}_{\texttt{treat\_pg}}(\bm{z})$. The root split is on support for K--12 access for undocumented children; downstream splits involve the current-economy rating and SDO terciles. Leaves summarize subgroup-average effects, yielding transparent, policy-relevant heterogeneity.

Theorems & Definitions (25)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Lemma S1.1
  • proof
  • Lemma S1.2
  • Lemma S1.3: Prior mass of the dimension–adaptive target tree
  • proof
  • proof: Proof of \ref{lem:smallball-main}
  • ...and 15 more