StochTree: BART-based modeling in R and Python

Andrew Herren, P. Richard Hahn, Jared Murray, Carlos Carvalho

TL;DR

stochtree unifies BART-based modeling in R and Python through a shared C++ core that powers interoperable bindings and a broad set of extensions beyond classic BART, including BCF, random effects, heteroskedastic forests, and leaf-wise linear models. The paper details the BART prior, the extended feature set, and a practical three-part workflow (data preprocessing, prior specification, and algorithm settings) followed by prediction, diagnostics, and serialization; it also demonstrates the approach with a Friedman dataset example. By exposing low-level interfaces and supporting cross-language model serialization, stochtree enables rapid prototyping of novel Bayesian tree ensembles and smoother collaboration between the R and Python ecosystems. The work emphasizes extensibility and computational efficiency, aiming to bridge the gap between research innovations in BART and practical use in applied settings.
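As a rough illustration of the kind of data behind the Friedman example mentioned above, the sketch below simulates a regression dataset and splits it into training and test sets. The summary names only "a Friedman dataset", so this assumes the widely used Friedman (1991) benchmark function; the function name `friedman1`, its parameters, and the 80/20 split are hypothetical choices for illustration, not taken from the paper. The sketch uses only the Python standard library and does not call stochtree.

```python
import math
import random


def friedman1(n, p=10, noise_sd=1.0, seed=0):
    """Simulate n draws from the Friedman (1991) benchmark (assumed form).

    Covariates are independent Uniform(0, 1); only the first five of the
    p covariates enter the mean function, the rest are pure noise.
    """
    if p < 5:
        raise ValueError("p must be at least 5")
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.random() for _ in range(p)]
        mean = (
            10.0 * math.sin(math.pi * x[0] * x[1])
            + 20.0 * (x[2] - 0.5) ** 2
            + 10.0 * x[3]
            + 5.0 * x[4]
        )
        X.append(x)
        # Additive Gaussian noise with standard deviation noise_sd
        y.append(mean + rng.gauss(0.0, noise_sd))
    return X, y


# Hypothetical 80/20 train/test split of the simulated data
X, y = friedman1(n=500)
n_train = int(0.8 * len(y))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]
```

Data of this shape (a covariate matrix and an outcome vector) is what the preprocessing step of the workflow would take as input before prior specification and sampling.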

Abstract

stochtree is a C++ library for Bayesian tree ensemble models such as BART and Bayesian Causal Forests (BCF), as well as user-specified variations. Unlike previous BART packages, stochtree provides bindings to both R and Python for full interoperability. stochtree boasts a more comprehensive range of models relative to previous packages, including heteroskedastic forests, random effects, and treed linear models. Additionally, stochtree offers flexible handling of model fits: the ability to save model fits, reinitialize models from existing fits (facilitating improved model initialization heuristics), and pass fits between R and Python. On both platforms, stochtree exposes lower-level functionality, allowing users to specify models incorporating Bayesian tree ensembles without needing to modify C++ code. We illustrate the use of stochtree in three settings: i) straightforward applications of existing models such as BART and BCF, ii) models that include more sophisticated components like heteroskedasticity and leaf-wise regression models, and iii) as a component of custom MCMC routines to fit nonstandard tree ensemble models.

Paper Structure

This paper contains 14 sections, 7 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Example of a decision tree with five nodes and two splits.
  • Figure 2: Traceplot of the global error variance parameter, $\sigma^2$, sampled as part of a homoskedastic BART model fit with the stochtree R package.