SplitWise Regression: Stepwise Modeling with Adaptive Dummy Encoding

Marcell T. Kurbucz, Nikolaos Tzivanakis, Nilufer Sari Aslam, Adam M. Sykulski

TL;DR

SplitWise addresses the challenge of modeling nonlinear effects without compromising interpretability by adding adaptive threshold-based dummy encodings to a stepwise framework. It automatically transforms numeric predictors into binary indicators using shallow threshold trees, retaining a transformation only if it lowers $AIC$ or $BIC$, so the model stays globally linear and is augmented by only a few expressive features. The method offers two transformation modes (iterative and univariate) and is implemented in an R package with minimal dependencies. On synthetic and real-world datasets it is shown to yield parsimonious and generalizable models. This work provides a practical, interpretable tool for settings requiring transparent yet flexible regression models, accompanied by reproducible software and benchmarking resources.
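
For reference, the two criteria in their standard textbook form. The symbols are introduced here for illustration and are not taken from the paper: $k$ is the number of estimated parameters, $n$ the number of observations, and $\hat{L}$ the maximized likelihood of the fitted model.

$$
AIC = 2k - 2\ln\hat{L}, \qquad BIC = k\ln n - 2\ln\hat{L}
$$

Lower values are better for both. For $n \geq 8$ the $BIC$ penalty per parameter, $\ln n$, exceeds the $AIC$ penalty of $2$, so $BIC$ tends to accept fewer transformations.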

Abstract

Capturing nonlinear relationships without sacrificing interpretability remains a persistent challenge in regression modeling. We introduce SplitWise, a novel framework that enhances stepwise regression. It adaptively transforms numeric predictors into threshold-based binary features using shallow decision trees, but only when such transformations improve model fit, as assessed by the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC). This approach preserves the transparency of linear models while flexibly capturing nonlinear effects. Implemented as a user-friendly R package, SplitWise is evaluated on both synthetic and real-world datasets. The results show that it consistently produces more parsimonious and generalizable models than traditional stepwise and penalized regression techniques.
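
The retention rule described above can be illustrated for a single predictor. The following is a minimal sketch in Python, not the SplitWise R package or its API: every function name is hypothetical, the threshold comes from an exhaustive depth-1 split search standing in for a shallow decision tree, and both candidate models are assumed to have the same parameter count (intercept, slope, noise variance), so the search for the threshold itself is not penalized.

```python
import math
import random


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def simple_ols_rss(x, y):
    """Residual sum of squares of the least-squares line y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def gaussian_aic(rss, n, k):
    """AIC of a Gaussian linear model, dropping constants shared by all fits."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


def best_stump_threshold(x, y):
    """Depth-1 regression tree: the split minimising within-leaf squared error."""
    pairs = sorted(zip(x, y))
    best_t, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        cost = sse([p[1] for p in pairs[:i]]) + sse([p[1] for p in pairs[i:]])
        if cost < best_cost:
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_cost = cost
    return best_t


def choose_encoding(x, y):
    """Keep the dummy 1[x > t] only if it has a lower AIC than the linear term."""
    n = len(x)
    aic_linear = gaussian_aic(simple_ols_rss(x, y), n, k=3)
    t = best_stump_threshold(x, y)
    if t is None:
        return "linear", None
    dummy = [1.0 if xi > t else 0.0 for xi in x]
    aic_dummy = gaussian_aic(simple_ols_rss(dummy, y), n, k=3)
    return ("dummy", t) if aic_dummy < aic_linear else ("linear", None)


# Synthetic data (assumed for illustration): one step-shaped and one linear response.
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(200)]
y_step = [(2.0 if xi > 5 else 0.0) + rng.gauss(0, 0.3) for xi in x]
y_line = [0.5 * xi + rng.gauss(0, 0.3) for xi in x]

step_choice = choose_encoding(x, y_step)
line_choice = choose_encoding(x, y_line)
```

In this sketch the step-shaped response should lead to the dummy encoding with a threshold near 5, while the linear response should keep the original linear term. The full method embeds this kind of comparison in a multivariate stepwise search, which the sketch does not attempt.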

Paper Structure

This paper contains 9 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Workflow diagram of the SplitWise regression algorithm, illustrating the two transformation modes (iterative vs. univariate) and the sequence of steps for each.