SMART: A Flexible Approach to Regression using Spline-Based Multivariate Adaptive Regression Trees

William Pattie, Arvind Krishna

TL;DR

SMART addresses the twin challenges of high variance in regression trees and the inability of MARS to model discontinuities. It fuses a CART-like forward partitioning with leaf-wise MARS fitting, enabling efficient identification of discontinuities while leveraging MARS for interactions and higher-order terms. Empirical results across Friedman and synthetic piecewise datasets show SMART often outperforms state-of-the-art methods and, in some cases, matches or exceeds MARS while reducing bias. The approach offers a flexible, scalable tool for datasets exhibiting both smooth nonlinearities and abrupt changes, with an open-source implementation for practitioners and researchers.

Abstract

Decision trees are powerful for predictive modeling but often suffer from high variance when modeling continuous relationships. While algorithms like Multivariate Adaptive Regression Splines (MARS) excel at capturing such continuous relationships, they perform poorly when modeling discontinuities. To address the limitations of both approaches, we introduce Spline-based Multivariate Adaptive Regression Trees (SMART), which uses a decision tree to identify subsets of data with distinct continuous relationships and then leverages MARS to fit these relationships independently. Unlike other methods that rely on the tree structure to model interaction and higher-order terms, SMART leverages MARS's native ability to handle these terms, allowing the tree to focus solely on identifying discontinuities in the relationship. We test SMART on various datasets, demonstrating its improvement over state-of-the-art methods in such cases. Additionally, we provide an open-source implementation of our method to be used by practitioners.
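
The division of labor described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it handles one feature and one split, and a fixed piecewise-linear hinge basis fitted by least squares stands in for a real MARS fit. All function names, the knot placement, the `min_leaf` value, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def hinge_features(x, knots):
    """Intercept, linear term, and one hinge max(0, x - k) per knot."""
    cols = [np.ones_like(x), x]
    cols += [np.maximum(0.0, x - k) for k in knots]
    return np.column_stack(cols)

def fit_leaf(x, y, n_knots=3):
    """Least-squares fit on a fixed hinge basis (a crude stand-in for MARS)."""
    knots = np.quantile(x, np.linspace(0.2, 0.8, n_knots))
    coef, *_ = np.linalg.lstsq(hinge_features(x, knots), y, rcond=None)
    return knots, coef

def leaf_sse(x, y):
    """Sum of squared residuals of the leaf model on its own data."""
    knots, coef = fit_leaf(x, y)
    return float(np.sum((y - hinge_features(x, knots) @ coef) ** 2))

def best_split(x, y, min_leaf=20):
    """Try each candidate threshold and keep the one whose two
    independently fitted leaves give the lowest total SSE."""
    candidates = np.sort(x)[min_leaf:-min_leaf]
    best_t, best_sse = None, np.inf
    for t in candidates:
        left = x <= t
        total = leaf_sse(x[left], y[left]) + leaf_sse(x[~left], y[~left])
        if total < best_sse:
            best_t, best_sse = float(t), total
    return best_t, best_sse

# Synthetic data (assumed): a smooth curve with a jump of 3 at x = 0.5.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 300)
y = np.sin(2 * np.pi * x) + 3.0 * (x > 0.5) + rng.normal(0.0, 0.1, 300)

global_sse = leaf_sse(x, y)              # one continuous fit, no split
threshold, split_sse = best_split(x, y)  # one split, one fit per leaf
print(f"no split: SSE = {global_sse:.2f}")
print(f"split at x = {threshold:.3f}: SSE = {split_sse:.2f}")
```

In this toy setting the continuous spline has to smear the jump across its knots, whereas a single split placed at the jump lets each leaf fit only the smooth part of the relationship. This is the role the abstract assigns to the tree.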

Paper Structure

This paper contains 16 sections, 26 equations, 7 figures, 6 tables, 4 algorithms.

Figures (7)

  • Figure 1: Plot of Equation \ref{visual_equation}. The noise term is not shown.
  • Figure 2: Left: MARS fit with a maximum interaction degree of 2, with an RMSE of 1.11 on the true function. Right: Random Forest tuned using grid search and 5-fold cross-validation, with an RMSE of 0.71 on the true function.
  • Figure 3: The result of the forward pass on Equation \ref{visual_equation}. The model was fit with a maximum interaction degree of 2 and, at this stage, has an RMSE of 1.12 on the true function.
  • Figure 4: The result of the tree split phase on Equation \ref{visual_equation}. The model's RMSE on the true function fell from 1.12 to 0.42.
  • Figure 5: The result of the tree split phase on Equation \ref{visual_equation}. The model's RMSE on the true function fell from 0.42 to 0.32.
  • ...and 2 more figures