Joint Optimization of Piecewise Linear Ensembles

Matt Raymond; Angela Violi; Clayton Scott

TL;DR

JOPLEn takes the partitions of an existing tree ensemble as fixed and jointly fits a linear model in every leaf, which yields nonlinear predictions that can carry structured penalties. Penalties such as the $\ell_{2,1}$ group norm, nuclear norm, Dirty LASSO, and Laplacian regularization promote sparsity, subspace alignment, and smoothness, and the approach remains compatible with convex losses and accelerated proximal optimization. Empirical results across 153 datasets show that JOPLEn frequently outperforms gradient boosting, random forests, and other methods for enhancing tree ensembles, with notable gains in multitask feature selection and in interpretability, the latter due to sparser solutions. The framework's flexibility and public GPU-accelerated implementation suggest practical impact for tabular data tasks where nonlinear but interpretable models with principled regularization are desirable.
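One way to write down the kind of objective this summary describes is sketched below. The notation is ours and is an assumption, not taken from the paper: $\mathcal{R}_{p,c}$ is cell $c$ of partition $p$, $(w_{p,c}, b_{p,c})$ is the linear model in that cell, $\ell$ is a convex loss, and $R$ is one of the penalties listed above.

```latex
\min_{W,\,b}\;
\frac{1}{n}\sum_{i=1}^{n}
\ell\!\Big(y_i,\;
  \sum_{p=1}^{P}\sum_{c=1}^{C_p}
  \mathbb{1}\big[x_i \in \mathcal{R}_{p,c}\big]\,
  \big(w_{p,c}^{\top} x_i + b_{p,c}\big)
\Big)
\;+\; \lambda\, R(W)
```

Because the partitions are fixed, the prediction is linear in $(W, b)$, so a convex loss and a convex penalty give a convex problem that proximal methods can handle.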

Abstract

Tree ensembles achieve state-of-the-art performance on numerous prediction tasks. We propose $\textbf{J}$oint $\textbf{O}$ptimization of $\textbf{P}$iecewise $\textbf{L}$inear $\textbf{En}$sembles (JOPLEn), which jointly fits piecewise linear models at all leaf nodes of an existing tree ensemble. In addition to making the ensemble more expressive, JOPLEn allows several common penalties, including sparsity-promoting norms and subspace norms, to be applied to nonlinear prediction. For example, JOPLEn with a nuclear norm penalty learns subspace-aligned functions. Additionally, JOPLEn combined with a Dirty LASSO penalty is an effective feature selection method for nonlinear prediction in multitask learning. Finally, we demonstrate the performance of JOPLEn on 153 regression and classification datasets with a variety of penalties. JOPLEn improves prediction performance relative not only to standard random forest and boosted tree ensembles, but also to other methods for enhancing tree ensembles.
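The idea of jointly fitting leaf-wise linear models under a sparsity penalty can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not from the paper: random axis-aligned splits stand in for the partitions of a trained tree ensemble, the loss is squared error, the penalty is a row-wise $\ell_{2,1}$ norm that groups each feature's weights across all cells, and the solver is plain (not accelerated) proximal gradient descent. The names `cell_masks`, `predict`, and `fit_joplen_sketch` are hypothetical.

```python
import numpy as np

def cell_masks(X, partitions):
    """Return an (n, K) 0/1 matrix; column k marks the rows that fall in cell k."""
    masks = []
    for features, thresholds in partitions:
        ids = np.zeros(X.shape[0], dtype=int)
        for f, t in zip(features, thresholds):
            ids = 2 * ids + (X[:, f] > t)  # binary code of the split outcomes
        masks.append(np.eye(2 ** len(features))[ids])
    return np.hstack(masks)

def predict(X, W, partitions):
    """Sum, over partitions, the linear model of the cell containing each point."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # last column is the bias
    return ((Xb @ W) * cell_masks(X, partitions)).sum(axis=1)

def fit_joplen_sketch(X, y, partitions, lam=0.01, n_iter=500):
    """Proximal gradient on squared loss + lam * sum_j ||W[j, :]||_2 (bias row unpenalized)."""
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])
    M = cell_masks(X, partitions)
    # Explicit design matrix, used only to get a safe step size 1/L.
    A = (Xb[:, :, None] * M[:, None, :]).reshape(n, -1)
    step = n / np.linalg.norm(A, 2) ** 2
    W = np.zeros((d + 1, M.shape[1]))  # one column of weights per cell
    for _ in range(n_iter):
        resid = ((Xb @ W) * M).sum(axis=1) - y
        W = W - step * (Xb.T @ (M * resid[:, None])) / n  # gradient step
        # Proximal step: shrink each feature's row of weights toward zero.
        norms = np.linalg.norm(W[:d], axis=1, keepdims=True)
        W[:d] *= np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.abs(X[:, 0])  # nonlinear target that depends on feature 0 only
# Three hypothetical depth-2 partitions (stand-ins for three trees).
partitions = [(rng.integers(0, 5, size=2), rng.normal(size=2)) for _ in range(3)]
W = fit_joplen_sketch(X, y, partitions)
mse = np.mean((predict(X, W, partitions) - y) ** 2)
```

Because every cell's weights are updated from the same residual, the leaves are fit jointly rather than one tree at a time, and the row-wise shrinkage can remove a feature from all cells at once.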


Paper Structure (14 sections, 8 equations, 4 figures)

Introduction
Methodology
Single-task JOPLEn
Single-task $\ell_{p,1}$-norm
Single-task nuclear norm
Multitask JOPLEn
Dirty LASSO
Laplacian regularization
Experiments
Single-task regression and binary classification
Single-task nuclear norm
Multitask Dirty LASSO
Conclusions
Code availability

Figures (4)

Figure 1: Each point is one PMLB dataset. Similar models are grouped by color. "$\mathcal{L}$" and "F" indicate Laplacian plus Frobenius (F) norm regularization and F norm regularization alone, respectively. "NC" indicates CatBoost without categorical features. a) shows the normalized MSE on regression datasets (truncated at 1.5). The dotted line indicates naïve performance. Right-hand $p$-values compare each refitting method with the original ensemble. b) shows the 0/1 loss for classification datasets. For both plots, the black line indicates the median performance over all datasets.
Figure 2: a) and b) show the training and testing sets of a function that lies along a feature subspace. c) shows linear JOPLEn's prediction using a Frobenius norm penalty, and d) shows the prediction using the nuclear norm penalty. The mean squared error (MSE) is reported below each method.
Figure 3: a) demonstrates that JOPLEn DL learns common and task-specific features from the SARCOS dataset. The $x$-axis represents cells grouped by task, and the $y$-axis indicates the associated input feature. Blue indicates negative weights, red indicates positive weights, and white indicates zero weights. b) shows regression performance, with equal performance on the diagonal. JOPLEn's performance is equal to or greater than that of DL (orange) on most tasks, and similar to that of BoUTS (teal).
Figure 4: The number of common and task-specific features selected by DL, BoUTS, and JOPLEn DL for the a) SARCOS and b) NanoChem datasets (fewer is better).
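The nuclear norm penalty contrasted with the Frobenius norm in Figure 2 is usually handled in proximal methods by soft-thresholding singular values. The sketch below shows that standard operator, not necessarily the authors' implementation. The function name `prox_nuclear` and the layout of the weights as a matrix with one column per cell are assumptions made for illustration.

```python
import numpy as np

def prox_nuclear(W, tau):
    """Proximal operator of tau * ||W||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
# Hypothetical weights: 4 features x 6 cells, nearly rank one plus small noise.
direction = np.array([1.0, 0.0, 0.0, 0.0])
W_noisy = 5.0 * np.outer(direction, np.ones(6)) + 0.01 * rng.normal(size=(4, 6))
W_low_rank = prox_nuclear(W_noisy, tau=0.5)
rank_after = np.linalg.matrix_rank(W_low_rank, tol=1e-8)
```

Small singular values are set exactly to zero, so the weight vectors of all cells are pulled into a shared low-dimensional subspace. A Frobenius norm penalty only rescales the weights and leaves their rank unchanged.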
