Revisiting Chebyshev Polynomial and Anisotropic RBF Models for Tabular Regression

Luciano Gerber, Huw Lloyd

TL;DR

This work develops three smooth-basis models: an anisotropic RBF network with data-driven centre placement and gradient-based width optimisation, a ridge-regularised Chebyshev polynomial regressor, and a smooth-tree hybrid (Chebyshev model tree). It recommends routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalisation and gradually varying predictions.

Abstract

Smooth-basis models such as Chebyshev polynomial regressors and radial basis function (RBF) networks are well established in numerical analysis. Their continuously differentiable prediction surfaces suit surrogate optimisation, sensitivity analysis, and other settings where the response varies gradually with inputs. Despite these properties, smooth models seldom appear in tabular regression, where tree ensembles dominate. We ask whether they can compete, benchmarking models across 55 regression datasets organised by application domain. We develop an anisotropic RBF network with data-driven centre placement and gradient-based width optimisation, a ridge-regularised Chebyshev polynomial regressor, and a smooth-tree hybrid (Chebyshev model tree); all three are released as scikit-learn-compatible packages. We benchmark these against tree ensembles, a pre-trained transformer, and standard baselines, evaluating accuracy alongside generalisation behaviour. The transformer ranks first on accuracy across a majority of datasets, but its GPU dependence, inference latency, and dataset-size limits constrain deployment in the CPU-based settings common across applied science and industry. Among CPU-viable models, smooth models and tree ensembles are statistically tied on accuracy, but the former tend to exhibit tighter generalisation gaps. We recommend routinely including smooth-basis models in the candidate pool, particularly when downstream use benefits from tighter generalisation and gradually varying predictions.
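To make the idea of a ridge-regularised Chebyshev polynomial regressor concrete, the following is a minimal sketch, not the authors' released package. The class name `ChebyshevRidgeSketch`, its parameters, and the additive per-feature basis (no interaction terms) are assumptions made for illustration; the paper's actual basis construction and tuning may differ. Features are rescaled to [-1, 1], expanded with `numpy.polynomial.chebyshev.chebvander`, and fitted with closed-form ridge regression.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebvander


class ChebyshevRidgeSketch:
    """Illustrative only: additive Chebyshev basis per feature + closed-form ridge."""

    def __init__(self, degree=5, alpha=1e-2):
        self.degree = degree  # highest Chebyshev degree per feature (assumed)
        self.alpha = alpha    # ridge penalty strength (assumed)

    def _features(self, X):
        # Map each feature to [-1, 1], the natural domain of Chebyshev polynomials;
        # clip so unseen values outside the training range stay in the domain.
        Z = np.clip(2.0 * (X - self.lo_) / self.span_ - 1.0, -1.0, 1.0)
        # chebvander returns T_0..T_degree; drop the constant T_0 column.
        cols = [chebvander(Z[:, j], self.degree)[:, 1:] for j in range(Z.shape[1])]
        return np.hstack(cols)

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.lo_ = X.min(axis=0)
        span = X.max(axis=0) - self.lo_
        self.span_ = np.where(span > 0, span, 1.0)  # guard constant features
        Phi = self._features(X)
        # Centre features and target so the intercept is not penalised.
        self.phi_mean_ = Phi.mean(axis=0)
        self.y_mean_ = y.mean()
        Pc = Phi - self.phi_mean_
        A = Pc.T @ Pc + self.alpha * np.eye(Pc.shape[1])
        self.coef_ = np.linalg.solve(A, Pc.T @ (y - self.y_mean_))
        return self

    def predict(self, X):
        Phi = self._features(np.asarray(X, dtype=float))
        return (Phi - self.phi_mean_) @ self.coef_ + self.y_mean_


# Synthetic, smooth, additive target (not data from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(400, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
model = ChebyshevRidgeSketch(degree=8, alpha=1e-6).fit(X, y)
```

Because the fitted surface is a polynomial in the rescaled inputs, predictions vary smoothly inside the training range, which is the property the abstract highlights for surrogate optimisation and sensitivity analysis.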

Paper Structure

This paper contains 72 sections, 5 equations, 4 figures, 11 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Mean rank for predictive accuracy ($\bar{R}^2$); parentheses show rank-1 wins / rank-2 places. Bars show the interquartile range of per-dataset ranks; CD bars indicate the Nemenyi critical difference (Section \ref{sec:benchmark-design}): models whose mean ranks differ by less than CD are statistically indistinguishable. Full numerical results are in Tables \ref{tab:accuracy-cpu} and \ref{tab:accuracy-all} (appendix).
  • Figure 2: Distribution of $\bar{R}^2$ across 55 datasets per model (ordered by mean rank). Notched boxes show median and IQR; dots are individual datasets. The competitive cluster (ranks 2–6) has largely overlapping distributions, with the main separation occurring in the lower tail.
  • Figure 3: Mean rank for generalisation gap ($R^2_{\mathrm{train}} - R^2_{\mathrm{test}}$); parentheses show rank-1 wins / rank-2 places; bars show the interquartile range. Lower rank = smaller gap. The smooth models (chebypoly, erbf) and the hybrid chebytree cluster at the top; xgb ranks last among the competitive models. The top position of ridge reflects underfitting rather than better generalisation.
  • Figure 4: Pareto trade-off diagrams. Each point represents one model at its mean $\bar{R}^2$ plotted against (a) mean generalisation gap and (b) mean total time per dataset including tuning (log scale). Step lines trace the Pareto front; arrows on axes indicate the direction of improvement.
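
The quantities behind these figures can be sketched in a few lines. The code below is an illustrative sketch, not the authors' evaluation code: the function names are hypothetical, the toy score matrix is made up, and the critical value `q_alpha` must be supplied by the caller from a Studentised-range table (for example 2.728 for five models at the 0.05 level, as tabulated by Demšar, 2006). It computes per-dataset ranks averaged over datasets, the Nemenyi critical difference CD = q_alpha * sqrt(k(k+1) / (6N)) for k models and N datasets, and the non-dominated (Pareto) set when a higher score and a lower cost are both preferred.

```python
import numpy as np


def rank_row(scores):
    """Rank one dataset's scores: highest score gets rank 1, ties share the average rank."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def mean_ranks(score_matrix):
    """Average rank per model; rows are datasets, columns are models."""
    return np.vstack([rank_row(row) for row in score_matrix]).mean(axis=0)


def nemenyi_cd(n_models, n_datasets, q_alpha):
    """Nemenyi critical difference for mean ranks (q_alpha supplied by the caller)."""
    return q_alpha * np.sqrt(n_models * (n_models + 1) / (6.0 * n_datasets))


def pareto_front(score, cost):
    """Indices of points not dominated by any other (higher score, lower cost preferred)."""
    front = []
    for i in range(len(score)):
        dominated = any(
            score[j] >= score[i]
            and cost[j] <= cost[i]
            and (score[j] > score[i] or cost[j] < cost[i])
            for j in range(len(score))
        )
        if not dominated:
            front.append(i)
    return front


# Toy numbers for illustration only (3 datasets x 3 models), not results from the paper.
toy_scores = np.array([
    [0.90, 0.80, 0.70],
    [0.85, 0.90, 0.60],
    [0.70, 0.80, 0.50],
])
toy_mean_ranks = mean_ranks(toy_scores)
toy_front = pareto_front([0.90, 0.80, 0.70], [0.10, 0.05, 0.08])
```

Two models whose mean ranks differ by less than `nemenyi_cd(...)` would be treated as statistically indistinguishable, which is how the CD bars in Figure 1 are read. The generalisation gap of Figure 3 is simply the train score minus the test score per dataset, and it can be ranked with the same `mean_ranks` helper after negating it so that smaller gaps rank first.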