Scalable Higher-Order Tensor Product Spline Models

David Rügamer

TL;DR

This work introduces Additive Higher-Order Factorization Machines (AHOFMs) to scale higher-order tensor product splines (TPS) within generalized additive models (GAMs). By representing multivariate smooths through univariate spline bases with latent factors, AHOFMs achieve computational costs that scale approximately linearly with the number of features $p$, while capturing interactions up to degree $D$ via additive higher-order terms (AHOTs). A decomposable smoothness penalty linked to degrees of freedom guides homogeneous smoothing across terms, and optimization employs stochastic gradient descent with a tractable block-coordinate approach. Empirical results show that AHOFMs approximate TPS surfaces well, deliver competitive predictive performance, and scale favorably compared to full TPS-based GAMs; sparsification is identified as a key future enhancement for interpretability and efficiency.

Abstract

In the current era of vast data and transparent machine learning, it is essential for techniques to operate at a large scale while providing a clear mathematical comprehension of the internal workings of the method. Although there already exist interpretable semi-parametric regression methods for large-scale applications that take into account non-linearity in the data, the complexity of the models is still often limited. One of the main challenges is the absence of interactions in these models, which are left out for the sake of better interpretability but also due to impractical computational costs. To overcome this limitation, we propose a new approach using a factorization method to derive a highly scalable higher-order tensor product spline model. Our method allows for the incorporation of all (higher-order) interactions of non-linear feature effects while having computational costs proportional to a model without interactions. We further develop a meaningful penalization scheme and examine the induced optimization problem. We conclude by evaluating the predictive and estimation performance of our method.

Paper Structure (39 sections, 8 theorems, 23 equations, 5 figures, 3 tables, 5 algorithms)

This paper contains 39 sections, 8 theorems, 23 equations, 5 figures, 3 tables, 5 algorithms.

INTRODUCTION
RELATED LITERATURE
GAMs and TPS
Factorization Approaches
Boosting
BACKGROUND
Generalized Additive Models
Smoothness Penalties
Tensor Product Splines
SCALABLE HIGHER-ORDER TENSOR PRODUCT SPLINE MODELS
Additive Factorization Machines
Additive Higher-Order Factorization Machines
Penalization and Optimization
Penalization
Scalable Smoothing
...and 24 more sections

Key Result

Lemma 1

The approximation of eq:afmapprox using eq:bigam can be written as with $\varphi_{k,f} = \sum_{m=1}^{M_k} B_{m,k}(x_k) \gamma_{m,k,f}$.
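
The kind of representation Lemma 1 refers to can be illustrated with a minimal sketch. It assumes the lemma follows the standard factorization-machine identity, in which the sum of all pairwise products $\sum_{k<l} \sum_f \varphi_{k,f} \varphi_{l,f}$ is rewritten as $\frac{1}{2} \sum_f \{ (\sum_k \varphi_{k,f})^2 - \sum_k \varphi_{k,f}^2 \}$; this form is an assumption and is not taken from the text above. The function names, array shapes, and the random stand-in for the basis evaluations $B_{m,k}(x_k)$ are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_naive(phi):
    """Sum phi[k] . phi[l] over all pairs k < l; cost grows quadratically in p."""
    p, _ = phi.shape
    total = 0.0
    for k in range(p):
        for l in range(k + 1, p):
            total += float(phi[k] @ phi[l])
    return total


def pairwise_factorized(phi):
    """Same quantity via the factorization-machine identity; cost linear in p."""
    col_sums = phi.sum(axis=0)            # sum_k phi[k, f], shape (F,)
    col_sq_sums = (phi ** 2).sum(axis=0)  # sum_k phi[k, f]^2, shape (F,)
    return 0.5 * float(np.sum(col_sums ** 2 - col_sq_sums))


# Hypothetical sizes: p features, M basis functions per feature, F latent factors.
rng = np.random.default_rng(0)
p, M, F = 6, 5, 3

# Stand-in for the basis evaluations B_{m,k}(x_k) of a single observation.
B = rng.normal(size=(p, M))
# Latent spline coefficients gamma_{m,k,f}, stored with shape (p, M, F).
gamma = rng.normal(size=(p, M, F))

# phi[k, f] = sum_m B[k, m] * gamma[k, m, f], as defined in the lemma.
phi = np.einsum("km,kmf->kf", B, gamma)

naive = pairwise_naive(phi)
fast = pairwise_factorized(phi)
print(np.isclose(naive, fast))
```

Under this assumption, each $\varphi_{k,f}$ needs only the univariate basis of feature $k$, which is why all pairwise interaction terms can be evaluated without forming any bivariate tensor product basis.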

Figures (5)

Figure 1: Comparison of memory consumption (first row) and time consumption (second row) between the state-of-the-art big additive model (BAM) implementation (in red) and our proposal (in gray) when fitting a model for all $\binom{p}{2}$ tensor product splines using different numbers of features $p$ (x-axis) and observations (columns).
Figure 2: Estimated and true surfaces for 7 different TP splines (columns) and methods (rows) visualized by contour plots. Colors represent the partial effect values.
Figure 3: Estimation quality measured by the MSE difference between a GAM estimation (gold standard) and our proposal with different numbers of latent dimensions $F$ (x-axis) and different numbers of observations (columns). Points correspond to different simulation replications and surfaces. A blue smoother function visualizes the trend in $F$. For smaller data sets, however, smaller $F$ can be beneficial by inducing additional regularization.
Figure 4: Prediction error of the GAM (lower bound) and our proposal with different numbers of latent dimensions $F$ (colors) for different numbers of observations (x-axis).
Figure 5: Estimated partial effects of different features (columns) for different three-dimensional functions (rows). Blue lines indicate the marginal average in the respective feature direction while gray vertical lines show the spread across the other two dimensions. More variation indicates larger variation across the other two dimensions. Features not involved in a partial effect (diagonal from top left to bottom right) naturally have a constant effect.

Theorems & Definitions (11)

Lemma 1: AFM Representation
Proposition 1: Linear Scaling of AFMs
Proposition 2: Basis Evaluations in AFMs
Definition 1: Additive Higher-order Term (AHOT)
Definition 2: AHOFM of Degree $D$
Lemma 2: Representation AHOT of Degree $d$
Proposition 3: Linear Scaling of AHOFMs
Proposition 4: Basis Evaluations in AHOFMs
Definition 3: AHOFM Penalty
Proposition 5: Homogeneous AHOFM Smoothing
...and 1 more
