Scalable Higher-Order Tensor Product Spline Models
David Rügamer
TL;DR
This work introduces Additive Higher-Order Factorization Machines (AHOFMs) to scale higher-order tensor product splines (TPS) within generalized additive models (GAMs). By representing multivariate smooths through univariate spline bases combined with latent factors, AHOFMs achieve computational costs that scale approximately linearly with the number of features $p$, while capturing interactions up to degree $D$ via AHOTs. A decomposable smoothness penalty tied to the degrees of freedom encourages homogeneous smoothing across terms, and optimization uses stochastic gradient descent together with a tractable block-coordinate approach. Empirical results show that AHOFMs approximate TPS surfaces well, deliver competitive predictive performance, and scale more favorably than full TPS-based GAMs. Sparsification is identified as a key future enhancement for both interpretability and efficiency.
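The factorized construction summarized above can be illustrated with a minimal sketch in the spirit of higher-order factorization machines. It rests on assumptions that are not taken from the paper. Each feature $x_j$ gets a cubic B-spline basis with `n_basis` equally spaced functions on $[0, 1]$ and a hypothetical coefficient matrix `V[j]` of shape `(n_basis, R)`, so that latent dimension $r$ defines a univariate smooth $f_{j,r}(x_j)$. A single factor set is also shared across all interaction degrees. Summing the products of these smooths over all feature subsets of size $d \le D$ yields an elementary symmetric polynomial (the ANOVA kernel). A dynamic program evaluates it in $O(DpR)$ operations instead of enumerating $\binom{p}{d}$ subsets. The names `bspline_basis`, `anova_kernel`, `ahofm_predict`, and `brute_force_predict` are illustrative only.

```python
import itertools

import numpy as np


def bspline_basis(x, n_basis=6, degree=3, lo=0.0, hi=1.0):
    """Univariate B-spline basis via the Cox-de Boor recursion.

    Uses equally spaced knots extended beyond [lo, hi], so the basis forms a
    partition of unity on [lo, hi]. Returns an array of shape (len(x), n_basis).
    """
    x = np.asarray(x, dtype=float)
    inner = np.linspace(lo, hi, n_basis - degree + 1)
    h = inner[1] - inner[0]
    knots = np.concatenate([lo - h * np.arange(degree, 0, -1), inner,
                            hi + h * np.arange(1, degree + 1)])
    # Degree-0 basis: indicators of the half-open knot intervals [t_i, t_{i+1}).
    B = ((x[:, None] >= knots[:-1]) & (x[:, None] < knots[1:])).astype(float)
    for k in range(1, degree + 1):
        n = B.shape[1] - 1
        left = (x[:, None] - knots[:n]) / (knots[k:k + n] - knots[:n])
        right = (knots[k + 1:k + 1 + n] - x[:, None]) / (knots[k + 1:k + 1 + n] - knots[1:1 + n])
        B = left * B[:, :n] + right * B[:, 1:n + 1]
    return B


def anova_kernel(a, degree):
    """Elementary symmetric polynomials of the rows of a, up to the given degree.

    a has shape (p, R). Returns e of shape (degree + 1, R) with
    e[d, r] = sum over j1 < ... < jd of a[j1, r] * ... * a[jd, r].
    Cost is O(degree * p * R) instead of O(p**degree * R).
    """
    p, R = a.shape
    e = np.zeros((degree + 1, R))
    e[0] = 1.0
    for j in range(p):
        # Update from high to low degree so feature j enters each product at most once.
        for d in range(degree, 0, -1):
            e[d] += a[j] * e[d - 1]
    return e


def _latent_smooths(x, V):
    # a[j, r] = f_{j,r}(x_j): one univariate basis evaluation per feature (linear in p).
    return np.stack([bspline_basis([xj], n_basis=Vj.shape[0])[0] @ Vj for xj, Vj in zip(x, V)])


def ahofm_predict(x, V, degree, bias=0.0):
    """Sum of all interaction terms of order 1..degree for one observation x."""
    e = anova_kernel(_latent_smooths(x, V), degree)
    return bias + e[1:].sum()


def brute_force_predict(x, V, degree, bias=0.0):
    """Reference version that enumerates every feature subset explicitly."""
    a = _latent_smooths(x, V)
    total = bias
    for d in range(1, degree + 1):
        for subset in itertools.combinations(range(len(x)), d):
            total += np.prod(a[list(subset)], axis=0).sum()
    return total


rng = np.random.default_rng(0)
p, K, R, D = 6, 6, 2, 3
V = [rng.normal(scale=0.5, size=(K, R)) for _ in range(p)]
x = rng.uniform(size=p)
print(ahofm_predict(x, V, D), brute_force_predict(x, V, D))
```

The two printed values should agree up to floating-point rounding. The dynamic program touches each feature once per degree, whereas the brute-force loop visits $\sum_{d \le D} \binom{p}{d}$ subsets. That gap is the source of the near-linear scaling in $p$.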
Abstract
In the current era of vast data and transparent machine learning, it is essential for techniques to operate at a large scale while providing a clear mathematical understanding of the method's internal workings. Although interpretable semi-parametric regression methods that account for non-linearity in the data already exist for large-scale applications, the complexity of these models is often still limited. One of the main challenges is the absence of interactions, which are omitted both for the sake of interpretability and because of prohibitive computational costs. To overcome this limitation, we propose a new approach that uses a factorization method to derive a highly scalable higher-order tensor product spline model. Our method incorporates all (higher-order) interactions of non-linear feature effects while incurring computational costs proportional to those of a model without interactions. We further develop a meaningful penalization scheme and examine the induced optimization problem. We conclude by evaluating the predictive and estimation performance of our method.
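The claim that costs stay proportional to those of a model without interactions can be made concrete by counting coefficients. The sketch below assumes that every feature uses $K$ basis functions. Under that assumption, a full TPS-based GAM includes one tensor product with $K^d$ coefficients for each of the $\binom{p}{d}$ feature subsets of size $d \le D$. A factorized model instead stores one hypothetical $K \times R$ coefficient matrix per feature, with rank $R$ shared across degrees. Identifiability constraints, intercepts, and smoothing parameters are ignored, and the function names are illustrative.

```python
from math import comb


def full_tps_coefficients(p, K, D):
    """Coefficients of a GAM with all TPS interactions up to order D (no constraints)."""
    # d-way terms: comb(p, d) feature subsets, each a tensor product with K**d coefficients.
    return sum(comb(p, d) * K**d for d in range(1, D + 1))


def additive_coefficients(p, K):
    """Coefficients of a purely additive spline model (main effects only)."""
    return p * K


def factorized_coefficients(p, K, R):
    """Coefficients of a factorized model with one K x R matrix per feature (assumed)."""
    return p * K * R


for p in (10, 20, 40):
    print(p, full_tps_coefficients(p, K=10, D=3), additive_coefficients(p, K=10),
          factorized_coefficients(p, K=10, R=5))
```

For $p = 20$, $K = 10$, $D = 3$, and $R = 5$, this gives 1,159,200 coefficients for the full TPS-based GAM, 200 for the additive model, and 1,000 for the factorized model. Under these assumptions, the factorized count is a fixed multiple $R$ of the additive one, whereas the full TPS count grows roughly like $\binom{p}{D} K^D$.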
