Fast and interpretable Support Vector Classification based on the truncated ANOVA decomposition
Kseniya Akhalaya, Franziska Nestler, Daniel Potts
TL;DR
This work addresses high-dimensional classification by replacing kernel-based SVMs with primal, finite-dimensional SVMs whose feature maps are built from bases suited to the truncated ANOVA decomposition, such as trigonometric functions and Chui–Wang wavelets. By restricting the model to low-order variable interactions (a small superposition dimension) and employing grouped transformations, the method scales polynomially rather than exponentially in the dimension and becomes interpretable via Sobol-type global sensitivity indices. The authors formulate both $\ell_2$- and $\ell_1$-regularized objectives, solve them with gradient descent and FISTA, respectively, and validate the approach on toy, synthetic, and real-world data, showing competitive accuracy and clear interpretability. The approach is implemented in Julia within the ANOVAapprox.jl ecosystem, obtains efficient matrix-vector products through the NFCT and sparse wavelet representations, and shows which features and interactions drive the classification. Overall, the paper demonstrates that truncated ANOVA-based primal SVMs can deliver accurate, interpretable classifiers for scattered data in moderate-to-high dimensions, leaving room for further theoretical refinement and multiclass extensions. Mathematically, the framework relies on a matrix $\boldsymbol\Phi$ built from the basis functions evaluated at the data points, a truncated ANOVA decomposition $f=\sum_{\boldsymbol u} f_{\boldsymbol u}$ in which only terms with small $|\boldsymbol u|$ are kept, and coefficient vectors regularized either by the $\ell_2$ norm or by the sparsity-promoting $\ell_1$ norm, enabling efficient, interpretable learning in high dimensions.
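The primal, $\ell_1$-regularized approach can be illustrated with a small dense sketch. It is not the authors' ANOVAapprox.jl implementation. It assumes data scaled to $[0,1]^d$, a half-period cosine basis $\sqrt{2}^{|\boldsymbol u|}\prod_{j\in\boldsymbol u}\cos(\pi k_j x_j)$ with frequencies $k_j\in\{1,\dots,N\}$ on every subset $\boldsymbol u$ with $|\boldsymbol u|\le d_s$, and a squared hinge loss so that the data term is differentiable, as FISTA requires. The names `anova_cosine_features`, `fista_l1_svm`, `ds`, `N`, and `lam` are hypothetical, variables are indexed from 0, and the constant (bias) coefficient is left unpenalized.

```python
import itertools

import numpy as np


def anova_cosine_features(X, ds, N):
    """Hypothetical truncated ANOVA cosine feature map on [0, 1]^d.

    Columns: the constant 1, plus prod_{j in u} sqrt(2) cos(pi k_j x_j) for every
    subset u of variables with 1 <= |u| <= ds and every k_j in {1, ..., N}.
    Returns the dense design matrix Phi (M x n) and the subset u of each column.
    """
    M, d = X.shape
    columns, subsets = [np.ones(M)], [()]
    for order in range(1, ds + 1):
        for u in itertools.combinations(range(d), order):
            for k in itertools.product(range(1, N + 1), repeat=order):
                col = np.ones(M)
                for j, kj in zip(u, k):
                    col = col * np.sqrt(2.0) * np.cos(np.pi * kj * X[:, j])
                columns.append(col)
                subsets.append(u)
    return np.column_stack(columns), subsets


def fista_l1_svm(Phi, y, lam, iters=500):
    """FISTA for min_c mean(max(0, 1 - y * (Phi c))**2) + lam * ||c[1:]||_1.

    Assumes a squared hinge loss (differentiable) and labels y in {-1, +1};
    the constant coefficient c[0] acts as an unpenalized bias.
    """
    M, n = Phi.shape
    # Lipschitz constant of the gradient of the squared hinge data term.
    L = 2.0 / M * np.linalg.norm(Phi, 2) ** 2
    c = np.zeros(n)
    z, t = c.copy(), 1.0
    for _ in range(iters):
        slack = np.maximum(0.0, 1.0 - y * (Phi @ z))
        grad = -2.0 / M * (Phi.T @ (y * slack))
        w = z - grad / L
        # Proximal step for the l1 term: soft thresholding with threshold lam / L.
        c_new = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)
        c_new[0] = w[0]  # no shrinkage on the bias coefficient
        # Nesterov momentum update.
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = c_new + (t - 1.0) / t_new * (c_new - c)
        c, t = c_new, t_new
    return c
```

The matrix $\boldsymbol\Phi$ is formed densely here for readability; the paper instead obtains fast matrix-vector products through the NFCT and sparse wavelet representations. For fixed `ds` the number of columns grows only polynomially in $d$, but it grows exponentially in `ds`, which is why the superposition dimension is kept small.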
Abstract
Support Vector Machines (SVMs) are an important tool for classifying scattered data, where one usually has to deal with many data points in high-dimensional spaces. We propose solving SVMs in primal form using feature maps based on trigonometric functions or wavelets. In low-dimensional settings, the Fast Fourier Transform (FFT) and related methods are powerful tools for handling the considered basis functions. For growing dimensions, however, classical FFT-based methods become inefficient due to the curse of dimensionality. Therefore, we restrict ourselves to multivariate basis functions, each of which depends on only a small number of dimensions. This is motivated by the well-known sparsity of effects and by recent results on the reconstruction of functions from scattered data in terms of truncated analysis of variance (ANOVA) decompositions, which also makes the resulting model interpretable in terms of the importance of the features as well as their couplings. Using small superposition dimensions means that the computational effort no longer grows exponentially but only polynomially with respect to the dimension. In addition to the frequently applied $\ell_2$-norm regularization, we use $\ell_1$-norm regularization to enforce sparsity of the basis coefficients. The resulting classifying function, which is a linear combination of basis functions, and its variance can then be analyzed in terms of the classical ANOVA decomposition of functions. Based on numerical examples, we show that we are able to recover the sign of a function that perfectly fits our model assumptions. Furthermore, we perform classification on different artificial and real-world data sets. We obtain better results with $\ell_1$-norm regularization, both in terms of accuracy and clarity of interpretability.
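Both the polynomial scaling and the variance-based interpretation can be checked with a few lines of arithmetic. The sketch below assumes an orthonormal basis on $[0,1]^d$ in which every non-constant basis function has mean zero and belongs to exactly one variable subset $\boldsymbol u$, as for a cosine basis with $N$ non-constant frequencies $k_j\ge 1$ per variable. Under this assumption the number of basis functions is $\sum_{k=0}^{d_s}\binom{d}{k}N^k$ instead of $(N+1)^d$, and by Parseval's identity $\sigma^2(f_{\boldsymbol u})=\sum_{\operatorname{supp}\boldsymbol k=\boldsymbol u}|c_{\boldsymbol k}|^2$, so the global sensitivity index is $S_{\boldsymbol u}=\sigma^2(f_{\boldsymbol u})/\sigma^2(f)$. The function names are hypothetical.

```python
from math import comb


def num_anova_terms(d, ds, N):
    """Count basis functions in a truncated ANOVA basis (hypothetical helper).

    Every subset u with |u| <= ds contributes N**|u| functions, because each of
    its |u| variables carries one of N non-constant frequencies; the empty
    subset contributes the single constant function.
    """
    return sum(comb(d, k) * N**k for k in range(ds + 1))


def global_sensitivity_indices(coeffs, subsets):
    """Sobol-type indices S_u = sigma^2(f_u) / sigma^2(f) (hypothetical helper).

    coeffs[i] is the coefficient of an orthonormal basis function whose
    support is the variable subset subsets[i]; by Parseval's identity the
    variance of the ANOVA term f_u is the sum of squared coefficients on u.
    """
    var_u = {}
    for c, u in zip(coeffs, subsets):
        if u:  # the constant term (u = ()) carries no variance
            var_u[u] = var_u.get(u, 0.0) + c * c
    total = sum(var_u.values())
    if total == 0.0:
        return {u: 0.0 for u in var_u}
    return {u: v / total for u, v in var_u.items()}
```

For example, with $d=10$, $N=3$, and $d_s=2$ the truncated basis has 436 functions, compared with $4^{10}=1\,048\,576$ for the full tensor-product basis. Summing over all $d_s=d$ recovers $(N+1)^d$ by the binomial theorem.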
