Fast and interpretable Support Vector Classification based on the truncated ANOVA decomposition

Kseniya Akhalaya, Franziska Nestler, Daniel Potts

TL;DR

This work addresses high-dimensional classification by replacing kernel-based SVMs with primal, finite-dimensional SVMs whose feature maps are built from bases suited to the truncated ANOVA decomposition, such as trigonometric functions and Chui–Wang wavelets. By restricting to low-order variable interactions (small superposition dimension) and employing grouped transformations, the method scales polynomially rather than exponentially in the dimension while enabling interpretability via Sobol-type global sensitivity indices. The authors formulate both $\ell_2$ and $\ell_1$ regularized objectives, solve them with gradient descent and FISTA respectively, and validate the approach on toy, synthetic, and real-world data, showing competitive accuracy and clear interpretability. The approach is implemented in Julia within the ANOVAapprox.jl ecosystem, computes matrix-vector products efficiently through the NFCT and sparse wavelet representations, and provides practical insight into which features and interactions drive the classification. Overall, the paper demonstrates that truncated ANOVA-based primal SVMs can deliver accurate, interpretable classifiers for scattered data in moderate-to-high dimensions, with potential for further theoretical refinement and multiclass extensions. Mathematically, the framework relies on a feature matrix $\boldsymbol\Phi$ built from basis functions, a truncated ANOVA decomposition $f=\sum_{\boldsymbol u} f_{\boldsymbol u}$, and coefficient vectors obtained under $\ell_1$ or $\ell_2$ regularization.
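
The feature map and the Sobol-type indices described above can be illustrated with a minimal Python sketch. It is not the ANOVAapprox.jl implementation. It assumes the orthonormal half-period cosine basis on $[0,1]$, namely $\phi_0(x)=1$ and $\phi_k(x)=\sqrt{2}\cos(\pi k x)$, and an index set containing all multi-frequencies with at most $d_s$ non-zero entries. The function names `truncated_index_set`, `cosine_features`, and `sensitivity_indices` are hypothetical helpers introduced only for this illustration.

```python
import itertools
import numpy as np

def truncated_index_set(d, n_freq, d_s):
    """All k in {0,...,n_freq-1}^d with at most d_s non-zero entries (assumed definition)."""
    indices = [(0,) * d]  # constant term
    for order in range(1, d_s + 1):
        for u in itertools.combinations(range(d), order):
            for freqs in itertools.product(range(1, n_freq), repeat=order):
                k = [0] * d
                for dim, freq in zip(u, freqs):
                    k[dim] = freq
                indices.append(tuple(k))
    return indices

def cosine_features(X, indices):
    """Dense feature matrix Phi[j, i] = prod_t phi_{k_t}(x_{j,t}) for points X in [0,1]^d."""
    X = np.asarray(X, dtype=float)
    Phi = np.ones((X.shape[0], len(indices)))
    for col, k in enumerate(indices):
        for dim, freq in enumerate(k):
            if freq > 0:
                Phi[:, col] *= np.sqrt(2.0) * np.cos(np.pi * freq * X[:, dim])
    return Phi

def sensitivity_indices(coeffs, indices):
    """Variance share of each subset u, using orthonormality of the basis."""
    var = {}
    for c, k in zip(coeffs, indices):
        u = tuple(dim for dim, freq in enumerate(k) if freq > 0)
        if u:  # the constant term carries no variance
            var[u] = var.get(u, 0.0) + float(c) ** 2
    total = sum(var.values())
    return {u: v / total for u, v in var.items()} if total > 0 else var
```

The dense matrix is used here only for readability; according to the summary above, the actual method avoids forming it and relies on fast NFCT-based matrix-vector products instead.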

Abstract

Support Vector Machines (SVMs) are an important tool for performing classification on scattered data, where one usually has to deal with many data points in high-dimensional spaces. We propose solving SVMs in primal form using feature maps based on trigonometric functions or wavelets. In small dimensional settings the Fast Fourier Transform (FFT) and related methods are a powerful tool in order to deal with the considered basis functions. For growing dimensions the classical FFT-based methods become inefficient due to the curse of dimensionality. Therefore, we restrict ourselves to multivariate basis functions, each of which only depends on a small number of dimensions. This is motivated by the well-known sparsity of effects and recent results regarding the reconstruction of functions from scattered data in terms of truncated analysis of variance (ANOVA) decompositions, which makes the resulting model even interpretable in terms of importance of the features as well as their couplings. The usage of small superposition dimensions has the consequence that the computational effort no longer grows exponentially but only polynomially with respect to the dimension. In order to enforce sparsity regarding the basis coefficients, we use the frequently applied $\ell_2$-norm and, in addition, $\ell_1$-norm regularization. The found classifying function, which is the linear combination of basis functions, and its variance can then be analyzed in terms of the classical ANOVA decomposition of functions. Based on numerical examples we show that we are able to recover the signum of a function that perfectly fits our model assumptions. Furthermore, we perform classification on different artificial and real-world data sets. We obtain better results with $\ell_1$-norm regularization, both in terms of accuracy and clarity of interpretability.
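
To make the $\ell_1$-regularized primal problem concrete, the following is a minimal Python sketch of FISTA applied to a linear model in a given feature matrix. The loss function is an assumption: the sketch uses the squared hinge loss because it is differentiable with a Lipschitz-continuous gradient, and the paper's exact objective may differ. The names `soft_threshold` and `fista_l1_svm` are hypothetical and not part of any library.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista_l1_svm(Phi, y, lam, n_iter=500):
    """Minimize (1/M) * sum_j max(0, 1 - y_j (Phi w)_j)^2 + lam * ||w||_1 (assumed objective)."""
    M, n = Phi.shape
    # Lipschitz constant of the gradient of the smooth part
    L = 2.0 / M * np.linalg.norm(Phi, 2) ** 2
    w = np.zeros(n)
    z = w.copy()
    t = 1.0
    for _ in range(n_iter):
        margin = np.maximum(0.0, 1.0 - y * (Phi @ z))
        grad = -2.0 / M * (Phi.T @ (y * margin))
        w_next = soft_threshold(z - grad / L, lam / L)   # proximal gradient step
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = w_next + (t - 1.0) / t_next * (w_next - w)   # momentum extrapolation
        w, t = w_next, t_next
    return w
```

A label is then predicted as `np.sign(Phi_test @ w)`. Coefficients set exactly to zero by the soft-thresholding step drop out of the model, which is what makes the $\ell_1$ variant easier to interpret.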

Paper Structure

This paper contains 22 sections, 84 equations, 18 figures, 4 tables, 3 algorithms.

Figures (18)

  • Figure 2.1: Illustration of $2$-dimensional indices $\bm k=(k_1,k_2)$, represented by the gray squares, where $\bm j=(j_1,j_2)\in \mathcal{J}_{(3,3)}$ and $\bm k\in\mathcal{K}_{\bm j}$ for each $\bm j$. This graphic is taken from lippert.
  • Figure 2.2: Cardinality of the index set $\mathcal{I}$ depending on the dimension $d$ for different superposition dimensions $d_s\in\{1,2,3,d\}$. We consider the cosine basis approach, for which we restrict the space of possible multi-frequencies $\bm k\in\mathbb N_0^d$ to $\{0,1,2,3\}^d$.
  • Figure 4.1: The sign of the univariate test functions and the training data points $(x_j,y_j)$, $j=1,2,\dots,M$ are visualized in black. The obtained classifying functions are depicted in orange ($\ell_2$-norm regularization) and in blue ($\ell_1$-norm regularization). The regularization parameter is set to $\lambda=0.01$. The classifying function $S(\mathcal{X},\mathcal{I}_{\bm N})f(\bm x)$ with $\hat{\bm f}=\hat{\bm f}^{\cos}$ is visualized in (a) and $S(\mathcal{X},\mathcal{J}_{\bm N})f(\bm x)$ with $\hat{\bm f}=\hat{\bm f}^{{\text{chui}}}$ in (b).
  • Figure 4.2: Average of mean CA over $100$ runs using $\ell_1$-norm regularization (blue) and $\ell_2$-norm regularization (orange) for the six-dimensional test functions. In (a) we visualize the results achieved by the classifying functions $S(\mathcal{X},\mathcal{I}_{\bm N}(U_2))f^{\cos}(\bm x)$ with $\hat{\bm f}=\hat{\bm f}^{\cos}$, where the number of generated training and test data points has been set to $1000$. In (b), we see the results achieved by the classifying functions $S(\mathcal{X},\mathcal{J}_{\bm N}(U_2))f(\bm x)$ with $\hat{\bm f}=\hat{\bm f}^{\text{chui}}$ and $5000$ generated training and test data points.
  • Figure 4.3: The global sensitivity indices $\varrho({\bm u}, S(\mathcal{X},\mathcal{I}_{\bm N}(U_2))f^{\cos})$ with $\hat{\bm f}=\hat{\bm f}^{\cos}$ are visualized in (a) and $\varrho({\bm u}, S(\mathcal{X},\mathcal{J}_{\bm N}(U_2))f^{\text{chui}})$ with $\hat{\bm f}=\hat{\bm f}^{\text{chui}}$ in (b), for the six-dimensional test function. The results using $\ell_1$-norm regularization are shown in blue and for $\ell_2$-norm regularization in orange for ${\bm u}\in U_2$, where $|U_2|=21$.
  • ...and 13 more figures
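
The growth behaviour described in the caption of Figure 2.2 can be reproduced with a short Python sketch. It assumes that the index set $\mathcal{I}$ contains every multi-frequency in $\{0,1,\dots,N-1\}^d$ with at most $d_s$ non-zero entries, which gives $|\mathcal{I}|=\sum_{s=0}^{d_s}\binom{d}{s}(N-1)^s$; the paper's exact definition may differ. The function name `index_set_size` is hypothetical.

```python
from math import comb

def index_set_size(d, n_freq, d_s):
    """Number of k in {0,...,n_freq-1}^d with at most d_s non-zero entries (assumed definition)."""
    return sum(comb(d, s) * (n_freq - 1) ** s for s in range(d_s + 1))

# Frequencies restricted to {0,1,2,3}, as in the caption of Figure 2.2
for d in (2, 4, 6, 8, 10):
    sizes = [index_set_size(d, 4, min(d_s, d)) for d_s in (1, 2, 3, d)]
    print(d, sizes)
```

For a fixed $d_s$ the count is a polynomial of degree $d_s$ in $d$, whereas the full grid with $d_s=d$ has $N^d$ elements.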