Sparse additive function decompositions facing basis transforms

Fatima Antarou Ba, Oleh Melnyk, Christian Wald, Gabriele Steidl

TL;DR

This work addresses the challenge of sparse additive decompositions in high-dimensional functions by learning an orthogonal basis that reveals sparsity in the ANOVA/anchored decompositions. It introduces a three-step transform strategy that couples vertex reduction via gradient SVD, edge reduction via joint block diagonalization of Hessians, and blockwise sparsification through a relaxed $0$-norm loss optimized over $SO(d)$ with gradient- or Landing-based methods. The approach is supported by theoretical links between derivatives, function graphs, and decompositions, together with convergence analysis for the manifold optimization and extensive numerical experiments demonstrating recovery of two-variable summands after transformation. The results have potential impact on efficient high-dimensional integration and sparse graphical modeling by enabling sparse representations where none were apparent in the original basis.

Abstract

High-dimensional real-world systems can often be well characterized by a small number of simultaneous low-complexity interactions. The analysis of variance (ANOVA) decomposition and the anchored decomposition are typical techniques to find sparse additive decompositions of functions. In this paper, we are interested in a setting where these decompositions are not directly sparse, but become so after an appropriate basis transform. Noting that the sparsity of those additive function decompositions is equivalent to the fact that most of the function's mixed partial derivatives vanish, we can exploit a connection to the underlying function graphs to determine an orthogonal transform that realizes the appropriate basis change. This is done in three steps: we apply singular value decomposition to minimize the number of vertices of the function graph, and then joint block diagonalization techniques for families of matrices, followed by sparse minimization based on relaxations of the zero "norm", to minimize the number of edges. For the latter, we propose and analyze minimization techniques over the manifold of special orthogonal matrices. Various numerical examples illustrate the reliability of our approach for functions having, after a basis transform, a sparse additive decomposition into summands with at most two variables.
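
The first of the three steps (vertex reduction via a singular value decomposition of gradients) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the test function is borrowed from Figure 1, while the sampling domain, the sample count, the rank tolerance, and the convention $g(y) = f(Vy)$ for the transformed function are assumptions made only for this illustration.

```python
import numpy as np

# Rotation by pi/4; u1 is its first column (test function taken from Figure 1).
theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
U = np.array([[c, -s], [s, c]])
u1 = U[:, 0]
a = np.sqrt(2) / 2

def grad_f(x):
    """Gradient of f(x) = sin(a * u1^T x); always a multiple of u1."""
    return a * np.cos(a * (u1 @ x)) * u1

# Assumption: gradients are sampled at 200 uniform points in [-1, 1]^2.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
G = np.stack([grad_f(x) for x in X], axis=1)   # shape (2, 200), one gradient per column

# The left singular vectors give an orthogonal basis; the numerical rank counts
# the directions along which f actually varies.
V, svals, _ = np.linalg.svd(G, full_matrices=False)
rank = int(np.sum(svals > 1e-10 * svals[0]))   # hypothetical tolerance

# For g(y) = f(V y), the chain rule gives grad g(y) = V^T grad f(V y).
G_rot = V.T @ G
print("numerical rank:", rank)
print("max |d g / d y_2|:", np.max(np.abs(G_rot[1])))
```

In this example every gradient is a multiple of $u_1$, so the gradient matrix has rank one and the transformed function depends on a single variable; in the graph picture used in the abstract, one vertex disappears.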

Paper Structure (20 sections, 25 theorems, 120 equations, 10 figures, 8 tables, 2 algorithms)

This paper contains 20 sections, 25 theorems, 120 equations, 10 figures, 8 tables, 2 algorithms.

Key Result

Theorem 2.1

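Let $P_j$, $j \in [d]$, fulfill Assumption I. Then any $f \in \mathcal{F}$ admits a decomposition of the form $f = \sum_{\mathbf{u} \subseteq [d]} f_\mathbf{u}$, whose summands $f_\mathbf{u}$ are determined by the operators $P_j$. This decomposition is minimal in the following sense: if for an arbitrary decomposition $f = \sum_{\mathbf{u} \subseteq [d]} \tilde{f}_\mathbf{u}$ we have $\tilde{f}_\mathbf{v}=0$ for all $\mathbf{v} \supseteq \mathbf{u}$ and some $\mathbf{u} \subseteq [d]$, then also $f_\mathbf{v} = 0$ for all $\mathbf{v} \supseteq \mathbf{u}$.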

Figures (10)

  • Figure 1: Left: $f(x) \coloneqq \sin(u_1^{\mathrm{T}}x\sqrt{2}/2)$ where $U=(u_1,u_2)$ is a $2$-dimensional rotation matrix of rotation angle $\pi/4.$ Right: Plot of $f_{U}.$ The left-hand diagram shows that $f$ depends on both variables $x_1$ and $x_2$ while $f_U$ is constant in $x_1.$
  • Figure 2: Left: $f(x)= \sin(5 u_1^{\mathrm{T}}x) + \sin(5 u_2^{\mathrm{T}}x)$ where $U=(u_1, u_2)$ is a $2$-dimensional rotation matrix of angle $\pi/4.$ Right: Image of the partial derivative $\partial_{x_1}f(x).$ The partial derivative $\partial_{x_1}f(x)$ depends on $x_1$ and $x_2$ since its values vary in both variables and thus $f$ cannot be decomposed as a sum of two univariate functions.
  • Figure 3: Left: $f_U$ where $f, U$ are as in Figure 2. Right: Image of the partial derivative $\partial_{x_1}f_{U}$ which is constant with respect to $x_2$. Therefore $f_U$ can be decomposed as a sum of two univariate functions (see Theorem \ref{decomposition:partial}).
  • Figure 4: Mean optimality gap $\ell_{\mathcal{H}}\bigl(U^{(r)}\bigr) - \ell_{\mathcal{H}}\bigl(R^{\mathrm{T}}\bigr)$ over 100 experiments. Top: Noise-free matrices. Bottom: Noisy matrices. RI denotes random initialization and $h$ uses the grid search with the corresponding grid density value. Subscripts Rgd and La stand for Riemannian gradient descent and Landing algorithm, respectively.
  • Figure 5: Failure ratio $\mathcal{R}$ (see \ref{eq:ratio}) of suboptimal joint sparsity reconstructions for a given thresholding parameter $\eta$. First row: Clean data. Second row: Noisy data with additive random Gaussian noise $\mathcal{N}(0,\sigma), \sigma=10^{-3}$.
  • ...and 5 more figures
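
The observation behind Figures 2 and 3 (mixed partial derivatives that are nonzero in the original coordinates vanish after a suitable rotation) can be checked numerically. The sketch below is a simplified stand-in, not the joint block diagonalization method of the paper: it diagonalizes one random linear combination of sampled Hessians, which works here only because all Hessians of this particular $f$ share the same eigenvectors. The sampling domain, the sample count, and the convention $g(y) = f(Vy)$ are assumptions for illustration.

```python
import numpy as np

# f(x) = sin(5 u1^T x) + sin(5 u2^T x) with U = (u1, u2) a rotation by pi/4 (Figure 2).
theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
U = np.array([[c, -s], [s, c]])
u1, u2 = U[:, 0], U[:, 1]

def hessian_f(x):
    """Analytic Hessian of f; it equals U diag(.) U^T at every point x."""
    return -25.0 * (np.sin(5 * (u1 @ x)) * np.outer(u1, u1)
                    + np.sin(5 * (u2 @ x)) * np.outer(u2, u2))

# Assumption: Hessians are sampled at 50 uniform points in [-1, 1]^2.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
H = np.stack([hessian_f(x) for x in X])          # shape (50, 2, 2)

# Stand-in for joint diagonalization: eigenvectors of a random combination.
weights = rng.standard_normal(len(H))
M = np.tensordot(weights, H, axes=1)             # shape (2, 2), symmetric
_, V = np.linalg.eigh(M)

# For g(y) = f(V y), the Hessian is V^T H_f(V y) V (matmul broadcasts over samples).
H_rot = V.T @ H @ V

offdiag_before = np.max(np.abs(H[:, 0, 1]))
offdiag_after = np.max(np.abs(H_rot[:, 0, 1]))
print("max |mixed partial| before rotation:", offdiag_before)
print("max |mixed partial| after rotation: ", offdiag_after)
```

A vanishing off-diagonal Hessian entry means the corresponding mixed partial derivative is zero, which is the condition under which the rotated function splits into two univariate summands.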

Theorems & Definitions (54)

  • Theorem 2.1
  • Example 2.2
  • Proposition 2.3
  • proof
  • Definition 2.4: Graph of a function
  • Theorem 2.5
  • proof
  • Theorem 2.6
  • Example 3.1
  • Lemma 3.2
  • ...and 44 more