Spectral Higher-Order Neural Networks

Gianluca Peri, Timoteo Carletti, Duccio Fanelli, Diego Febbe

Abstract

Neural networks are fundamental tools of modern machine learning. The standard paradigm assumes binary interactions (across feedforward linear passes) between inter-tangled units, organized in sequential layers. Generalized architectures have also been designed that move beyond pairwise interactions, so as to account for higher-order couplings among computing neurons. Higher-order networks are, however, usually deployed as augmented graph neural networks (GNNs) and, as such, prove advantageous only in contexts where the input exhibits an explicit hypergraph structure. Here, we present Spectral Higher-Order Neural Networks (SHONNs), a new algorithmic strategy to incorporate higher-order interactions in general-purpose, feedforward network structures. SHONNs leverage a reformulation of the model in terms of spectral attributes. This makes it possible to mitigate the stability and parameter-scaling problems that commonly accompany weighted, higher-order forward propagations.
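
To make the parameter-scaling point concrete, the sketch below contrasts a direct-space triadic layer, which stores one weight for every (output, input-pair) triple, with a hand-rolled low-rank variant in which all outputs share a small set of quadratic modes. This is a minimal illustration under our own assumptions (the function names, the rank R, and the factorization are ours); it is not the SHONN spectral parametrization itself.

```python
import numpy as np

# Illustrative only: a generic direct-space triadic layer and a low-rank
# factorized variant. Neither is the paper's exact SHONN construction.

def direct_triadic_layer(x, W, T, b):
    """y_k = tanh( sum_i W[k,i] x_i + sum_{i,j} T[k,i,j] x_i x_j + b_k ).

    The triadic tensor T holds one weight per (output, input pair):
    O(m * n^2) parameters for n inputs and m outputs.
    """
    linear = W @ x                                # standard pairwise pass
    quadratic = np.einsum('kij,i,j->k', T, x, x)  # higher-order couplings
    return np.tanh(linear + quadratic + b)

def factorized_triadic_layer(x, W, U, V, b):
    """Same layer with T[k,i,j] replaced by sum_r U[k,r] V[r,i] V[r,j].

    All outputs share the R quadratic modes encoded in V, so the
    higher-order part needs only O((m + n) * R) parameters.
    """
    z = V @ x                  # project the input onto R shared modes
    quadratic = U @ (z * z)    # every output mixes the same squared modes
    return np.tanh(W @ x + quadratic + b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, R = 8, 4, 3
    x = rng.normal(size=n)
    W, b = rng.normal(size=(m, n)), np.zeros(m)
    T = rng.normal(size=(m, n, n))                           # 4*8*8 = 256 higher-order weights
    U, V = rng.normal(size=(m, R)), rng.normal(size=(R, n))  # (4+8)*3 = 36 higher-order weights
    print(direct_triadic_layer(x, W, T, b))
    print(factorized_triadic_layer(x, W, U, V, b))
```

The factorized form is only meant to show how sharing modes across outputs tames the quadratic growth of the higher-order weights; the actual spectral construction used by SHONNs imposes its own parametrization and constraints (Eqs. \ref{eq:spectral_parametrization}, \ref{eq:spectral_constraints}).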

Paper Structure

This paper contains 14 sections, 6 theorems, 39 equations, and 9 figures.

Key Result

Theorem 4.1

Given $\bm x \in X \subset \mathbb{R}^n$, the space $\mathcal{H}$ of functions $h(1, \bm{x})$, defined by iteratively composing honn layers of the form of Eq. \ref{eq:honn} with the parametrization defined by Eqs. \ref{eq:spectral_parametrization} and \ref{eq:spectral_constraints}, is dense in $C(X, \mathbb{R}^n)$; that is, any continuous function from $X$ to $\mathbb{R}^n$ can be approximated arbitrarily well by such compositions.

Figures (9)

  • Figure 1: Cartoon representing different neural architectures. The standard neural network (panel \ref{fig:left}), with input neurons $x_i$ generating output signals $y_k$ via weighted averages, i.e., linear combinations (colored arrows) followed by the application of a local nonlinearity (not shown). The standard triadic higher-order network (panel \ref{fig:middle}) is obtained by adding to the previous architecture weighted sums of hyperlinks, mimicking the two-body interaction $x_{i_1}x_{i_2}$ (symbolized by the curved arrows connecting a couple $(x_{i_1},x_{i_2})$ to an output $y_k$, each pair with its own specific color). The spectral higher-order network (panel \ref{fig:right}) reduces the number of parameters used by exploiting the spectral decomposition. This yields an effective parameter sharing among hyperlinks (the curved arrows connecting a couple $(x_{i_1},x_{i_2})$ to several outputs $y_k$ share the same color).
  • Figure 2: Results of the perceptrons' training on mnist and fashion-mnist (via the Adam optimizer). We performed learning-rate warm-up, followed by a reduce-learning-rate-on-plateau protocol, to guard against a possible dependence of the results on an unlucky hyperparameter choice. The direct-space triadic model is plagued by instability and confidence-saturation problems, while the standard perceptron lacks expressivity. By contrast, the spectral triadic model (i) achieves the same performance as the standard triadic network, with a substantially more efficient parameter scaling, and (ii) is also markedly more stable.
  • Figure 3: Standard mlp vs. spectral triadic mlp on CIFAR-10. The models were trained with the Adam optimizer, following a halving-lr-on-plateau scheduler. The results show a clear advantage for the triadic architecture.
  • Figure 4: Standard mlp-mixer vs. a spectral triadic version of it, on CIFAR-10. For this experiment the Adam optimizer was used with a fixed learning rate. The results suggest that the spectral forward propagation not only shows a sufficient degree of expressivity, but also acts as an implicit regularizer, preventing overfitting.
  • Figure 5: Graphical sketch of Lemma \ref{th:1D}, illustrating the inter-layer connections $(1, x) \to \bm y$ described in Eq. \ref{eq:associate_matrix_1d_x^2}, under the triangular ansatz and in the simple setting where the only nonlinear interactions arise from the last column of matrix $W$. Subsequently, the elements of $\bm y$ can be linearly remapped to the canonical basis of $\mathcal{P}_2(x)$ (a worked 1D instance is given after this list). The black dashed arrows represent the linear passage corresponding to the first sum term of Eq. \ref{eq:honn}, while the orange solid lines represent the higher-order interactions, namely the second sum in Eq. \ref{eq:honn}, here restricted to the interaction of $x$ with itself.
  • ...and 4 more figures
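
To illustrate the mechanism sketched in the Figure 5 caption, consider a minimal worked 1D instance (our own illustration; the coefficients $w_{kj}$ and $t_k$ are assumed names, not the paper's notation). Acting on the augmented input $(1, x)$ with a single triadic layer of the form of Eq. \ref{eq:honn}, and keeping only the self-interaction of $x$, gives, before the nonlinearity,

\[
y_k \;=\; \underbrace{w_{k1}\cdot 1 + w_{k2}\, x}_{\text{linear pass}} \;+\; \underbrace{t_k\, x \cdot x}_{\text{higher-order term}} \;=\; w_{k1} + w_{k2}\, x + t_k\, x^2, \qquad k = 1, 2, 3,
\]

so that, for generic coefficients, the outputs $y_1, y_2, y_3$ can indeed be linearly remapped onto the canonical basis $\{1, x, x^2\}$ of $\mathcal{P}_2(x)$.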

Theorems & Definitions (11)

  • Theorem 4.1
  • Lemma 4.1
  • Proof
  • Lemma 4.2
  • Proof
  • Lemma 4.3
  • Proof
  • Lemma A.1
  • Proof
  • Theorem A.1
  • ...and 1 more