Multilinear Operator Networks

Yixin Cheng, Grigorios G. Chrysos, Markos Georgopoulos, Volkan Cevher

TL;DR

MONet is a fully multilinear neural network built from Mu-Layers, which capture multiplicative, high-order interactions among the elements of each input token. This lets an activation-free polynomial expansion approach the accuracy of modern architectures. By stacking Poly-Blocks and employing pyramid patch embedding, MONet achieves strong performance on ImageNet1K and other benchmarks while keeping compute costs favorable compared to prior polynomial networks. The model also demonstrates interpretability through a Poly Neural ODE solver that recovers symbolic Lotka-Volterra dynamics, and it exhibits robustness on ImageNet-C. Overall, MONet suggests a viable activation-free alternative with competitive accuracy and potential applicability beyond vision tasks, though a complete theoretical characterization remains open for future work.

Abstract

Despite the remarkable capabilities of deep neural networks in image recognition, the dependence on activation functions remains a largely unexplored area and has yet to be eliminated. On the other hand, Polynomial Networks are a class of models that do not require activation functions, but they have yet to perform on par with modern architectures. In this work, we aim to close this gap and propose MONet, which relies solely on multilinear operators. The core layer of MONet, called the Mu-Layer, captures multiplicative interactions of the elements of the input token. MONet captures high-degree interactions of the input elements, and we demonstrate the efficacy of our approach on a series of image recognition and scientific computing benchmarks. The proposed model outperforms prior polynomial networks and performs on par with modern architectures. We believe that MONet can inspire further research on models that use entirely multilinear operations.

Paper Structure

This paper contains 30 sections, 2 theorems, 18 equations, 15 figures, and 18 tables.

Key Result

Proposition 1

The Mu-Layer captures multiplicative interactions between elements of each token.
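
Proposition 1 can be illustrated with a small NumPy sketch. The form used below, y = C((Ax) * (Bx) + Ax), is an assumption based only on the Figure 4 description (linear projections, a Hadamard product, and an element-wise addition). The names `mu_layer`, `A`, `B`, `C` and the sizes are hypothetical, and layer normalization and spatial aggregation are omitted. The sketch checks that the layer output equals an explicit sum over pairwise products x_j x_k, with no activation function involved.

```python
import numpy as np

def mu_layer(x, A, B, C):
    """Hypothetical simplified Mu-Layer acting on a single token x of shape (d,).

    Assumed form: y = C @ ((A @ x) * (B @ x) + A @ x).
    Layer normalization and spatial aggregation are left out.
    """
    a = A @ x                # first linear channel projection
    b = B @ x                # second linear channel projection
    return C @ (a * b + a)   # Hadamard product, addition, output projection

rng = np.random.default_rng(0)
d, m, o = 4, 3, 2            # illustrative input, hidden, and output sizes
x = rng.normal(size=d)
A = rng.normal(size=(m, d))
B = rng.normal(size=(m, d))
C = rng.normal(size=(o, m))

y = mu_layer(x, A, B, C)

# Explicit expansion of hidden unit i:
#   sum_{j,k} A[i,j] * B[i,k] * x[j] * x[k]  +  sum_j A[i,j] * x[j]
pairwise = np.einsum("ij,ik,j,k->i", A, B, x, x)
y_expanded = C @ (pairwise + A @ x)

print(np.allclose(y, y_expanded))
```

Under these assumptions, each output is a degree-2 polynomial in the token's elements, so stacking such layers raises the polynomial degree without any activation function.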

Figures (15)

  • Figure 1: The architecture of the proposed Mu-Layer (left) and MONet (right). In the left panel, the grey box represents layer normalization. The colored solid-line boxes represent channel projections to different dimensions; all projection operations are linear. The $\ast$ box denotes an elementwise (Hadamard) product. The red dashed box represents the spatial aggregation module.
  • Figure 2: Left: training loss over epochs. Right: ground-truth and model-predicted trajectories. Our model reaches a low loss within 20 epochs and successfully predicts the real trajectory.
  • Figure 3: Schematic of the (simple) MONet and the multi-stage MONet. PPE denotes our pyramid patch embedding.
  • Figure 4: Schematic of the Mu-Layer. Blue boxes correspond to learnable parameters. Green and red boxes denote the input and output, respectively. $\ast$ denotes the Hadamard product and $+$ denotes element-wise addition. The gray box denotes the spatial aggregation module; the dotted line indicates that it is optional. In our design, the first Mu-Layer of each Poly-Block includes a spatial aggregation unit, while the second Mu-Layer does not.
  • Figure 5: Pyramid Patch Embedding
  • ...and 10 more figures

Theorems & Definitions (3)

  • Proposition 1
  • Proposition 2
  • proof