Deep Tree Tensor Networks for Image Recognition

Chang Nie, Junfang Chen, Yajie Chen

TL;DR

The paper tackles the challenge of applying tensor networks to large-scale image recognition by introducing Deep Tree Tensor Networks (DTTN), which capture $2^L$-order feature interactions through multilinear operations in a tree-like TN architecture. It introduces Antisymmetric Interaction Modules (AIMs) stacked to form DTTN, enabling parameter sharing, higher effective bond dimensions, and robust self-interaction without activations. The authors establish theoretical connections to polynomial and multilinear networks and demonstrate competitive performance on ImageNet-1k and other benchmarks, including successful segmentation and plug-in use in recommender systems. Overall, the work suggests that tensor-network-inspired architectures can achieve strong accuracy with interpretability and efficiency on large-scale vision tasks, potentially guiding future multilinear and TN-based models.

Abstract

Originating in quantum physics, tensor networks (TNs) have been widely adopted as exponential machines and parameter decomposers for recognition tasks. Typical TN models, such as Matrix Product States (MPS), have not yet been applied successfully to natural image processing. When employed, they primarily serve to compress parameters within off-the-shelf networks, thus losing their distinctive capability to enhance exponential-order feature interactions. This paper introduces a novel architecture named Deep Tree Tensor Network (DTTN), which captures $2^L$-order multiplicative interactions across features through multilinear operations, while essentially unfolding into a tree-like TN topology with the parameter-sharing property. DTTN is built by stacking multiple antisymmetric interacting modules (AIMs), a design that facilitates efficient implementation. Moreover, we theoretically reveal the equivalence among quantum-inspired TN models, polynomial networks, and multilinear networks under certain conditions, and we believe that DTTN can inspire more interpretable studies in this field. We evaluate the proposed model against a series of benchmarks and achieve excellent performance compared to its peers and cutting-edge architectures. Our code will soon be publicly available.

Paper Structure

This paper contains 18 sections, 4 theorems, 16 equations, 4 figures, 10 tables, 1 algorithm.

Key Result

Proposition 1

The DTTN has the capability to capture $2^L$ multiplicative interactions among input elements, which can be represented in the format of Equation eq1 as $\Phi(\boldsymbol{x})=\otimes^{2^L}\phi(\boldsymbol{x},\boldsymbol{\Lambda}_\phi)$. Consequently, the elements of $f(\boldsymbol{x})$ are homogeneo
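The degree-$2^L$ claim can be checked numerically on a toy model. The sketch below is not the paper's AIM: it assumes a hypothetical simplified block that multiplies two linear projections of its input elementwise, with no bias, normalization, or activation. All names (`interaction_block`, `build_network`, `forward`) are illustrative. Because each such block doubles the polynomial degree, stacking $L$ of them should give outputs that scale by $c^{2^L}$ when the input is scaled by $c$.

```python
import math
import random


def matvec(W, x):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def interaction_block(W_a, W_b, x):
    """Hypothetical block: Hadamard product of two linear projections."""
    return [a * b for a, b in zip(matvec(W_a, x), matvec(W_b, x))]


def random_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


def build_network(num_blocks, dim, seed=0):
    """Create num_blocks pairs of random square projection matrices."""
    rng = random.Random(seed)
    return [(random_matrix(dim, rng), random_matrix(dim, rng))
            for _ in range(num_blocks)]


def forward(blocks, x):
    """Apply the blocks in sequence; each one doubles the polynomial degree."""
    for W_a, W_b in blocks:
        x = interaction_block(W_a, W_b, x)
    return x


L, dim, c = 3, 4, 2.0
blocks = build_network(L, dim)
x = [0.3, -0.7, 0.5, 0.9]

y = forward(blocks, x)
y_scaled = forward(blocks, [c * v for v in x])

# Homogeneity of degree 2**L: f(c * x) == c**(2**L) * f(x) for every output.
factor = c ** (2 ** L)
is_homogeneous = all(
    math.isclose(ys, factor * yv, rel_tol=1e-9)
    for ys, yv in zip(y_scaled, y)
)
print("scaling factor:", factor, "homogeneous:", is_homogeneous)
```

Adding a bias term or a normalization layer before the product would introduce lower-degree or non-polynomial terms, and the scaling check would then fail.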

Figures (4)

  • Figure 1: Schematic diagram of the quantum-inspired MPS model and DTTN for the image recognition task. The former is applied to simple inputs and in general uses a small local mapping dimension $d=2$ and bond dimension $D\leq 64$ [ran2023tensor]. DTTN handles complex inputs while retaining spatial locality in its linear projection, and its parameter-sharing nature allows it to maintain a high bond dimension.
  • Figure 2: (a-d) Core blocks of different architectures. MLP-Mixer uses GELU activation, while the other networks use the Hadamard product '$*$', to let the network learn complex representations. We emphasize that the instance batch normalization (IBN) and layer normalization (LN) [xu2019understanding] placed before the Hadamard product disrupt the polynomial unfolding of $\mathcal{R}$-PolyNets [chrysos2023regularization] and MONet [cheng2024multilinear]. In contrast, the succinctly designed AIM circumvents this issue. The optional LN inside AIM both improves performance and speeds up convergence. (e) Comparison of different networks trained on ImageNet-1k for different numbers of epochs. DTTN achieves the best performance among multilinear networks, outperforming them significantly.
  • Figure 3: Schematic diagram of the DTTN architecture.
  • Figure 4: Top-1 accuracy and loss visualization for different architectures trained from scratch on ImageNet-100. DTTN$^\dagger$-S shows better performance and convergence
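
For orientation, the MPS model referenced in Figure 1 can be sketched as a chain contraction. The code below is a minimal illustration under stated assumptions, not the setup used in the paper: the trigonometric `local_map` with $d=2$ is a commonly used choice assumed here, the bond dimension `D = 4` is kept small for readability, and `build_mps` and `mps_score` are hypothetical helper names. It shows why the output is multilinear in the per-site feature vectors: scaling the features at a single site scales the score by the same factor.

```python
import math
import random


def local_map(x):
    """Assumed local feature map with d = 2 for an input x in [0, 1]."""
    return [math.cos(math.pi * x / 2), math.sin(math.pi * x / 2)]


def build_mps(n_sites, d, D, rng):
    """Random cores of shape (D_left, d, D_right); boundary bonds have size 1."""
    cores = []
    for i in range(n_sites):
        d_left = 1 if i == 0 else D
        d_right = 1 if i == n_sites - 1 else D
        cores.append([[[rng.gauss(0.0, 1.0) for _ in range(d_right)]
                       for _ in range(d)]
                      for _ in range(d_left)])
    return cores


def mps_score(cores, features):
    """Contract the chain left to right, one site at a time."""
    state = [1.0]  # left boundary vector
    for core, phi in zip(cores, features):
        d_right = len(core[0][0])
        new_state = [0.0] * d_right
        for a, s in enumerate(state):
            for k, p in enumerate(phi):
                weight = s * p
                for b in range(d_right):
                    new_state[b] += weight * core[a][k][b]
        state = new_state
    return state[0]  # right boundary bond has size 1


rng = random.Random(0)
n_sites, d, D = 6, 2, 4
cores = build_mps(n_sites, d, D, rng)

pixels = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
features = [local_map(p) for p in pixels]
score = mps_score(cores, features)

# Multilinearity check: scale the feature vector at one site by 2.
scaled = [list(f) for f in features]
scaled[2] = [2.0 * v for v in scaled[2]]
score_scaled = mps_score(cores, scaled)
print(math.isclose(score_scaled, 2.0 * score, rel_tol=1e-9, abs_tol=1e-12))
```

Each site carries its own core in this sketch, so the parameter count grows with the number of inputs and with $D^2$, which is one reason such models are usually kept to small bond dimensions.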

Theorems & Definitions (6)

  • Proposition 1
  • Theorem 1
  • Proposition
  • proof
  • Theorem
  • proof