On Dimension-Free Transformer: An Application of STP to AI

Daizhan Cheng

TL;DR

This paper tackles the challenge of dimension mismatch in transformer architectures by introducing a Dimension-Free Transformer (DFT) built on Semi-Tensor Product (STP) and Semi-Tensor Addition (STA). It replaces conventional dimension-matching operations with projection-based padding and dimension-free operators, enabling inputs and outputs of arbitrary dimensions via concepts such as Cross-Dimensional Projection and Hypervectors/Hypermatrices. Key contributions include generalized hypervectors, dimension-varying attention and multi-head attention, dimension-free add & norm, and dimension-free feed-forward networks, all unified under a projection-padding framework. The approach promises more efficient handling of signals with varying dimensions and provides a rigorous mathematical foundation for dimension-free AI systems with potential practical impact in flexible, scalable transformer implementations.

Abstract

The matrix expressions for every part of a transformer are first described. Based on the semi-tensor product (STP) of matrices, hypervectors are reconsidered, and linear transformations over hypervectors are constructed using projection. Their properties and computational formulas are obtained. Using the projection-based transformation of hypervectors (PBTH), the framework of a dimension-free transformer (DFT) is proposed by examining each linear transformation in a transformer and replacing it with a suitable PBTH, which allows the inputs and outputs to be of arbitrary dimensions. Because it uses balanced information about all entries, DFT is expected to be more efficient in dealing with signals.
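For readers unfamiliar with the STP, the following is a minimal sketch of the classical (left) semi-tensor product, $A \ltimes B = (A \otimes I_{t/n})(B \otimes I_{t/p})$ with $t = \mathrm{lcm}(n, p)$, for $A$ of size $m \times n$ and $B$ of size $p \times q$. It is not the paper's DK-STP or its PBTH construction, whose definitions are not reproduced in this summary. The function name `stp` and the use of NumPy are assumptions made for illustration only.

```python
import math

import numpy as np


def stp(a, b):
    """Classical left semi-tensor product of two 2-D arrays (illustrative helper).

    For a of shape (m, n) and b of shape (p, q), with t = lcm(n, p), returns
    (a kron I_{t/n}) @ (b kron I_{t/p}), which has shape (m*t/n, q*t/p).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError("stp expects 2-D arrays")
    n = a.shape[1]
    p = b.shape[0]
    t = n * p // math.gcd(n, p)  # least common multiple of n and p
    # Kronecker-pad both factors so the inner dimensions agree at t.
    left = np.kron(a, np.eye(t // n))
    right = np.kron(b, np.eye(t // p))
    return left @ right


# A 1x4 row times a 2x1 column: the ordinary product is undefined,
# but the STP is defined and has shape (1, 2).
row = np.array([[1.0, 2.0, 3.0, 4.0]])
col = np.array([[1.0], [2.0]])
result = stp(row, col)
```

When the inner dimensions already match ($n = p$), both identity factors are $I_1$ and `stp(a, b)` reduces to the ordinary product `a @ b`, which is why the STP is usually described as a generalization of matrix multiplication.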

Paper Structure

This paper contains 22 sections, 9 theorems, 82 equations, and 8 figures.

Key Result

Proposition 2.7

Let $A\in {\mathcal{M}}_{m\times n}$, $B\in {\mathcal{M}}_{p\times q}$, and $t=\mathop{\mathrm{lcm}}\nolimits(n,p)$. Consider the DK-STP defined by (2.1.11). There exists a matrix called the bridge matrix, such that

Figures (8)

  • Figure 1: Attenders in a Transformer
  • Figure 2: A Transformer
  • Figure 3: Single Layer Neural Network
  • Figure 4: Input
  • Figure 5: Scaled Dot-Product Attention
  • ...and 3 more figures

Theorems & Definitions (40)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Definition 2.5
  • Definition 2.6
  • Proposition 2.7
  • Definition 2.8
  • Definition 2.9
  • Proposition 2.10
  • ...and 30 more