On Dimension-Free Transformer: An Application of STP to AI
Daizhan Cheng
TL;DR
This paper tackles the challenge of dimension mismatch in transformer architectures by introducing a Dimension-Free Transformer (DFT) built on Semi-Tensor Product (STP) and Semi-Tensor Addition (STA). It replaces conventional dimension-matching operations with projection-based padding and dimension-free operators, enabling inputs and outputs of arbitrary dimensions via concepts such as Cross-Dimensional Projection and Hypervectors/Hypermatrices. Key contributions include generalized hypervectors, dimension-varying attention and multi-head attention, dimension-free add & norm, and dimension-free feed-forward networks, all unified under a projection-padding framework. The approach promises more efficient handling of signals with varying dimensions and provides a rigorous mathematical foundation for dimension-free AI systems with potential practical impact in flexible, scalable transformer implementations.
Abstract
The matrix expressions for each part of a transformer are first described. Based on the semi-tensor product (STP) of matrices, hypervectors are reconsidered, and linear transformations over hypervectors are constructed using projection. Their properties and computational formulas are obtained. Using the projection-based transformation of hypervectors (PBTH), the framework of a dimension-free transformer (DFT) is proposed by checking each linear transformation in a transformer and replacing it with a suitable PBTH, which allows the inputs and outputs to be of arbitrary dimensions. Because it uses balanced information about all entries, DFT is expected to be more efficient in dealing with signals.
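For readers unfamiliar with the STP, the following minimal sketch illustrates the standard definition for an m×n matrix A and a p×q matrix B: A ⋉ B = (A ⊗ I_{t/n})(B ⊗ I_{t/p}), where t = lcm(n, p) and ⊗ is the Kronecker product. The helper names (`kron`, `identity`, `matmul`, `stp`) are illustrative assumptions, not taken from the paper, and the sketch covers only the basic STP, not the projection-based constructions (PBTH) developed in the paper.

```python
from math import gcd


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Ordinary matrix product; assumes len(A[0]) == len(B)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def stp(A, B):
    """Semi-tensor product A |x B = (A (x) I_{t/n}) (B (x) I_{t/p}), t = lcm(n, p)."""
    n = len(A[0])               # number of columns of A
    p = len(B)                  # number of rows of B
    t = n * p // gcd(n, p)      # least common multiple of n and p
    return matmul(kron(A, identity(t // n)), kron(B, identity(t // p)))


# Mismatched dimensions: a 1x4 row times a 2x1 column.
print(stp([[1, 2, 3, 4]], [[1], [2]]))   # [[7, 10]]

# Matched dimensions: the STP reduces to the ordinary matrix product.
print(stp([[1, 2], [3, 4]], [[5], [6]]))  # [[17], [39]]
```

When n = p, both identity factors are 1×1, so the STP coincides with the conventional matrix product; this is why it can stand in for ordinary multiplication without changing results in the dimension-matched case.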
