Looped ReLU MLPs May Be All You Need as Practical Programmable Computers

Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

TL;DR

This work asks whether a fundamental neural module, a ReLU-MLP, can serve as a practical programmable computer. It constructs a looped 23-layer ReLU-MLP of width $n$ that emulates a SUBLEQ-based One Instruction Set Computer (OISC), establishing functional equivalence to a programmable computer. The authors show that a single forward pass costs $O(n \log n)$ time, asymptotically cheaper than the $O(n^2)$ of looped Transformers, and they achieve this with a low-rank decomposition that reduces the parameter count to $O(n \log n)$. The results challenge the assumption that programmable computation requires complex architectures, suggesting that simple modules like ReLU-MLPs can deliver such capabilities with greater efficiency and a smaller memory footprint.
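For readers unfamiliar with SUBLEQ ("subtract and branch if less than or equal to zero"), the following is a minimal sketch of a classic SUBLEQ interpreter. It is not the paper's construction: the unified memory layout, the `run_subleq` name, the `max_steps` guard, and the convention that a negative program counter halts are all assumptions made for illustration.

```python
def run_subleq(memory, pc=0, max_steps=1000):
    """Run a SUBLEQ program and return the final memory (illustrative sketch).

    Each instruction occupies three consecutive cells (a, b, c):
        mem[b] -= mem[a]
        if mem[b] <= 0: jump to c, else continue with the next instruction.
    Assumed convention: a program counter outside the memory halts the machine.
    """
    mem = list(memory)  # copy so the caller's list is not modified
    steps = 0
    while 0 <= pc <= len(mem) - 3 and steps < max_steps:
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
        steps += 1
    return mem


# Hypothetical program: add the value at address 9 (A) to address 10 (B),
# using address 11 (Z) as a scratch cell that starts at zero.
ADD_PROGRAM = [
    9, 11, 3,     # Z -= A        (Z becomes -A)
    11, 10, 6,    # B -= Z        (B becomes B + A)
    11, 11, -1,   # Z -= Z        (Z becomes 0, so jump to -1 and halt)
    7, 5, 0,      # data cells: A = 7, B = 5, Z = 0
]

result = run_subleq(ADD_PROGRAM)
print(result[10])  # value of B after the program halts
```

The single instruction is enough to express addition, copying, and conditional jumps, which is why emulating it suffices to claim programmability.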

Abstract

Previous work has demonstrated that attention mechanisms are Turing complete. More recently, it has been shown that a looped 9-layer Transformer can function as a universal programmable computer. In contrast, the multi-layer perceptron with $\mathsf{ReLU}$ activation ($\mathsf{ReLU}$-$\mathsf{MLP}$), one of the most fundamental components of neural networks, is known to be expressive; specifically, a two-layer neural network is a universal approximator given an exponentially large number of hidden neurons. However, it remains unclear whether a $\mathsf{ReLU}$-$\mathsf{MLP}$ can be made into a universal programmable computer using a practical number of weights. In this work, we answer in the affirmative: a looped 23-layer $\mathsf{ReLU}$-$\mathsf{MLP}$ can perform the basic necessary operations, functioning as a programmable computer more efficiently and effectively than a looped Transformer. This indicates that simple modules have stronger expressive power than previously expected and have not been fully explored. Our work provides insights into the mechanisms of neural networks and demonstrates that complex tasks, such as functioning as a programmable computer, do not necessarily require advanced architectures like Transformers.
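To make "looped" concrete, here is a minimal sketch of a ReLU-MLP whose fixed layers are applied repeatedly to a state vector. It assumes a ReLU after every affine layer and uses a hypothetical two-layer toy network; the function names and weights are illustrative and are not the paper's 23-layer construction.

```python
def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def affine(W, b, v):
    """Compute W v + b for a row-major weight matrix W and bias b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def mlp_forward(layers, v):
    """One forward pass: ReLU(W v + b) for each layer (assumed form)."""
    for W, b in layers:
        v = relu(affine(W, b, v))
    return v


def looped_mlp(layers, v, num_loops):
    """Feed the output back in as the next input, reusing the same weights."""
    for _ in range(num_loops):
        v = mlp_forward(layers, v)
    return v


# Toy network (an assumption for illustration): for x >= 0 it computes
# x -> min(x + 1, 3), i.e. a counter that saturates at 3.
TOY_LAYERS = [
    ([[-1.0]], [2.0]),  # h   = ReLU(2 - x)
    ([[-1.0]], [3.0]),  # out = ReLU(3 - h) = min(x + 1, 3)
]

print(looped_mlp(TOY_LAYERS, [0.0], 1))
print(looped_mlp(TOY_LAYERS, [0.0], 5))
```

The weights never change between iterations; only the state vector does. In the same spirit, each loop of the paper's network corresponds to one step of the emulated computer.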

Paper Structure

This paper contains 31 sections, 20 theorems, 58 equations, 1 algorithm.

Key Result

Theorem 4.1

Let ReLU-MLP be defined as in Definition 1.2. Let $n$ be the size of the state vector. Let $m$ be the number of instructions. Let $k$ be the number of one-bit data stored in the memory. For $i \in [k]$, each data is $v_i \in \{ \pm 1 \}$ and the memory size $k$ satisfies $k = n - 2 - 4 \log(n

Theorems & Definitions (40)

  • Definition 1.1: $\mathsf{ReLU}$
  • Definition 1.2: ReLU-MLP
  • Definition 3.1: One-Bit Data
  • Definition 3.2: $d$-Bits Data
  • Definition 3.3: $2$’s Complement
  • Definition 3.4: Address
  • Remark 3.5: Address Property
  • Definition 3.6: Instruction
  • Definition 3.7: One-Bit State
  • Theorem 4.1: Looped ReLU-MLP as Programmable Computer (informal version of the formal theorem)
  • ...and 30 more