Neural Networks Trained by Weight Permutation are Universal Approximators

Yongqiang Cai; Gaohang Chen; Zhonghua Qiao

TL;DR

This work addresses whether neural networks trained by weight permutation can universally approximate continuous functions. It introduces a constructive proof for the universal approximation property of permutation-trained ReLU networks on one-dimensional targets, relying on a four-pair step-function approximator and a mechanism to annihilate unused parameters, with extensions to fixed linear output and to random initializations. The key contributions are (i) a rigorous UAP guarantee for both equidistant and random initializations, (ii) a detailed step/constant/linear approximator construction, (iii) analysis of approximation rates and the impact of initialization, and (iv) empirical demonstrations across 1D and modest higher-dimensional tasks that validate the theory and reveal permutation-driven learning patterns. The results have implications for hardware-friendly fixed-weight designs and offer new insights into network learning dynamics via permutation activity, suggesting practical and theoretical avenues for permutation-based training and analysis.

Abstract

The universal approximation property is fundamental to the success of neural networks, and has traditionally been achieved by training networks without any constraints on their parameters. However, recent experimental research proposed a novel permutation-based training method, which exhibited desirable classification performance without modifying the exact weight values. In this paper, we provide a theoretical guarantee for this permutation training method by proving its ability to guide a ReLU network to approximate one-dimensional continuous functions. Our numerical results further validate this method's efficiency in regression tasks with various initializations. The notable observations during weight permutation suggest that permutation training can provide an innovative tool for describing network learning behavior.
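To make the idea of training "without modifying the exact weight values" concrete, here is a minimal sketch of a toy permutation search. It is not the authors' training algorithm: the simplified network form `sum_i w_i * ReLU(x - b_i)`, the random pairwise-swap strategy, and the names `network` and `permutation_search` are all assumptions chosen for illustration. The only property it is meant to show is that the multiset of weight values stays fixed while their positions change.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def network(x, weights, biases):
    # Hypothetical simplified form: f(x) = sum_i w_i * ReLU(x - b_i).
    return relu(x[:, None] - biases[None, :]) @ weights

def permutation_search(x, y, weights, biases, steps=2000, seed=0):
    # Toy search (an assumption, not the paper's method): try random
    # pairwise swaps and keep a swap only if the mean squared error drops.
    rng = np.random.default_rng(seed)
    w = weights.copy()
    best = np.mean((network(x, w, biases) - y) ** 2)
    for _ in range(steps):
        i, j = rng.choice(len(w), size=2, replace=False)
        w[i], w[j] = w[j], w[i]
        loss = np.mean((network(x, w, biases) - y) ** 2)
        if loss < best:
            best = loss
        else:
            w[i], w[j] = w[j], w[i]  # undo the swap
    return w, best

# Illustrative regression target and initialization (values are assumptions).
x = np.linspace(-1.0, 1.0, 200)
y = -np.sin(2 * np.pi * x)
n = 40
biases = np.linspace(-1.0, 1.0, n)
weights = np.random.default_rng(1).uniform(-1.0, 1.0, n)

initial_loss = np.mean((network(x, weights, biases) - y) ** 2)
trained, final_loss = permutation_search(x, y, weights, biases)
```

Because a swap is kept only when it lowers the loss, `final_loss` can never exceed `initial_loss`, and `trained` is always a reordering of `weights`. How close such a naive search gets to the target is not something this sketch claims.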

Paper Structure

This paper contains 38 sections, 5 theorems, 52 equations, 9 figures, 1 table, and 1 algorithm.

Introduction
Permutation training's advantages in hardware implementation
Related works
Outline
Notations and main results
Neural networks architecture
Permutation and corresponding properties
Weight configuration and main theorems
Proof ideas
UAP of permutation-trained networks
The construction of step, constant, and linear function approximators
Step-matching construction of step function approximators $f_s^{\text{NN}}$
Constant-matching construction of constant function approximators $f_c^{\text{NN}}$
Linear reorganization of the linear function approximators $f_\ell^{\text{NN}}$
Annihilate the unused part of the network
...and 23 more sections

Key Result

Theorem 2.1

For any function $f^* \in C([0,1])$ and any small number $\varepsilon>0$, there exists a large integer $n \in \mathbb{Z}^+$, and $\alpha,\gamma \in \mathbb{R}$ for $f^{\text{NN}}$ in Eq. (NN) with equidistantly distributed $B^{(n)} = ( b_i )_{i = 1}^n := \left( 0, \tfrac{1}{n-1}, \cdots, 1 \right)$
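As a rough illustration of the equidistant grid $B^{(n)}$ used in this theorem, the sketch below builds the grid and measures how well a piecewise constant function on that grid tracks a continuous target as $n$ grows. It covers only the piecewise constant stage described in Figure 1(a), not the permuted network itself. The helper names (`equidistant_biases`, `piecewise_constant`, `sup_error`) and the choice of target are assumptions made for illustration.

```python
import numpy as np

def equidistant_biases(n):
    # B^(n) = (0, 1/(n-1), ..., 1)
    return np.linspace(0.0, 1.0, n)

def piecewise_constant(f, biases, x):
    # g(x) = f(b_i), where b_i is the largest grid point not exceeding x.
    idx = np.searchsorted(biases, x, side="right") - 1
    idx = np.clip(idx, 0, len(biases) - 1)
    return f(biases[idx])

def sup_error(f, n, samples=10001):
    # Approximate sup-norm distance between f and g on a dense sample of [0, 1].
    x = np.linspace(0.0, 1.0, samples)
    g = piecewise_constant(f, equidistant_biases(n), x)
    return np.max(np.abs(f(x) - g))

# Illustrative target (an assumption; any continuous function on [0, 1] works).
target = lambda x: -np.sin(2 * np.pi * x)
errors = [sup_error(target, n) for n in (11, 101, 1001)]
```

The errors shrink as the grid is refined, which is the uniform-continuity argument behind approximating $f^*$ by a step function before any permutation is involved.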

Figures (9)

Figure 1: Main idea of the construction. (a) Approximate the continuous function $f^*$ by a piecewise constant function $g$ which is further approximated by permuted networks $f^{\text{NN}}$. (b) The step function approximator $f_s^{\text{NN}}$ constructed by step-matching. (c) Refine the basis functions $L$-times. (d) Stacking pseudo-copies to achieve the desired height.
Figure 2: Approximating the one-dimensional continuous functions (a) $y = -\sin(2\pi x)$ and (b) $y = \frac{1}{2} (5 x^3 - 3 x)$ with equidistantly, pairwise randomly, and randomly initialized networks, where $x \in [-1, 1]$. The inset in each panel presents the target function as lines and an example of the approximation result as dots.
Figure 3: (a) Approximating the two-dimensional continuous function $z = - \sin \pi xy$, where $(x, y) \in [-1, 1] \times [-1, 1]$. The inset panel presents the target function surface and an example of the approximation result as dots. (b) The two-dimensional basis function settings.
Figure 4: Approximating the three-dimensional continuous function $f(x,y,z) = \sin 3x \cdot \cos y \cdot \sin 2z$, where $(x,y,z) \in [-1,1]^3$. (a) The convergence behavior under random seed 2022. (b) A three-dimensional illustration of the target function, where the function value $f(x,y,z)$ is indicated by the corresponding color in the color bar.
Figure 5: The performance of different initialization strategies in approximating $y = -\sin(2\pi x)$ in $[-1, 1]$. The pairwise initialization $W^{(2n)} = ( \pm p_i )_{i = 1}^n, \, p_i \sim \mathcal{U}[-1,1]$ is denoted as $W^{(2n)} \sim \mathcal{U}^{\pm}[0,1]^n$. The error bars are omitted for conciseness. The inset panel presents the target function as lines and an example of the approximation result as dots.
...and 4 more figures

Theorems & Definitions (13)

Definition 2.1
Theorem 2.1: UAP with a linear layer
Theorem 2.2: UAP without the linear layer
Theorem 2.3: UAP for randomly initialized parameters
Lemma 2.1
proof
Remark 1
Lemma 3.1
proof
Remark 2
...and 3 more
