Neural Networks Trained by Weight Permutation are Universal Approximators

Yongqiang Cai, Gaohang Chen, Zhonghua Qiao

TL;DR

This work addresses whether neural networks trained by weight permutation can universally approximate continuous functions. It introduces a constructive proof for the universal approximation property of permutation-trained ReLU networks on one-dimensional targets, relying on a four-pair step-function approximator and a mechanism to annihilate unused parameters, with extensions to fixed linear output and to random initializations. The key contributions are (i) a rigorous UAP guarantee for both equidistant and random initializations, (ii) a detailed step/constant/linear approximator construction, (iii) analysis of approximation rates and the impact of initialization, and (iv) empirical demonstrations across 1D and modest higher-dimensional tasks that validate the theory and reveal permutation-driven learning patterns. The results have implications for hardware-friendly fixed-weight designs and offer new insights into network learning dynamics via permutation activity, suggesting practical and theoretical avenues for permutation-based training and analysis.

Abstract

The universal approximation property is fundamental to the success of neural networks, and has traditionally been achieved by training networks without any constraints on their parameters. However, recent experimental research proposed a novel permutation-based training method, which exhibited a desired classification performance without modifying the exact weight values. In this paper, we provide a theoretical guarantee of this permutation training method by proving its ability to guide a ReLU network to approximate one-dimensional continuous functions. Our numerical results further validate this method's efficiency in regression tasks with various initializations. The notable observations during weight permutation suggest that permutation training can provide an innovative tool for describing network learning behavior.
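The idea of training by permutation can be illustrated with a minimal sketch. It assumes a simplified one-hidden-layer ReLU network of the form $\sum_i w_i \,\mathrm{ReLU}(x - b_i)$, which is an illustrative stand-in and not the paper's exact architecture. The names `f_nn`, `weights`, `biases`, and `perm` are hypothetical, and the permutation is fixed by hand rather than learned. The sketch shows only that reordering a fixed set of weight values changes the function the network computes while leaving the values themselves untouched.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def f_nn(x, weights, biases):
    """Simplified network (an assumption): sum_i w_i * ReLU(x - b_i)."""
    x = np.asarray(x, dtype=float)
    # Shape (len(x), n) hidden activations, contracted against the n weights.
    return relu(x[:, None] - biases[None, :]) @ weights


n = 6
biases = np.linspace(0.0, 1.0, n)    # fixed, equidistant in [0, 1]
weights = np.linspace(-1.0, 1.0, n)  # fixed values; only their order may change

# A hand-picked permutation (reversal) standing in for a learned one.
perm = np.arange(n)[::-1]
permuted = weights[perm]

x = np.linspace(0.0, 1.0, 5)
y_before = f_nn(x, weights, biases)
y_after = f_nn(x, permuted, biases)

# The multiset of weight values is unchanged by the permutation ...
same_values = np.array_equal(np.sort(weights), np.sort(permuted))
# ... but the function computed by the network is not.
function_changed = not np.allclose(y_before, y_after)
print(same_values, function_changed)
```

In this toy setting the search space is the set of orderings of `weights`; a permutation-based training procedure would choose among those orderings instead of updating the values by gradient steps.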

Paper Structure

This paper contains 38 sections, 5 theorems, 52 equations, 9 figures, 1 table, 1 algorithm.

Key Result

Theorem 2.1

For any function $f^* \in C([0,1])$ and any small number $\varepsilon>0$, there exists a large integer $n \in \mathbb{Z}^+$, and $\alpha,\gamma \in \mathbb{R}$ for $f^{\text{NN}}$ in Eq. (NN) with equidistantly distributed $B^{(n)} = ( b_i )_{i = 1}^n := \left( 0, \tfrac{1}{n-1}, \cdots, 1 \right)$

Figures (9)

  • Figure 1: Main idea of the construction. (a) Approximate the continuous function $f^*$ by a piecewise constant function $g$ which is further approximated by permuted networks $f^{\text{NN}}$. (b) The step function approximator $f_s^{\text{NN}}$ constructed by step-matching. (c) Refine the basis functions $L$-times. (d) Stacking pseudo-copies to achieve the desired height.
  • Figure 2: Approximating the one-dimensional continuous functions (a) $y = -\sin(2\pi x)$ and (b) $y = \frac{1}{2} (5 x^3 - 3 x)$ with equidistantly, pairwise randomly, and randomly initialized networks, where $x \in [-1, 1]$. The inset in each panel presents the target function as lines and an example of the approximation result as dots.
  • Figure 3: (a) Approximating the two-dimensional continuous function $z = - \sin(\pi xy)$, where $(x, y) \in [-1, 1] \times [-1, 1]$. The inset panel presents the target function surface and an example of the approximation result as dots. (b) The two-dimensional basis function settings.
  • Figure 4: Approximating the three-dimensional continuous function $f(x,y,z) = \sin 3x \cdot \cos y \cdot \sin 2z$, where $(x,y,z) \in [-1,1]^3$. (a) The convergence behavior under random seed 2022. (b) A three-dimensional illustration of the target function, where the value $f(x,y,z)$ is indicated by color according to the color bar.
  • Figure 5: The performance of different initialization strategies in approximating $y = -\sin(2\pi x)$ on $[-1, 1]$. The pairwise initialization $W^{(2n)} = ( \pm p_i )_{i = 1}^n, \, p_i \sim \mathcal{U}[0,1]$ is denoted as $W^{(2n)} \sim \mathcal{U}^{\pm}[0,1]^n$. The error bars are omitted for conciseness. The inset panel presents the target function as lines and an example of the approximation result as dots.
  • ...and 4 more figures

Theorems & Definitions (13)

  • Definition 2.1
  • Theorem 2.1: UAP with a linear layer
  • Theorem 2.2: UAP without the linear layer
  • Theorem 2.3: UAP for randomly initialized parameters
  • Lemma 2.1
  • proof
  • Remark 1
  • Lemma 3.1
  • proof
  • Remark 2
  • ...and 3 more