Your Network May Need to Be Rewritten: Network Adversarial Based on High-Dimensional Function Graph Decomposition

Xiaoyan Su, Yinghao Zhu, Run Li

TL;DR

The paper tackles internal covariate shift (ICS) and gradient deviation with a network adversarial framework that alternates activation functions across layers, paired with high-dimensional function graph decomposition (HD-FGD) to handle complex activations. It defines global adversarial (GA) for simple activations and split adversarial (SA), built on HD-FGD, for complex ones; together these set up gradient-adversarial dynamics during backpropagation. Experiments on CIFAR-10/100 with ViT, SwT, and ResNet show that GA, HD-FGD, and SA improve accuracy, stabilize losses, and accelerate training. The approach offers a practical way to retrofit existing models by rewriting their activation functions, improving nonlinear expressivity and training efficiency without architectural overhauls.

Abstract

Past research has relied on a single low-dimensional activation function per network, which leads to internal covariate shift and gradient deviation problems. Comparatively little work has examined how combinations of functions can supply the properties that a single activation function lacks. We propose a network adversarial method to address these challenges. This is the first method to use different activation functions in a network. Starting from the activation function already used in a network, we construct an adversarial function whose derivative graph has the opposite properties, and the two are used alternately as the activation functions of different network layers. For complex activation functions, we propose high-dimensional function graph decomposition (HD-FGD), which splits the function into separate terms that are then passed through a linear layer. After integrating the inverse of the partial derivative of each decomposed term, we obtain the adversarial function by following the computational rules of the decomposition. Either the network adversarial method or HD-FGD alone can effectively replace the traditional MLP + activation function pattern. With these methods, we achieve a substantial improvement over standard activation functions in both training efficiency and predictive accuracy. The article addresses the adversarial issues associated with several prevalent activation functions, presenting alternatives that can be seamlessly integrated into existing models without any adverse effects. We will release the code as open source after the conference review process is completed.
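
To make the idea of alternating two activation functions across layers concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. The partner function x - tanh(x) is an assumption chosen for illustration: its derivative, tanh(x)^2, is large where the derivative of tanh is small and vice versa, so the two derivatives sum to 1. The helper names (tanh_act, partner_act, make_layer, forward) are hypothetical, and the paper's actual adversarial functions and HD-FGD terms may differ.

```python
import math
import random


def tanh_act(x):
    """Base activation: derivative 1 - tanh(x)**2, peaked at x = 0."""
    return math.tanh(x)


def partner_act(x):
    """Hypothetical partner (an assumption, not taken from the paper).

    Its derivative is tanh(x)**2 = 1 - tanh'(x), so it is near zero
    where tanh's derivative is large and near one where tanh saturates.
    """
    return x - math.tanh(x)


def linear(weights, bias, inputs):
    """Dense layer: one weighted sum plus bias per output unit."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def make_layer(n_in, n_out, rng):
    """Randomly initialised (weights, bias) pair for a dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(layers, inputs):
    """Forward pass that alternates the two activations layer by layer."""
    activations = (tanh_act, partner_act)
    hidden, used = inputs, []
    for index, (weights, bias) in enumerate(layers):
        act = activations[index % 2]  # even layers: tanh, odd layers: partner
        hidden = [act(z) for z in linear(weights, bias, hidden)]
        used.append(act.__name__)
    return hidden, used


rng = random.Random(0)
layers = [make_layer(4, 8, rng), make_layer(8, 8, rng), make_layer(8, 3, rng)]
output, used = forward(layers, [0.5, -1.0, 2.0, 0.0])
print(used)
```

In this sketch the alternation is a fixed even/odd schedule; how the paper assigns the base and adversarial functions to specific layers is not stated in the abstract.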

Paper Structure (28 sections, 33 equations, 18 figures, 4 tables)

Figures (18)

  • Figure 1: (a) Gradient distribution
  • Figure 2: (b) Data distribution
  • Figure 3: (c) Predictive performance
  • Figure 5: Schematic diagram of network adversarial method.
  • Figure 6: HD-FGD splits Tanh into four terms.
  • ...and 13 more figures