CombU: A Combined Unit Activation for Fitting Mathematical Expressions with Neural Networks

Jiayu Li, Zilong Zhao, Kevin Yee, Uzair Javaid, Biplab Sikdar

TL;DR

This paper introduces the Combined Units activation (CombU), which applies different activation functions to different dimensions across the layers of a neural network. The authors state that networks using CombU can be theoretically proven to fit most mathematical expressions accurately.

Abstract

Activation functions are fundamental to neural networks because they introduce non-linearity, enabling deep networks to approximate complex relationships in data. Existing efforts to improve neural network performance have predominantly focused on developing new mathematical functions. However, we find that a well-designed combination of existing activation functions within a neural network can achieve the same objective. In this paper, we introduce the Combined Units activation (CombU), which employs different activation functions at various dimensions across different layers. This approach can be theoretically proven to fit most mathematical expressions accurately. Experiments on four mathematical expression datasets, comparing CombU against six state-of-the-art (SOTA) activation functions, show that CombU outperforms all of them on 10 of 16 metrics and ranks in the top three on the remaining six.
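
To make the idea of "different activation functions at various dimensions" concrete, here is a minimal sketch of a layer output being split into slices, each passed through its own activation. The excerpt does not list which activations CombU combines or in what proportions, so the three functions (`relu`, `elu`, `log_unit`), the helper `combined_activation`, and the split fractions are all illustrative assumptions rather than the authors' configuration.

```python
import math

def relu(x):
    return max(0.0, x)

def elu(x, alpha=1.0):
    # Exponential branch for x <= 0, identity otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def log_unit(x):
    # Hypothetical log-based unit: ln(1 + max(0, x)).
    return math.log(1.0 + max(0.0, x))

def combined_activation(values, plan):
    """Apply a different activation to each contiguous slice of `values`.

    `plan` is a list of (function, fraction) pairs; the last entry
    always absorbs whatever dimensions remain after rounding.
    """
    n = len(values)
    out = []
    start = 0
    for i, (fn, frac) in enumerate(plan):
        is_last = i == len(plan) - 1
        end = n if is_last else min(n, start + round(frac * n))
        out.extend(fn(v) for v in values[start:end])
        start = end
    return out

# Assumed split: half ReLU, a quarter ELU, a quarter log unit.
plan = [(relu, 0.5), (elu, 0.25), (log_unit, 0.25)]
hidden = [-1.0, 2.0, -1.0, 2.0]
activated = combined_activation(hidden, plan)
```

In this sketch the same pre-activation value can come out differently depending on which slice it falls in, which is the property the abstract attributes to CombU. A different plan could be supplied per layer to vary the mix across layers.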

Paper Structure

This paper contains 26 sections, 7 theorems, 17 equations, 6 figures, and 13 tables.

Key Result

Lemma 3.1

Capability for Exponential Expressions. The exponential function can be constructed by a network. Formally, given $x\in\mathcal{D}$, $\exp(x)$ can be constructed as an output of a network.
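
The paper's proof is not reproduced in this excerpt. As one possible illustration of how such a construction could work, the sketch below recovers $\exp(x)$ from a single ELU unit placed between two affine maps. It assumes that ELU is among the available activations and that the domain is bounded above by a constant `upper`; both assumptions, along with the function name `exp_via_elu`, are hypothetical and not taken from the source.

```python
import math

def elu(x):
    # ELU with alpha = 1: exp(x) - 1 for x <= 0, identity otherwise.
    return x if x > 0 else math.exp(x) - 1.0

def exp_via_elu(x, upper=5.0):
    """Reconstruct exp(x) for x <= upper from one ELU unit.

    `upper` is an assumed bound on the input domain.
    """
    if x > upper:
        raise ValueError("x lies outside the assumed domain")
    shifted = x - upper        # affine input map; always <= 0
    hidden = elu(shifted)      # equals exp(x - upper) - 1
    # Affine output map: exp(upper) * (hidden + 1) = exp(x).
    return math.exp(upper) * (hidden + 1.0)

approx = exp_via_elu(2.0)
```

The shift keeps every input on ELU's exponential branch, so the output layer only needs a fixed scale and bias. For inputs far below `upper`, the subtract-then-add of 1 loses floating-point precision, so this is a demonstration of the identity rather than a numerically careful routine.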

Figures (6)

  • Figure 1: Radar graph of the performance of each activation function on different mathematical expressions. "clf" stands for the classification task and "reg" for the regression task. Values are normalized by the range of the metric on the dataset across activation functions and then rounded to 0.1. For better visibility, the value 0 is placed at some distance from the center rather than at the center. For each experiment, the data points show the average of the normalized mean scores.
  • Figure 2: Radar graph of the performance of each activation function on different datasets. Values are normalized by the range of the metric on the dataset across activation functions and then rounded to 0.1. For better visibility, the value 0 is placed at some distance from the center rather than at the center.
  • Figure 3: Training loss trend of different activation functions in the mathematical expression regression experiments. The training losses are MSE losses, plotted as averages over every 100 steps across all 5 runs. The first two epochs are skipped to make differences in the later trend easier to see.
  • Figure 4: Training absolute error trend of different activation functions in the mathematical expression regression experiments. Values are plotted as averages over every 100 steps across all 5 runs. The first two epochs are skipped to make differences in the later trend easier to see.
  • Figure 5: Training absolute error trend of different activation functions in the mathematical expression classification experiments. The training losses are cross entropy, plotted as averages over every 100 steps across all 5 runs. The first two epochs are skipped to make differences in the later trend easier to see.
  • ...and 1 more figure

Theorems & Definitions (13)

  • Lemma 3.1
  • proof
  • Lemma 3.2
  • proof
  • Theorem 3.3
  • proof
  • Lemma A.1
  • Theorem A.2
  • proof
  • Lemma A.3
  • ...and 3 more