CombU: A Combined Unit Activation for Fitting Mathematical Expressions with Neural Networks

Jiayu Li; Zilong Zhao; Kevin Yee; Uzair Javaid; Biplab Sikdar

TL;DR

This paper introduces the Combined Units activation (CombU), which applies different activation functions to different dimensions across different layers of a neural network. The authors prove theoretically that this combination can accurately fit most mathematical expressions.

Abstract

Activation functions are fundamental to neural networks because they introduce non-linearity, which enables deep networks to approximate complex relationships in data. Existing efforts to improve neural network performance have focused predominantly on developing new mathematical functions for activation. However, we find that a well-designed combination of existing activation functions within a neural network can also achieve this objective. In this paper, we introduce the Combined Units activation (CombU), which employs different activation functions at different dimensions across different layers. We prove theoretically that this approach can accurately fit most mathematical expressions. Experiments on four mathematical expression datasets, comparing against six state-of-the-art (SOTA) activation functions, show that CombU outperforms all SOTA algorithms in 10 out of 16 metrics and ranks in the top three for the remaining six metrics.
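To make "different activation functions at different dimensions across different layers" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this section does not say which activations CombU combines or in what proportions, so the ReLU/ELU/tanh split, the helper `combined_unit`, and the per-layer `LAYER_PLANS` below are hypothetical choices made only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Activation = Callable[[float], float]


def relu(z: float) -> float:
    return max(0.0, z)


def elu(z: float) -> float:
    # ELU with alpha = 1: identity for z > 0, exp(z) - 1 otherwise
    return z if z > 0 else math.exp(z) - 1.0


def combined_unit(h: Sequence[float], plan: List[Tuple[Activation, int]]) -> List[float]:
    """Apply a different activation to consecutive slices of the vector h.

    `plan` is a hypothetical per-layer assignment: a list of
    (activation, number_of_dimensions) pairs that must cover h exactly.
    """
    if sum(n for _, n in plan) != len(h):
        raise ValueError("plan must cover every dimension of h exactly once")
    out: List[float] = []
    start = 0
    for fn, n in plan:
        # this activation acts only on dimensions [start, start + n)
        out.extend(fn(v) for v in h[start:start + n])
        start += n
    return out


# Hypothetical plans: each layer may split its dimensions differently.
LAYER_PLANS = [
    [(relu, 2), (elu, 1), (math.tanh, 1)],  # e.g. first hidden layer
    [(elu, 2), (relu, 2)],                  # e.g. second hidden layer
]
```

For example, `combined_unit([-1.0, 2.0, -1.0, 0.5], LAYER_PLANS[0])` applies ReLU to the first two entries, ELU to the third, and tanh to the fourth. Keeping the plan as data makes it easy to vary the mix from layer to layer, which is the property the abstract emphasizes.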

Paper Structure

This paper contains 26 sections, 7 theorems, 17 equations, 6 figures, and 13 tables.

Introduction
Related Work
CombU Explained
CombU Formalization
CombU Motivation
Combination Choice and Theoretical Analysis
Experimental Results and Discussion
Fitting Mathematical Formulae
Fitting Real-world Datasets
Generative Tasks on Tabular Data
Discussion
Conclusion & Future Works
Conclusion
Future Work
Appendix A: Proof of Basic Rules
...and 11 more sections

Key Result

Lemma 3.1

Capability for Exponential Expressions. The exponential function can be constructed by a network. Formally, given $x \in \mathcal{D}$, $\exp(x)$ can be constructed as the output of a network.
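One way to see why such a construction is possible is sketched below. This is our own illustration under stated assumptions, not the paper's proof: we assume an ELU activation with $\alpha = 1$ is available and that the domain $\mathcal{D}$ is bounded above by a constant $c$. Since $\mathrm{ELU}(z) = e^z - 1$ for $z \le 0$, every $x \in \mathcal{D}$ satisfies $\exp(x) = e^{c}\left(\mathrm{ELU}(x - c) + 1\right)$, which is a network with one ELU hidden unit (weight 1, bias $-c$) and a linear output (weight $e^c$, bias $e^c$). The function name `exp_via_elu` and the parameter `c` are hypothetical.

```python
import math


def elu(z: float) -> float:
    # ELU with alpha = 1: identity for z > 0, exp(z) - 1 otherwise
    return z if z > 0 else math.exp(z) - 1.0


def exp_via_elu(x: float, c: float) -> float:
    """Compute exp(x) with one ELU hidden unit, assuming x <= c."""
    if x > c:
        # the identity relies on the pre-activation x - c being <= 0
        raise ValueError("x must not exceed the upper bound c of the domain")
    hidden = elu(x - c)              # hidden unit: weight 1, bias -c
    return math.exp(c) * hidden + math.exp(c)  # output: weight e^c, bias e^c
```

For instance, `exp_via_elu(1.3, c=5.0)` agrees with `math.exp(1.3)` up to floating-point rounding. The shift by $c$ is what keeps the ELU on its exponential branch; without a bound on $\mathcal{D}$, a single unit of this form would not suffice.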

Figures (6)

Figure 1: Radar graph of the performance of each activation function on different mathematical expressions. "clf" stands for the classification task, and "reg" stands for the regression task. Values are normalized by the range of the metric on the dataset across activation functions and then rounded to the nearest 0.1. For better visibility, the value 0 is placed at some distance from the center rather than at the center. For each experiment, the average of the normalized mean scores is plotted as a data point.
Figure 2: Radar graph of the performance of each activation function on different datasets. Values are normalized by the range of the metric on the dataset across activation functions and then rounded to the nearest 0.1. For better visibility, the value 0 is placed at some distance from the center rather than at the center.
Figure 3: Training loss trend of different activation functions on the mathematical expression regression experiments. The training losses are MSE losses; each plotted point is the average over a 100-step window across all 5 runs. The first two epochs are omitted so that differences in the later trend are easier to see.
Figure 4: Training absolute error trend of different activation functions on the mathematical expression regression experiments. Each plotted point is the average over a 100-step window across all 5 runs. The first two epochs are omitted so that differences in the later trend are easier to see.
Figure 5: Training loss trend of different activation functions on the mathematical expression classification experiments. The training losses are cross-entropy losses; each plotted point is the average over a 100-step window across all 5 runs. The first two epochs are omitted so that differences in the later trend are easier to see.
...and 1 more figure
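The post-processing described in the figure captions can be sketched as follows. This is our reading of the captions, not the authors' code: `normalize_scores` applies min-max normalization of one metric across activation functions and rounds to 0.1 (Figures 1 and 2), and `windowed_mean` averages per-step training losses across runs and then over fixed windows of steps (Figures 3 to 5). Both function names are hypothetical. Mapping a constant metric to 0.0, not flipping lower-is-better metrics such as MSE, and expressing the skipped first two epochs as a raw step count (`skip_steps`, since steps per epoch are not given here) are our assumptions.

```python
from typing import Dict, List, Sequence


def normalize_scores(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize one metric across activation functions, rounded to 0.1."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    if span == 0:
        # assumption: if every activation scores the same, map all to 0.0
        return {name: 0.0 for name in scores}
    return {name: round((v - lo) / span, 1) for name, v in scores.items()}


def windowed_mean(runs: Sequence[Sequence[float]], window: int = 100,
                  skip_steps: int = 0) -> List[float]:
    """Average per-step losses across runs, then over consecutive step windows.

    skip_steps drops the earliest steps (e.g. the first two epochs);
    the final window may be shorter than `window`.
    """
    n_steps = min(len(r) for r in runs)
    # mean across runs at each remaining step
    per_step = [sum(r[t] for r in runs) / len(runs) for t in range(skip_steps, n_steps)]
    means = []
    for i in range(0, len(per_step), window):
        chunk = per_step[i:i + window]
        means.append(sum(chunk) / len(chunk))
    return means
```

For example, `normalize_scores({"ReLU": 1.0, "ELU": 2.0, "CombU": 3.0})` maps the three scores to 0.0, 0.5, and 1.0, and `windowed_mean([[1, 2, 3, 4], [3, 4, 5, 6]], window=2)` first averages the two runs step by step and then each pair of steps.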

Theorems & Definitions (13)

Lemma 3.1
proof
Lemma 3.2
proof
Theorem 3.3
proof
Lemma A.1
Theorem A.2
proof
Lemma A.3
...and 3 more
