Neural networks with trainable matrix activation functions

Zhengqi Liu, Shuhao Cao, Yuwen Li, Ludmil Zikatanov

TL;DR

This work develops a systematic approach to constructing matrix-valued activation functions whose entries generalize the rectified linear unit (ReLU). The activation is applied as a matrix-vector multiplication that uses only scalar multiplications and comparisons.

Abstract

The training process of neural networks usually optimizes the weights and bias parameters of linear transformations, while the nonlinear activation functions are pre-specified and fixed. This work develops a systematic approach to constructing matrix-valued activation functions whose entries are generalized from ReLU. The activation is based on matrix-vector multiplications using only scalar multiplications and comparisons. The proposed activation functions depend on parameters that are trained along with the weights and bias vectors. Neural networks based on this approach are simple and efficient and are shown to be robust in numerical experiments.
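
To make the idea of an activation written as a matrix-vector product concrete, here is a minimal sketch in plain Python. It rests on assumptions the abstract does not spell out: the activation matrix is taken to be diagonal, each diagonal entry is a piecewise-constant function of the corresponding input component, the breakpoints are fixed, and only the constants are trainable. The names `diagonal_matrix_activation`, `thresholds`, and `slopes` are hypothetical and chosen for illustration only.

```python
from bisect import bisect_left


def relu_as_matrix_activation(y):
    """ReLU written as D(y) y, where D(y) is diagonal with entries 0 or 1."""
    return [(1.0 if yi > 0 else 0.0) * yi for yi in y]


def diagonal_matrix_activation(y, thresholds, slopes):
    """Hypothetical generalization: diagonal entries are piecewise constant.

    y          : list of floats (pre-activation vector)
    thresholds : sorted breakpoints s_1 < ... < s_m, assumed fixed
    slopes     : one list of m + 1 constants per component of y; these
                 would be the trainable parameters in this sketch
    """
    out = []
    for yi, ti in zip(y, slopes):
        # Locate the interval containing yi using comparisons only.
        k = bisect_left(thresholds, yi)
        # Apply the diagonal entry with a single scalar multiplication.
        out.append(ti[k] * yi)
    return out


# With one breakpoint at 0 and constants (0, 1), the sketch reduces to ReLU.
y = [-2.0, 0.0, 3.0]
relu_like = diagonal_matrix_activation(y, [0.0], [[0.0, 1.0]] * len(y))
```

In this sketch, training would adjust the entries of `slopes` by gradient descent together with the weights and biases; the gradient of each output with respect to its active constant is simply the input component `yi`.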

Paper Structure

This paper contains 7 sections, 14 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Graphs of $\sigma_\ell$.
  • Figure 2: Training errors for $\sin(\pi x_1 + \cdots + \pi x_n)$, single hidden layer.
  • Figure 3: Training errors for $\sin(\pi x_1 + \cdots + \pi x_n)$, two hidden layers.
  • Figure 4: Plot of $\sin(100 \pi x) + \cos(50 \pi x) + \sin(\pi x)$ and training loss comparison.
  • Figure 5: Approximations to $\sin(100 \pi x) + \cos(50 \pi x) + \sin(\pi x)$, TMAF-type.
  • ...and 4 more figures