Learning Polynomial Activation Functions for Deep Neural Networks

Linghao Zhang, Jiawang Nie, Tingting Tang

TL;DR

This work frames the problem of training neural networks with learnable polynomial activation functions as a polynomial optimization problem, which is solvable by the Moment-SOS hierarchy.

Abstract

Activation functions are crucial for deep neural networks. This work frames the problem of training neural networks with learnable polynomial activation functions as a polynomial optimization problem, which is solvable by the Moment-SOS hierarchy. It represents a fundamental departure from the conventional paradigm of training deep neural networks, which relies on local optimization methods such as backpropagation and gradient descent. Numerical experiments are presented to demonstrate the accuracy and robustness of optimal parameter recovery in the presence of noise.
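
The polynomial-optimization framing rests on a simple observation: when the activation is a polynomial with learnable coefficients, the squared training loss is itself a polynomial in the weights and the activation coefficients. The sketch below illustrates this on a toy model that is an assumption of this example, not the paper's architecture: a single scalar weight `w` and a quadratic activation `a1*t + a2*t**2`, with made-up data points. Restricted to any line in parameter space, a polynomial of degree at most 6 must have a vanishing 7th finite difference, which the code checks in exact rational arithmetic. It does not implement the Moment-SOS hierarchy.

```python
from fractions import Fraction
from math import comb

# Hypothetical toy model: f(x) = sigma(w * x), sigma(t) = a1*t + a2*t^2.
# The parameters (w, a1, a2) are all learnable.
def activation(t, a1, a2):
    return a1 * t + a2 * t * t

def loss(params, data):
    # Squared-error loss; a polynomial of degree 6 in (w, a1, a2),
    # because a2 * w^2 * x^2 has degree 3 and the residual is squared.
    w, a1, a2 = params
    return sum((activation(w * x, a1, a2) - y) ** 2 for x, y in data)

def finite_difference(values, order):
    # Forward difference of the given order at the first sample.
    return sum((-1) ** (order - j) * comb(order, j) * values[j]
               for j in range(order + 1))

# Illustrative data and line in parameter space (not from the paper).
data = [(Fraction(1), Fraction(2)),
        (Fraction(-1, 2), Fraction(1, 3)),
        (Fraction(3), Fraction(-1))]
base = (Fraction(1, 2), Fraction(-1), Fraction(2))
direction = (Fraction(1), Fraction(3), Fraction(-2))

def loss_on_line(t):
    params = tuple(b + t * d for b, d in zip(base, direction))
    return loss(params, data)

# Sample the loss at t = 0, 1, ..., 7 using exact rationals.
samples = [loss_on_line(Fraction(t)) for t in range(8)]
sixth_difference = finite_difference(samples, 6)
seventh_difference = finite_difference(samples, 7)

print("6th difference:", sixth_difference)
print("7th difference:", seventh_difference)
```

Exact `Fraction` arithmetic is used so that the vanishing 7th difference is a genuine identity rather than a floating-point coincidence. The nonzero 6th difference confirms that the degree bound is attained along this particular line.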

Paper Structure

This paper contains 5 sections, 2 theorems, 47 equations, 1 figure, 5 tables.

Key Result

Theorem 3.1

[nie2023moment] Suppose $w^*$ is a minimizer of (pop_mom). If the flat truncation condition (flat) holds, then the Moment-SOS relaxation is tight, i.e., $\vartheta_{\min} = \vartheta_{sos,k} = \vartheta_{mom,k}$.

Figures (1)

  • Figure 1: Plots of the residual error $\varepsilon_i$ for $i = 1, \ldots, 50$.

Theorems & Definitions (5)

  • Theorem 3.1
  • Theorem 3.2
  • proof
  • Remark 3.3
  • Example 3.4