Analytical Solution of a Three-layer Network with a Matrix Exponential Activation Function

Kuo Gai, Shihua Zhang

TL;DR

The paper addresses the theoretical question of why deeper neural networks can outperform shallow ones by deriving an analytical solution for a three-layer network with a matrix-exponential activation $f(X)=W_3\exp(W_2\exp(W_1X))$. It provides a constructive existence result under invertibility conditions (e.g., $X_1,X_2,Y_1,Y_2$ invertible and $X_1-X_2$ invertible), showing how to choose $W_1,W_2,W_3$ (involving a scalar $\alpha>0$, a matrix logarithm, and a Lie-theoretic construction) so that $f(X_1)=Y_1$ and $f(X_2)=Y_2$ without gradient descent. The approach leverages the matrix exponential's properties, BCH-type relations, and a block-matrix trick to decouple the system, illustrating depth-induced solvability beyond a single layer. This work provides a foundation for future multi-layer analytical analyses and connects depth efficiency to Lie-theoretic structures in matrix activations.
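One way to see where the difficulty lies (a sketch derived from the formula for $f$ alone, not necessarily the route taken in the paper): a matrix exponential is always invertible, with $\exp(A)^{-1}=\exp(-A)$, so $W_3$ can be eliminated from the two equations. Writing $E_i=\exp(W_2\exp(W_1X_i))$, a shorthand introduced here for illustration, and using the invertibility of $Y_1$, $$ Y_1=W_3E_1,\quad Y_2=W_3E_2 \iff W_3=Y_1E_1^{-1},\quad Y_1^{-1}Y_2=E_1^{-1}E_2. $$ The remaining condition involves only $W_1$ and $W_2$: $$ Y_1^{-1}Y_2=\exp(-W_2\exp(W_1X_1))\exp(W_2\exp(W_1X_2)). $$ Once $W_1$ and $W_2$ satisfy this, $W_3=Y_1\exp(-W_2\exp(W_1X_1))$ is fixed, so the substance of the construction is in the first two layers.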

Abstract

In practice, deeper networks tend to be more powerful than shallow ones, but this has not been understood theoretically. In this paper, we find the analytical solution of a three-layer network with a matrix exponential activation function, i.e., the network $$ f(X)=W_3\exp(W_2\exp(W_1X)),\quad X\in \mathbb{C}^{d\times d}, $$ admits analytical solutions of the equations $$ Y_1=f(X_1),\quad Y_2=f(X_2) $$ for $X_1,X_2,Y_1,Y_2$ under invertibility assumptions only. Our proof shows the power of depth and of a non-linear activation function, since a one-layer network can solve only one equation, i.e., $Y=WX$.
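The one-layer limitation can be checked numerically. The sketch below is illustrative only: the $2\times 2$ real matrices are made-up examples, and the helper names `matmul`, `inv2`, and `max_abs_diff` are hypothetical. It fits $W$ to the first pair via $W=Y_1X_1^{-1}$, which is the unique solution of $Y_1=WX_1$ when $X_1$ is invertible, and then evaluates the same $W$ on the second pair.

```python
# Sketch: a one-layer linear map Y = W X fitted to one pair (X1, Y1)
# generally fails on a second pair (X2, Y2). All matrices are made-up examples.

def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(A):
    """Invert a 2x2 matrix; raises if it is singular."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(A, B):
    """Largest entrywise absolute difference between two matrices."""
    n = len(A)
    return max(abs(A[i][j] - B[i][j]) for i in range(n) for j in range(n))

X1 = [[1.0, 2.0], [3.0, 4.0]]
Y1 = [[0.0, 1.0], [1.0, 0.0]]
X2 = [[2.0, 0.0], [1.0, 1.0]]
Y2 = [[1.0, 1.0], [0.0, 2.0]]

# The only W satisfying W X1 = Y1 when X1 is invertible.
W = matmul(Y1, inv2(X1))

err1 = max_abs_diff(matmul(W, X1), Y1)  # residual on the fitted pair
err2 = max_abs_diff(matmul(W, X2), Y2)  # residual on the second pair
print(err1, err2)
```

Since $W$ is already determined by the first equation, the second equation holds only in the special case $Y_1X_1^{-1}X_2=Y_2$, which these example matrices do not satisfy.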

Paper Structure (5 sections, 3 theorems, 37 equations, 1 figure)

This paper contains 5 sections, 3 theorems, 37 equations, 1 figure.

Key Result

Proposition 1

Let $\bm{X},\bm{Y}\in \mathbb{C}^{d\times d}$. If $\bm{X}\bm{Y}=\bm{Y}\bm{X}$, then $\exp(\bm{X})\exp(\bm{Y})=\exp(\bm{X}+\bm{Y})$.
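Proposition 1 can be sanity-checked numerically. The following is a minimal sketch, assuming small real $2\times 2$ example matrices (the proposition itself covers complex $d\times d$ matrices) and a truncated Taylor series for the matrix exponential. The helper names (`matmul`, `matadd`, `expm_taylor`, `gap`) are hypothetical, and the truncated series is only adequate for matrices of small norm such as these.

```python
# Sketch: check exp(X) exp(Y) = exp(X + Y) for a commuting pair and
# show that the identity fails for a non-commuting pair.

def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    """Add two square matrices entrywise."""
    n = len(A)
    return [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]

def expm_taylor(A, terms=40):
    """Truncated Taylor series: sum of A^k / k! for k = 0 .. terms-1."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term A^k / k!, starts at I
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, A)]
        result = matadd(result, term)
    return result

def gap(X, Y):
    """Largest entrywise difference between exp(X)exp(Y) and exp(X+Y)."""
    lhs = matmul(expm_taylor(X), expm_taylor(Y))
    rhs = expm_taylor(matadd(X, Y))
    n = len(X)
    return max(abs(lhs[i][j] - rhs[i][j]) for i in range(n) for j in range(n))

# Commuting pair: B = 0.5*I + 2*A is a polynomial in A, so AB = BA.
A = [[0.1, 0.2], [0.3, 0.4]]
B = [[0.7, 0.4], [0.6, 1.3]]

# Non-commuting pair: CD != DC.
C = [[0.0, 1.0], [0.0, 0.0]]
D = [[0.0, 0.0], [1.0, 0.0]]

gap_commuting = gap(A, B)
gap_noncommuting = gap(C, D)
print(gap_commuting, gap_noncommuting)
```

For the commuting pair the gap is at the level of floating-point round-off, while for the non-commuting pair it is of order one, which is why the commutativity hypothesis cannot simply be dropped.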

Figures (1)

  • Figure 1: The $s$ score of a two-layer network with Sigmoid (left) and ReLU (right) activation functions during training.

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Theorem 1
  • Proof 1