Analytical Solution of a Three-layer Network with a Matrix Exponential Activation Function

Kuo Gai; Shihua Zhang

TL;DR

The paper addresses the theoretical question of why deeper neural networks can outperform shallow ones by deriving an analytical solution for a three-layer network with a matrix-exponential activation $f(X)=W_3\exp(W_2\exp(W_1X))$. It provides a constructive existence result under invertibility conditions (e.g., $X_1,X_2,Y_1,Y_2$ invertible and $X_1-X_2$ invertible), showing how to choose $W_1,W_2,W_3$ (involving a scalar $\alpha>0$, a matrix logarithm, and a Lie-theoretic construction) so that $f(X_1)=Y_1$ and $f(X_2)=Y_2$ without gradient descent. The approach leverages the matrix exponential's properties, BCH-type relations, and a block-matrix trick to decouple the system, illustrating depth-induced solvability beyond a single layer. This work provides a foundation for future multi-layer analytical analyses and connects depth efficiency to Lie-theoretic structures in matrix activations.
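As a rough illustration of the forward map $f(X)=W_3\exp(W_2\exp(W_1X))$, the following pure-Python sketch evaluates it on small matrices. The helper names (`matmul`, `identity`, `expm`, `f`) are hypothetical, and the matrix exponential is approximated by a truncated Taylor series with scaling and squaring. This is an assumption made to keep the example self-contained, not the construction used in the paper.

```python
# Illustrative sketch only: helper names and the Taylor-based expm are assumptions.

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    """n x n identity matrix with complex entries."""
    return [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]

def expm(A, terms=30, squarings=8):
    """Approximate exp(A): Taylor series of A / 2**squarings, then repeated squaring."""
    n = len(A)
    scale = 2 ** squarings
    S = [[a / scale for a in row] for row in A]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        term = [[t / k for t in row] for row in matmul(term, S)]  # S**k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def f(X, W1, W2, W3):
    """Three-layer network with matrix exponential activation."""
    return matmul(W3, expm(matmul(W2, expm(matmul(W1, X)))))

if __name__ == "__main__":
    X = [[2, 1], [1, 3]]
    W1 = [[0.1, 0], [0, 0.1]]
    W2 = [[0, 0.2], [0.2, 0]]
    W3 = [[1, 0], [0, 1]]
    print(f(X, W1, W2, W3))
```

Note that the activation here acts on the whole matrix rather than entrywise, which is what makes Lie-theoretic tools applicable.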

Abstract

In practice, deeper networks tend to be more powerful than shallow ones, but this has not been understood theoretically. In this paper, we find the analytical solution of a three-layer network with a matrix exponential activation function, i.e., $$ f(X)=W_3\exp(W_2\exp(W_1X)), \quad X\in \mathbb{C}^{d\times d}, $$ which admits analytical solutions to the equations $$ Y_1=f(X_1), \quad Y_2=f(X_2) $$ for $X_1,X_2,Y_1,Y_2$ under invertibility assumptions only. Our proof shows the power of depth and of a non-linear activation function, since a one-layer network can only solve one equation, i.e., $Y=WX$.
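The claim that a one-layer network can only solve one equation can be made concrete with a short derivation. It assumes $X_1$ and $Y_1$ are invertible, as in the abstract, and writes $W$ for the single weight matrix:

```latex
\begin{aligned}
Y_1 = W X_1 \;&\Longrightarrow\; W = Y_1 X_1^{-1},\\
Y_2 = W X_2 \;&\Longrightarrow\; Y_2 = Y_1 X_1^{-1} X_2
\;\Longleftrightarrow\; X_1^{-1} X_2 = Y_1^{-1} Y_2 .
\end{aligned}
```

The first equation already determines $W$, so the second holds only when the data happen to satisfy the compatibility condition $X_1^{-1}X_2 = Y_1^{-1}Y_2$. For generic invertible data this condition fails.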

Paper Structure (5 sections, 3 theorems, 37 equations, 1 figure)

Introduction
Preliminary
Main result
Experimental Results
Conclusion

Key Result

Proposition 1

Let $\bm{X},\bm{Y}\in \mathbb{C}^{d\times d}$. If $\bm{X}\bm{Y}=\bm{Y}\bm{X}$, then $\operatorname{exp}(\bm{X})\operatorname{exp}(\bm{Y})=\operatorname{exp}(\bm{X}+\bm{Y})$.
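Proposition 1 can be checked numerically on small examples. The sketch below is a hypothetical, stdlib-only illustration: `expm` is a truncated Taylor approximation with scaling and squaring (an assumption, not the paper's code), and the test matrices are chosen by hand. It compares $\exp(X)\exp(Y)$ with $\exp(X+Y)$ for a commuting pair and for a non-commuting pair.

```python
# Illustrative check of Proposition 1; all helper names are assumptions.

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(A, B):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def expm(A, terms=30, squarings=8):
    """Approximate exp(A): Taylor series of A / 2**squarings, then repeated squaring."""
    n = len(A)
    scale = 2 ** squarings
    S = [[a / scale for a in row] for row in A]
    result = [[1 + 0j if i == j else 0j for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[t / k for t in row] for row in matmul(term, S)]  # S**k / k!
        result = matadd(result, term)
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def exp_sum_gap(X, Y):
    """Largest entrywise |exp(X)exp(Y) - exp(X+Y)|."""
    lhs = matmul(expm(X), expm(Y))
    rhs = expm(matadd(X, Y))
    return max(abs(l - r) for rl, rr in zip(lhs, rhs) for l, r in zip(rl, rr))

# Commuting pair: both are polynomials in the nilpotent matrix [[0, 1], [0, 0]].
X = [[1j, 2], [0, 1j]]
Y = [[0.5, -1], [0, 0.5]]
commuting_gap = exp_sum_gap(X, Y)

# Non-commuting pair: AB != BA, so the identity is not expected to hold.
A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
noncommuting_gap = exp_sum_gap(A, B)

if __name__ == "__main__":
    print("commuting gap:", commuting_gap)
    print("non-commuting gap:", noncommuting_gap)
```

The commuting gap should be at the level of floating-point round-off, while the non-commuting gap is of order one, which shows why the commutativity hypothesis cannot be dropped.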

Figures (1)

Figure 1: The $s$ score of a two-layer network with Sigmoid (left) and ReLU (right) activation functions during training.

Theorems & Definitions (4)

Proposition 1
Proposition 2
Theorem 1
Proof 1
