Analytical Solution of a Three-layer Network with a Matrix Exponential Activation Function
Kuo Gai, Shihua Zhang
TL;DR
The paper addresses the theoretical question of why deeper neural networks can outperform shallow ones by deriving an analytical solution for a three-layer network with a matrix-exponential activation $f(X)=W_3\exp(W_2\exp(W_1X))$. It provides a constructive existence result under invertibility conditions (e.g., $X_1,X_2,Y_1,Y_2$ invertible and $X_1-X_2$ invertible), showing how to choose $W_1,W_2,W_3$ (involving a scalar $\alpha>0$, a matrix logarithm, and a Lie-theoretic construction) so that $f(X_1)=Y_1$ and $f(X_2)=Y_2$ without gradient descent. The approach leverages the matrix exponential's properties, BCH-type relations, and a block-matrix trick to decouple the system, illustrating depth-induced solvability beyond a single layer. This work provides a foundation for future multi-layer analytical analyses and connects depth efficiency to Lie-theoretic structures in matrix activations.
Abstract
In practice, deeper networks tend to be more powerful than shallow ones, but this has not been understood theoretically. In this paper, we find the analytical solution of a three-layer network with a matrix exponential activation function. Specifically, we show that the network $$ f(X)=W_3\exp(W_2\exp(W_1X)),\quad X\in \mathbb{C}^{d\times d}, $$ admits an analytical solution to the equations $$ Y_1=f(X_1),\quad Y_2=f(X_2) $$ for $X_1,X_2,Y_1,Y_2$ under invertibility assumptions only. Our proof shows the power of depth and of a non-linear activation function, since a one-layer network $Y=WX$ can only solve one such equation.
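To see why a one-layer network can fit only one equation, consider the following short derivation. It is an illustrative sketch rather than part of the paper's proof, and it assumes only that $X_1$ is invertible. Solving the first equation fixes $W$ completely, so the second equation becomes a constraint on the data instead of something $W$ can still be adjusted to satisfy: $$ Y_1=WX_1 \;\Longrightarrow\; W=Y_1X_1^{-1}, \qquad Y_2=WX_2 \;\Longleftrightarrow\; Y_2=Y_1X_1^{-1}X_2. $$ For a generic target $Y_2$ the last identity fails, so no single $W$ satisfies both equations. The three-layer network, by contrast, has the additional parameters $W_2$ and $W_3$ available to fit the second equation.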
