Generalized Activation via Multivariate Projection

Jiayun Li; Yuxiao Cheng; Yiwen Lu; Zhuofan Xia; Yilin Mo; Gao Huang

TL;DR

This work addresses a limitation of conventional single-input single-output (SISO) activations by introducing the Multivariate Projection Unit (MPU), a multi-input multi-output (MIMO) activation that nonlinearly projects its input onto a convex cone. Grounded in a correspondence between Projected Gradient Descent (PGD) and a shallow Feedforward Neural Network (FNN), the approach generalizes ReLU by replacing the projection onto the nonnegative half-line with projections onto cones such as the second-order cone, which also connects the activation to SOCP/SDP. The authors prove that cone-based activations are strictly more expressive than ReLU and support this empirically on multidimensional function fitting, CNNs on CIFAR10 and ImageNet-1k, and reinforcement learning in the Ant environment. They also relate activations to proximal operators and Moreau envelopes to motivate Leaky variants. The work points to a broader research direction of proximal-operator-based multivariate nonlinearities and reports promising improvements with MPU across these tasks at a computational cost comparable to standard activations.

Abstract

Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we consider ReLU as a projection from $\mathbb{R}$ onto the nonnegative half-line $\mathbb{R}_+$. Building on this interpretation, we extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection, thereby naturally extending it to a Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further provide mathematical proof establishing that FNNs activated by SOC projections outperform those utilizing ReLU in terms of expressive power. Experimental evaluations on widely-adopted architectures further corroborate MPU's effectiveness against a broader range of existing activation functions.
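To make the "ReLU as projection" reading concrete, the following is a minimal NumPy sketch of a cone projection used as a multi-input, multi-output activation. It assumes the standard second-order cone $\{(x, t) : \|x\|_2 \le t\}$ with its axis along the last coordinate; the paper's cone $C_\alpha^{(m)}$ is parameterized by a half-apex angle and may be oriented differently, so the function name `soc_projection` and this parameterization are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def relu(x):
    """ReLU: Euclidean projection of each scalar onto the half-line [0, inf)."""
    return np.maximum(x, 0.0)

def soc_projection(v):
    """Euclidean projection of v = (x, t) onto {(x, t): ||x||_2 <= t}.

    Assumption: the cone axis is the last coordinate (standard SOC),
    which is not necessarily the parameterization used in the paper.
    """
    v = np.asarray(v, dtype=float)
    x, t = v[:-1], v[-1]
    norm_x = np.linalg.norm(x)
    if norm_x <= t:        # already inside the cone: unchanged
        return v.copy()
    if norm_x <= -t:       # inside the polar cone: maps to the apex
        return np.zeros_like(v)
    # Otherwise project onto the cone boundary (here norm_x > |t| >= 0).
    scale = 0.5 * (norm_x + t)
    return np.concatenate([scale * x / norm_x, [scale]])

# All three coordinates change together, unlike an elementwise ReLU.
projected = soc_projection([3.0, 4.0, 0.0])
```

With a one-dimensional input the `x` block is empty, so the cone collapses to the nonnegative half-line and `soc_projection` agrees with `relu`, which mirrors how the abstract frames the MPU as a generalization of ReLU.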

Paper Structure (24 sections, 10 theorems, 37 equations, 6 figures, 9 tables)

Introduction
Multivariate Activation Function
Motivation: Shallow FNN and Projected Gradient Descent
Method
Expressive Capability of FNN with Cone Activation
Extension: Design Activation Functions with Proximal Operators
Experiment
Multidimensional Function Fitting via FNN
Convolutional Neural Networks Experiments
Reinforcement Learning Experiments
Conclusion
Proof of Theorem \ref{thm:fnn_relu_pgd}
Projection to $n$-dimensional Cone
Proof of Theorem \ref{thm:leaky_proximal}
Proof of Theorem \ref{thm:representation}
...and 9 more sections

Key Result

Proposition 1

If a proper step size $\gamma>0$ is chosen such that $\|I-\gamma P\|_2<1$, then the problem in Eq. \ref{eq:projected_problem} can be solved by alternating a gradient step and a projection step until convergence, where $\Pi_{\mathbb{S}}$ is the projection operator from $\mathbb{R}^n$ onto the set $\mathbb{S}$.
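As a rough illustration of the proposition, the sketch below runs projected gradient descent on a small quadratic program. The objective $\tfrac{1}{2}x^\top P x + q^\top x$ and the constraint set $\mathbb{S}=\mathbb{R}^n_+$ are assumptions inferred from the step-size condition $\|I-\gamma P\|_2<1$, since the problem statement itself is not reproduced here; the function name `pgd_nonneg_qp` and the numeric values are hypothetical. With this choice of $\mathbb{S}$, each iteration has the form of a ReLU layer with weight $I-\gamma P$ and bias $-\gamma q$.

```python
import numpy as np

def pgd_nonneg_qp(P, q, gamma, num_iters=500):
    """PGD for min 0.5 * x^T P x + q^T x subject to x >= 0 (assumed problem form)."""
    n = P.shape[0]
    W = np.eye(n) - gamma * P   # "weight" of the equivalent layer
    b = -gamma * q              # "bias" of the equivalent layer
    x = np.zeros(n)
    for _ in range(num_iters):
        y = W @ x + b           # gradient step: x - gamma * (P x + q)
        x = np.maximum(y, 0.0)  # projection onto the nonnegative orthant (ReLU)
    return x

# Hypothetical example data: P is symmetric positive definite.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])
q = np.array([-1.0, 1.0])
gamma = 1.0 / np.linalg.norm(P, 2)  # 1 / largest eigenvalue of P
contraction = np.linalg.norm(np.eye(2) - gamma * P, 2)
x_star = pgd_nonneg_qp(P, q, gamma)
gradient = P @ x_star + q
```

Replacing `np.maximum(y, 0.0)` with a projection onto another closed convex set gives the corresponding PGD variant, which is the substitution the MPU makes at the activation level.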

Figures (6)

Figure 1: The method proposed in this paper. (a) The structural similarity between a single iteration of the PGD algorithm and a single layer of the shallow FNN; (b) ReLU can be considered as the projection from $\mathbb{R}$ onto the nonnegative half line $\mathbb{R}_+$; (c) Visualization of the projection function from $\mathbb{R}^2$ onto the cone $C_\alpha^{(2)}$ in $\mathbb{R}^2$; (d) The architecture of the shallow FNN with the multivariate activation function; (e) The architecture of the shallow FNN with the ReLU activation function.
Figure 2: The cone $C_\alpha^{(3)}$ in $\mathbb{R}^3$ and examples of its intersection with 2-dimensional planes.
Figure 3: The Mean Squared Error (MSE) of approximating the projection $\Pi_{C_{\pi/3}^{(2)}}(\boldsymbol{x})$ and the 2-dimensional Leaky ReLU function with FNNs activated by univariate functions and by the MPU, shown on a log scale w.r.t. different hidden states. Left: The approximation error when the target is the projection $\Pi_{C_{\pi/3}^{(2)}}(\boldsymbol{x})$. Right: The approximation error when the target is the 2-dimensional Leaky ReLU function.
Figure 4: Reward curve of the Ant environment with PPO.
...and 1 more figure

Theorems & Definitions (23)

Proposition 1: Projected Gradient Descent \cite{proximalbook}
Theorem 1
Remark 1
Corollary 1
Example 1
Definition 1: MPU
Definition 2: $m$-dimensional second-order cone with half-apex angle $\alpha$
Theorem 2: Expressive capability for projection to cones and ReLU
Remark 2
Remark 3
...and 13 more
