Generalized Activation via Multivariate Projection
Jiayun Li, Yuxiao Cheng, Yiwen Lu, Zhuofan Xia, Yilin Mo, Gao Huang
TL;DR
This work addresses the limitation of traditional single-input single-output (SISO) activations by introducing the Multivariate Projection Unit (MPU), a multi-input multi-output (MIMO) activation that applies a nonlinear projection onto a convex cone. Grounded in a PGD–FNN correspondence, the approach generalizes ReLU by replacing its projection onto the nonnegative half-line with cone projections such as the second-order cone, which increases expressive power and connects the network to SOCP/SDP. The authors prove that cone-based activations are strictly more expressive than ReLU and support this empirically on multidimensional function fitting, CNNs on CIFAR10 and ImageNet-1k, and reinforcement learning in the Ant environment; they also connect activations to proximal operators and Moreau envelopes to motivate Leaky variants. MPU shows promising improvements across these tasks while keeping computational costs comparable to standard activations, and the work suggests a broader research direction toward proximal-operator-based, multivariate nonlinearities for future neural network design and optimization.
Abstract
Activation functions are essential to introduce nonlinearity into neural networks, with the Rectified Linear Unit (ReLU) often favored for its simplicity and effectiveness. Motivated by the structural similarity between a shallow Feedforward Neural Network (FNN) and a single iteration of the Projected Gradient Descent (PGD) algorithm, a standard approach for solving constrained optimization problems, we consider ReLU as a projection from $\mathbb{R}$ onto the nonnegative half-line $\mathbb{R}_+$. Building on this interpretation, we extend ReLU by substituting it with a generalized projection operator onto a convex cone, such as the Second-Order Cone (SOC) projection, thereby naturally extending it to a Multivariate Projection Unit (MPU), an activation function with multiple inputs and multiple outputs. We further provide a mathematical proof that FNNs activated by SOC projections have greater expressive power than those using ReLU. Experimental evaluations on widely adopted architectures further corroborate MPU's effectiveness against a broader range of existing activation functions.
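The projection view described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard second-order cone {(x, t) : ||x||_2 <= t} with the last coordinate playing the role of t, and the function names `relu` and `soc_projection` are illustrative. How the paper groups neurons into cone inputs is not stated in this section.

```python
import numpy as np


def relu(x):
    """ReLU, i.e. the projection of each scalar onto the half-line [0, inf)."""
    return np.maximum(np.asarray(x, dtype=float), 0.0)


def soc_projection(z):
    """Euclidean projection of z = (x, t) onto the cone {(x, t): ||x||_2 <= t}.

    Assumption: the last entry of z is the scalar t, the rest form the vector x.
    """
    z = np.asarray(z, dtype=float)
    x, t = z[:-1], z[-1]
    norm_x = np.linalg.norm(x)

    if norm_x <= t:
        # z already lies in the cone, so it is its own projection.
        return z.copy()
    if norm_x <= -t:
        # z lies in the polar cone, so the nearest cone point is the origin.
        return np.zeros_like(z)

    # Otherwise project onto the cone's boundary. Here norm_x > |t| >= 0,
    # so the division below is safe.
    scale = (norm_x + t) / 2.0
    return np.concatenate([scale * x / norm_x, [scale]])


if __name__ == "__main__":
    print(soc_projection([3.0, 4.0, 0.0]))  # a point outside the cone
    print(soc_projection([-3.0]))           # one-dimensional case
```

With a one-dimensional input the vector part x is empty, so this projection reduces to ReLU, which matches the reading of ReLU as the simplest cone projection. With more inputs the outputs are coupled through the norm of x, which is what makes the activation multi-input multi-output.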
