VeLU: Variance-enhanced Learning Unit for Deep Neural Networks
Ashkan Shakarami, Yousef Yeganeh, Azade Farshad, Lorenzo Nicolè, Stefano Ghidoni, Nassir Navab
TL;DR
VeLU introduces a variance-aware activation that couples ArcTan-ArcSin nonlinearities with variance-adaptive scaling and Wasserstein-2 regularization, modulating activations based on local statistics without adding trainable layers. This design mitigates internal covariate shift at the activation level, improving gradient flow, training stability, and generalization across CNNs and ViTs. Extensive experiments across six architectures and 12 vision benchmarks show consistent improvements over ReLU, Swish, and GELU, with robust performance across resolutions and optimizers and only a single learnable parameter. The work also provides practical guidelines for parameter ranges, outlines limitations and future directions, and makes a public implementation available on GitHub, positioning VeLU as a lightweight, architecture-agnostic activation alternative.
Abstract
Activation functions play a critical role in deep neural networks by shaping gradient flow, optimization stability, and generalization. While ReLU remains widely used due to its simplicity, it suffers from gradient sparsity and dead-neuron issues and offers no adaptivity to input statistics. Smooth alternatives such as Swish and GELU improve gradient propagation but still apply a fixed transformation regardless of the activation distribution. In this paper, we propose VeLU, a Variance-enhanced Learning Unit that introduces variance-aware and distributionally aligned nonlinearity through a principled combination of ArcTan-ArcSin transformations, adaptive scaling, and Wasserstein-2 regularization (Optimal Transport). This design enables VeLU to modulate its response based on local activation variance, mitigate internal covariate shift at the activation level, and improve training stability without adding learnable parameters or architectural overhead. Extensive experiments across six deep neural networks show that VeLU outperforms ReLU, ReLU6, Swish, and GELU on 12 vision benchmarks. The implementation of VeLU is publicly available on GitHub.
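
The abstract does not state the VeLU formula, so the sketch below does not attempt to reproduce it. It only illustrates two pieces of background the abstract relies on: (i) the gradient of ReLU is exactly zero for negative inputs, whereas Swish and GELU keep a nonzero gradient there, and (ii) the Wasserstein-2 distance has a closed form between one-dimensional Gaussians. The function names are hypothetical, Swish is taken with a fixed slope of 1 (the SiLU form), GELU is taken in its exact erf form, and the Gaussian assumption on activations is ours for illustration, not a claim about how VeLU computes its regularizer.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0.0 else 0.0


def swish_grad(x: float) -> float:
    """Derivative of Swish with slope 1, i.e. d/dx [x * sigmoid(x)]."""
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


def gelu_grad(x: float) -> float:
    """Derivative of exact GELU, i.e. d/dx [x * Phi(x)] = Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf


def w2_gaussian_1d(mu_a: float, sigma_a: float,
                   mu_b: float, sigma_b: float) -> float:
    """Wasserstein-2 distance between N(mu_a, sigma_a^2) and N(mu_b, sigma_b^2).

    Closed form in one dimension: sqrt((mu_a - mu_b)^2 + (sigma_a - sigma_b)^2).
    """
    return math.sqrt((mu_a - mu_b) ** 2 + (sigma_a - sigma_b) ** 2)


if __name__ == "__main__":
    # Compare gradients on both sides of zero.
    for x in (-2.0, -0.5, 0.5, 2.0):
        print(f"x={x:+.1f}  relu'={relu_grad(x):.4f}  "
              f"swish'={swish_grad(x):.4f}  gelu'={gelu_grad(x):.4f}")
    # Distance between a standard normal and a shifted, wider Gaussian.
    print("W2 =", w2_gaussian_1d(0.0, 1.0, 3.0, 5.0))
```

A unit whose pre-activations stay negative receives no gradient through ReLU, which is the dead-neuron issue mentioned above; the smooth alternatives avoid the exact zero but, as the abstract notes, still apply the same fixed curve regardless of the activation distribution.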
