Trainable Highly-expressive Activation Functions
Irit Chelly, Shahaf E. Finder, Shira Ifergane, Oren Freifeld
TL;DR
Fixed activation functions (AFs) constrain expressiveness and can bias learning, so the authors propose DiTAC, a trainable activation built on highly expressive CPAB (Continuous Piecewise-Affine-Based) diffeomorphisms. DiTAC is GELU-like: $\mathrm{DiTAC}(x)=\tilde{x}\cdot\Phi(x)$, where $\Phi$ is the standard Gaussian CDF (as in GELU) and $\tilde{x}=T^{\theta}(x)$ is the CPAB transformation applied on a user-defined interval $[a,b]$. Variants such as Leaky-DiTAC and GE-DiTAC extend this definition. The computational cost is mitigated by a quantization (lookup-table) approach, with gradients passed through a Straight-Through Estimator, and a regularization term $\mathcal{L}_{\mathrm{reg}}$ on the CPA velocity fields stabilizes training. Across toy tasks, real-world classification, semantic segmentation, and image generation, DiTAC consistently outperforms fixed AFs and existing trainable AFs with only a small parameter overhead, and the code is publicly available for reproduction.
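The forward pass described in the TL;DR can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the velocity field uses equal-width cells parameterized by its values at the knots, the velocity is pinned to zero at $a$ and $b$, $\tilde{x}=x$ outside $(a,b)$, and $T^{\theta}$ is obtained by fixed-step RK4 integration instead of the closed-form CPAB solver. All function names (`make_field`, `cpa_velocity`, `cpab_transform`, `ditac`, `build_lut`, `ditac_lut`) and the parameter `interior_values` are hypothetical. The lookup table uses nearest-grid-point quantization as one simple way to realize the idea.

```python
import math
from bisect import bisect_right


def std_normal_cdf(x):
    """Phi(x), the standard Gaussian CDF (the same factor GELU uses)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def make_field(a, b, interior_values):
    """Hypothetical parameterization: a CPA velocity field on [a, b] with
    equal-width cells, given by its values at the knots. The velocity is
    pinned to zero at a and b (assumed zero-boundary constraint), so only
    the interior knot values are trainable."""
    n_cells = len(interior_values) + 1
    knots = [a + (b - a) * i / n_cells for i in range(n_cells + 1)]
    values = [0.0] + [float(v) for v in interior_values] + [0.0]
    return knots, values


def cpa_velocity(x, knots, values):
    """Continuous piecewise-affine velocity: linear interpolation between
    knot values inside (a, b), zero elsewhere."""
    if x <= knots[0] or x >= knots[-1]:
        return 0.0
    c = bisect_right(knots, x) - 1  # index of the cell containing x
    t = (x - knots[c]) / (knots[c + 1] - knots[c])
    return (1.0 - t) * values[c] + t * values[c + 1]


def cpab_transform(x, knots, values, steps=100):
    """Approximate T^theta(x) by integrating dx/dt = v(x) over t in [0, 1].
    Fixed-step RK4 stands in for the closed-form CPAB integration."""
    h = 1.0 / steps
    for _ in range(steps):
        k1 = cpa_velocity(x, knots, values)
        k2 = cpa_velocity(x + 0.5 * h * k1, knots, values)
        k3 = cpa_velocity(x + 0.5 * h * k2, knots, values)
        k4 = cpa_velocity(x + h * k3, knots, values)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


def ditac(x, a, b, interior_values, steps=100):
    """DiTAC(x) = x_tilde * Phi(x), with x_tilde = T^theta(x) inside (a, b)
    and x_tilde = x outside (assumed identity extension)."""
    knots, values = make_field(a, b, interior_values)
    x_tilde = cpab_transform(x, knots, values, steps) if a < x < b else x
    return x_tilde * std_normal_cdf(x)


def build_lut(a, b, interior_values, resolution=2001, steps=100):
    """Quantization sketch: precompute T^theta on a uniform grid over [a, b]."""
    knots, values = make_field(a, b, interior_values)
    grid = [a + (b - a) * i / (resolution - 1) for i in range(resolution)]
    return [cpab_transform(g, knots, values, steps) for g in grid]


def ditac_lut(x, a, b, table):
    """DiTAC forward that reads T^theta from the nearest grid point."""
    if a < x < b:
        i = round((x - a) / (b - a) * (len(table) - 1))  # nearest grid index
        x_tilde = table[i]
    else:
        x_tilde = x
    return x_tilde * std_normal_cdf(x)
```

With every entry of `interior_values` set to zero the velocity vanishes, $T^{\theta}$ is the identity, and `ditac` reduces to GELU. The table lookup is piecewise constant in $x$, so training through it needs a separate gradient rule such as the Straight-Through Estimator mentioned in the TL;DR; this sketch does not implement one.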
Abstract
Nonlinear activation functions are pivotal to the success of deep neural nets, and choosing the appropriate activation function can significantly affect their performance. Most networks use fixed activation functions (e.g., ReLU or GELU), and this choice might limit their expressiveness. Furthermore, different layers may benefit from different activation functions. Consequently, there has been a growing interest in trainable activation functions. In this paper, we introduce DiTAC, a trainable highly-expressive activation function based on an efficient diffeomorphic transformation (called CPAB). Despite introducing only a negligible number of trainable parameters, DiTAC enhances model expressiveness and performance, often yielding substantial improvements. It also outperforms existing activation functions (regardless of whether the latter are fixed or trainable) in tasks such as semantic segmentation, image generation, regression problems, and image classification. Our code is available at https://github.com/BGU-CS-VIL/DiTAC.
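One way to see why the parameter overhead is small is to count the degrees of freedom of a 1D continuous piecewise-affine velocity field on $[a,b]$ split into $N_{\mathcal{P}}$ cells $\Omega_c$. Each cell contributes a slope and an intercept, and continuity at the $N_{\mathcal{P}}-1$ interior knots removes one degree of freedom each. The last line assumes a zero-velocity constraint at $a$ and $b$; whether DiTAC imposes exactly this constraint is an assumption here.

```latex
\begin{aligned}
v^{\theta}(x) &= \alpha_c x + \beta_c, \qquad x \in \Omega_c,\; c = 1,\dots,N_{\mathcal{P}} \\
\dim(\theta) &= 2N_{\mathcal{P}} - (N_{\mathcal{P}} - 1) = N_{\mathcal{P}} + 1 \\
\dim(\theta)\big|_{v^{\theta}(a) = v^{\theta}(b) = 0} &= N_{\mathcal{P}} + 1 - 2 = N_{\mathcal{P}} - 1
\end{aligned}
```

For example, $N_{\mathcal{P}} = 16$ cells give 17 parameters per velocity field (15 with the zero-boundary constraint), a count that does not grow with layer width if one field is shared across a layer's units (also an assumption).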
