1-Lipschitz Neural Networks are more expressive with N-Activations
Bernd Prach, Christoph H. Lampert
TL;DR
The paper addresses robustness in deep networks through $1$-Lipschitz constraints and argues that the choice of activation function shapes expressiveness. It establishes that common activations are not universal even in 1D and introduces the $\mathcal{N}$-activation, which is proven universal for $1$-Lipschitz continuous piecewise-linear ($1$-CPWL) functions in 1D. The authors prove that activations with only two linear pieces are insufficient and demonstrate, on toy and standard benchmarks, that the $\mathcal{N}$-activation achieves universal representation and certified robustness competitive with MaxMin. Empirically, $\mathcal{N}$-activation networks fit target functions better and maintain comparable robust accuracy on CIFAR-10/100 and Tiny ImageNet, given careful initialization and training strategies. A public code release accompanies the work to enable adoption of the proposed activation in practical robust learning tasks.
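To make the "N" shape concrete, here is a minimal Python sketch of a three-piece, $1$-Lipschitz CPWL activation with slopes $+1, -1, +1$. The parametrization by two breakpoints `theta1 <= theta2` and the names `n_activation` and `empirical_lipschitz` are illustrative assumptions, not taken from the authors' code release; the repository is the reference for the exact definition used in the paper.

```python
def n_activation(x: float, theta1: float, theta2: float) -> float:
    """Hypothetical N-shaped activation with slopes +1, -1, +1.

    Assumes theta1 <= theta2; the three pieces meet at the two breakpoints,
    so the function is continuous and 1-Lipschitz.
    """
    if theta1 > theta2:
        raise ValueError("expected theta1 <= theta2")
    if x <= theta1:
        return x                          # first rising piece, slope +1
    if x <= theta2:
        return 2 * theta1 - x             # falling piece, slope -1
    return x - 2 * (theta2 - theta1)      # second rising piece, slope +1


def empirical_lipschitz(f, xs: list[float]) -> float:
    """Largest |f(b) - f(a)| / |b - a| over consecutive sorted samples."""
    return max(abs(f(b) - f(a)) / abs(b - a) for a, b in zip(xs, xs[1:]))


# Usage with illustrative breakpoints theta1 = -1, theta2 = 1.
def f(x: float) -> float:
    return n_activation(x, -1.0, 1.0)


xs = [i / 10 for i in range(-30, 31)]
print([f(v) for v in (-2.0, -1.0, 0.0, 1.0, 2.0)])  # [-2.0, -1.0, -2.0, -3.0, -2.0]
print(empirical_lipschitz(f, xs))                    # approximately 1.0
```

Setting `theta1 == theta2` collapses the middle piece and recovers the identity, so under this assumed parametrization the activation can fall back to a linear map wherever no fold is needed.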
Abstract
A crucial property for achieving secure, trustworthy and interpretable deep learning systems is their robustness: small changes to a system's inputs should not result in large changes to its outputs. Mathematically, this means one strives for networks with a small Lipschitz constant. Several recent works have focused on how to construct such Lipschitz networks, typically by imposing constraints on the weight matrices. In this work, we study an orthogonal aspect, namely the role of the activation function. We show that commonly used activation functions, such as MaxMin, as well as all piecewise linear ones with two segments, unnecessarily restrict the class of representable functions, even in the simplest one-dimensional setting. We furthermore introduce the new N-activation function that is provably more expressive than currently popular activation functions. We provide code at https://github.com/berndprach/NActivation.
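For reference, a function $f$ is $L$-Lipschitz if $\|f(x) - f(y)\| \le L \|x - y\|$ for all inputs $x, y$. The sketch below implements MaxMin as it is commonly defined (GroupSort with groups of size 2: each consecutive pair of channels is sorted into (max, min)) and checks numerically that it never increases Euclidean distances. The helper names `max_min`, `distance` and `max_ratio` are hypothetical; this is not the authors' implementation.

```python
import math
import random


def max_min(x: list[float]) -> list[float]:
    """MaxMin: sort each consecutive pair of channels into (max, min)."""
    if len(x) % 2 != 0:
        raise ValueError("MaxMin needs an even number of channels")
    out: list[float] = []
    for a, b in zip(x[0::2], x[1::2]):
        out.extend([max(a, b), min(a, b)])
    return out


def distance(u: list[float], v: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def max_ratio(f, dim: int, trials: int, seed: int = 0) -> float:
    """Largest observed ||f(x) - f(y)|| / ||x - y|| over random Gaussian pairs.

    A sampled ratio only lower-bounds the true Lipschitz constant.
    """
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        y = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        worst = max(worst, distance(f(x), f(y)) / distance(x, y))
    return worst


print(max_min([3.0, -1.0, 0.5, 2.0]))          # [3.0, -1.0, 2.0, 0.5]
print(max_ratio(max_min, dim=8, trials=1000))  # at most 1.0, up to rounding
```

A random search like this only probes sampled pairs; MaxMin is provably 1-Lipschitz because sorting a pair never increases the Euclidean distance between two inputs. Per the abstract, its limitation lies in expressiveness, not in its Lipschitz bound.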
