1-Lipschitz Neural Networks are more expressive with N-Activations

Bernd Prach; Christoph H. Lampert

TL;DR

The paper addresses robustness in deep networks by focusing on $1$-Lipschitz constraints and argues that the choice of activation function shapes expressiveness. It establishes that common activations are not universal even in 1D and introduces the $\mathcal{N}$-activation, which is proven universal for $1$-CPWL ($1$-Lipschitz continuous piecewise linear) functions in 1D. The authors provide theoretical results showing the insufficiency of 2-piece activations and demonstrate, on a toy task and on standard benchmarks, that the $\mathcal{N}$-activation achieves universal representation and certified robustness competitive with MaxMin. Empirically, $\mathcal{N}$-activation networks fit target functions better and maintain comparable certified robust accuracy on CIFAR-10/100 and Tiny ImageNet, given careful initialization and training strategies. A public code release accompanies the work to enable adoption of the proposed activation in practical robust learning tasks.

Abstract

A crucial property for achieving secure, trustworthy and interpretable deep learning systems is their robustness: small changes to a system's inputs should not result in large changes to its outputs. Mathematically, this means one strives for networks with a small Lipschitz constant. Several recent works have focused on how to construct such Lipschitz networks, typically by imposing constraints on the weight matrices. In this work, we study an orthogonal aspect, namely the role of the activation function. We show that commonly used activation functions, such as MaxMin, as well as all piece-wise linear ones with two segments, unnecessarily restrict the class of representable functions, even in the simplest one-dimensional setting. We furthermore introduce the new N-activation function that is provably more expressive than currently popular activation functions. We provide code at https://github.com/berndprach/NActivation.
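
For reference, the standard definition behind "a small Lipschitz constant" can be written out as below. The symbols $f$, $g$, $h$, $L$ and the choice of norm are generic notation assumed here for illustration; they are not taken from this page.

```latex
% f is L-Lipschitz (with respect to a fixed norm) if
\[
  \| f(x) - f(y) \| \;\le\; L \, \| x - y \| \qquad \text{for all } x, y .
\]
% Lipschitz constants multiply under composition:
\[
  \operatorname{Lip}(g \circ h) \;\le\; \operatorname{Lip}(g) \cdot \operatorname{Lip}(h) .
\]
```

With $L = 1$, the second inequality shows that a network composed only of $1$-Lipschitz layers and $1$-Lipschitz activations is itself $1$-Lipschitz, which is why the activation function matters alongside the weight constraints.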

Paper Structure (25 sections, 11 theorems, 27 equations, 6 figures, 5 tables)

Introduction
Background and Definitions
Related work
$1$-Lipschitz linear layers
Shortcomings of the $1$-Lipschitz setup
Theoretical Results
Limitations of the existing activation functions
The $\mathcal{N}$-activation is universal in 1D
Experimental setup
Fitting the $\mathcal{N}$-function
Certified Robust Classification
Architecture
Loss function
Optimization
Hyperparameter search
...and 10 more sections

Key Result

Theorem 1

No 2-piece 1-CPWL activation is universal.
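
To make the contrast with 2-piece activations concrete, the following is a minimal sketch of a three-piece, $1$-Lipschitz "N"-shaped function with two thresholds. This page does not state the formula of the $\mathcal{N}$-activation, so the parameterization below (slopes $+1$, $-1$, $+1$ with kinks at the smaller and larger of two thresholds) is an assumption, and the names `n_activation`, `theta1`, `theta2` and `max_abs_slope` are illustrative rather than taken from the authors' code.

```python
def n_activation(x, theta1, theta2):
    """Assumed three-piece 'N'-shaped function with slopes +1, -1, +1.

    The kinks sit at min(theta1, theta2) and max(theta1, theta2); the
    offsets are chosen so the function is continuous at both kinks.
    """
    lo, hi = min(theta1, theta2), max(theta1, theta2)
    if x <= lo:
        return x - 2 * lo   # left piece, slope +1
    if x >= hi:
        return x - 2 * hi   # right piece, slope +1
    return -x               # middle piece, slope -1


def max_abs_slope(f, xs):
    """Largest absolute finite-difference slope of f over consecutive grid points."""
    return max(abs(f(b) - f(a)) / (b - a) for a, b in zip(xs, xs[1:]))


# Numerically check the 1-Lipschitz property on a grid over [-3, 3].
xs = [-3.0 + 0.01 * i for i in range(601)]
slope = max_abs_slope(lambda x: n_activation(x, -1.0, 1.0), xs)
print(f"max |slope| on grid: {slope:.6f}")
```

Every piece has slope $\pm 1$, so the function never stretches distances, and the two kinks give it one more linear segment than any 2-piece activation such as ReLU or the absolute value.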

Figures (6)

Figure 1: A plot of the $\mathcal{N}$-function.
Figure 2: A plot of the $\mathcal{N}$-activation with parameters $\theta_1$ and $\theta_2$.
Figure 3: Mean squared error on the training set reported for $1$-Lipschitz AOL networks with different activation functions for fitting the $\mathcal{N}$-function.
Figure 4: ReLU networks, MaxMin networks and absolute value networks cannot fit the $\mathcal{N}$-function, whereas $\mathcal{N}$-activation networks can!
Figure 5: Certified robust accuracy on different datasets, for different $1$-Lipschitz layers. MaxMin and $\mathcal{N}$-activation compared.
...and 1 more figure

Theorems & Definitions (18)

Theorem 1
Corollary 1
Theorem 2
Lemma 1
proof
Theorem 2
proof
Corollary 1
proof
Theorem 2
...and 8 more
