Analyzing the Neural Tangent Kernel of Periodically Activated Coordinate Networks

Hemanth Saratchandran; Shin-Fang Chng; Simon Lucey

TL;DR

This paper addresses why periodically activated networks, particularly cosine-activated coordinate networks, can outperform ReLU nets by analyzing their Neural Tangent Kernel (NTK). It derives two-sided bounds on the minimum eigenvalue of the empirical NTK $\lambda_{\min}(K_L)$ in a finite-width regime with a single wide hidden layer, showing a $\Theta(n_k^{3/2})$ scaling that yields a larger spectral gap than ReLU activations. The authors also prove a memorization capacity theorem under similar width growth conditions and provide empirical evidence that supports the theory, including comparisons with ReLU and measurements of the empirical Lipschitz constant. Collectively, the work advances understanding of how periodic activations influence training dynamics and memorization in coordinate networks and highlights potential practical benefits for implicit neural representations.

Abstract

Recently, neural networks utilizing periodic activation functions have been proven to demonstrate superior performance in vision tasks compared to traditional ReLU-activated networks. However, there is still a limited understanding of the underlying reasons for this improved performance. In this paper, we aim to address this gap by providing a theoretical understanding of periodically activated networks through an analysis of their Neural Tangent Kernel (NTK). We derive bounds on the minimum eigenvalue of their NTK in the finite width setting, using a fairly general network architecture which requires only one wide layer that grows at least linearly with the number of data samples. Our findings indicate that periodically activated networks are notably more well-behaved, from the NTK perspective, than ReLU-activated networks. Additionally, we give an application to the memorization capacity of such networks and verify our theoretical predictions empirically. Our study offers a deeper understanding of the properties of periodically activated neural networks and their potential in the field of deep learning.
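To make the central quantity concrete, the following is a minimal sketch of how the minimum eigenvalue of an empirical NTK can be computed at initialization. It is not the authors' experimental code. It assumes a single-hidden-layer network $f(x) = a^\top \phi(Wx)$ with scalar output (the paper treats depth-$L$ networks), standard Gaussian data, and Gaussian initialization scaled by fan-in; the function names `empirical_ntk` and `min_ntk_eigenvalue` and the parameter `width_factor` are hypothetical.

```python
import numpy as np

def empirical_ntk(X, W, a, phi, dphi):
    """Empirical NTK Gram matrix of f(x) = a . phi(W x) with respect to (W, a)."""
    pre = X @ W.T          # (N, n1) pre-activations
    F = phi(pre)           # row i is df/da evaluated at x_i
    D = dphi(pre) * a      # df/dW at x_i is the outer product of D[i] and x_i
    # K_ij = <df/da(x_i), df/da(x_j)> + <df/dW(x_i), df/dW(x_j)>
    return F @ F.T + (X @ X.T) * (D @ D.T)

def min_ntk_eigenvalue(activation, N=20, n0=10, width_factor=8, s=1.0, seed=0):
    """Smallest eigenvalue of the empirical NTK at a random initialization."""
    rng = np.random.default_rng(seed)
    n1 = width_factor * N                              # hidden width grows linearly in N
    X = rng.standard_normal((N, n0))                   # assumed data distribution
    W = rng.standard_normal((n1, n0)) / np.sqrt(n0)    # assumed initialization
    a = rng.standard_normal(n1) / np.sqrt(n1)          # assumed initialization
    if activation == "cos":
        phi = lambda z: np.cos(s * z)
        dphi = lambda z: -s * np.sin(s * z)
    elif activation == "relu":
        phi = lambda z: np.maximum(z, 0.0)
        dphi = lambda z: (z > 0).astype(float)
    else:
        raise ValueError(f"unknown activation: {activation}")
    K = empirical_ntk(X, W, a, phi, dphi)
    return float(np.linalg.eigvalsh(K)[0])             # eigvalsh returns ascending order
```

Calling `min_ntk_eigenvalue("cos", N=N)` and `min_ntk_eigenvalue("relu", N=N)` for increasing `N` gives a rough, small-scale analogue of the comparison described in the paper; the closed-form Gram matrix is used here only because the one-hidden-layer gradients are easy to write by hand.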

Paper Structure (18 sections, 13 theorems, 117 equations, 5 figures)

Introduction
Notation and Assumptions
Main Result
Proof of Theorem 3.1
Implication for the Memorization Capability of Network
Minimum Singular Value of the Feature Matrix
Experiments
NTK experiments
NTK analysis where $n_1 = 8N$.
NTK analysis where $n_1=15N$.
Empirical Lipschitz constant
Empirical Lipschitz constant of a cosine activated network.
Comparison of empirical Lipschitz constant of a cosine and a ReLU-activated network.
Related Work.
Discussion and Conclusion.
...and 3 more sections

Key Result

Theorem 3.1

Let $f_L$ denote a depth $L$ neural network with $\phi(x) = \cos(sx)$ as the activation, where $s > 0$ is a fixed frequency parameter, satisfying the network assumptions in the Notation and Assumptions section. Let $\{x_i\}_{i=1}^N$ denote a set of i.i.d. training data points sampled from the distribution $\mathcal{P}$, and $0$ otherwise. Then w.p. at least over $(W_l)_{l=1}^L$ and the data. Furthermore, we have w.

Figures (5)

Figure 1: The minimum eigenvalue of the empirical NTK, $\lambda_{\min}(K_3)$, where $n_0 = 400$, $n_1 = 8N$, and $n_2 = 400$. As predicted by Theorem 3.1, $\lambda_{\min}(K_3)$ for a cosine-activated network grows much faster than for a ReLU-activated network.
Figure 2: The minimum eigenvalue of the empirical NTK, $\lambda_{\min}(K_3)$, where $n_0 = 400$, $n_1 = 15N$, and $n_2 = 400$. As predicted by Theorem 3.1, $\lambda_{\min}(K_3)$ for a cosine-activated network grows much faster than for a ReLU-activated network.
Figure 3: The empirical Lipschitz constant of a cosine-activated network over $1000$ data points, where $n_1 = n_2 = 64$, $n_4 = 1$, and $n_3$ varies from 64 to 2048, for $n_0 = 200$ and $n_0 = 400$. This plot empirically confirms Assumption A4.
Figure 4: The empirical Lipschitz constant of cosine- and ReLU-activated networks over $1000$ data points, where $n_1 = n_2 = 64$, $n_4 = 1$, and $n_3$ varies from 64 to 2048, for $n_0 = 200$ and $n_0 = 400$. Zoom inset: the empirical Lipschitz constant of the ReLU-activated network.
Figure 5: An example of the function $\chi(x)$ (red) used in the proof of Lemma \ref{c6}. The blue curve represents $\sin(x)^2$.
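
The captions for Figures 3 and 4 refer to an empirical Lipschitz constant without defining it here. One common estimator, assumed for this sketch and not necessarily the one used in the paper, is the largest ratio $\|f(x_i) - f(x_j)\| / \|x_i - x_j\|$ over pairs of distinct samples. The helper names `pairwise_distances`, `empirical_lipschitz`, `mlp_forward`, and `random_weights`, as well as the fan-in-scaled Gaussian initialization, are hypothetical.

```python
import numpy as np

def pairwise_distances(A):
    """Euclidean distances between all rows of A, computed via the Gram matrix."""
    sq = np.sum(A * A, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    return np.sqrt(np.maximum(d2, 0.0))    # clip tiny negatives caused by round-off

def empirical_lipschitz(outputs, inputs):
    """Largest ratio |f(x_i) - f(x_j)| / |x_i - x_j| over distinct sample pairs.

    Assumes all input rows are distinct, so no denominator is zero.
    """
    i, j = np.triu_indices(len(inputs), k=1)
    ratios = pairwise_distances(outputs)[i, j] / pairwise_distances(inputs)[i, j]
    return float(np.max(ratios))

def mlp_forward(X, weights, phi):
    """Fully connected network: phi after every layer except the last."""
    H = X
    for W in weights[:-1]:
        H = phi(H @ W)
    return H @ weights[-1]

def random_weights(widths, rng):
    """Gaussian weights scaled by 1/sqrt(fan_in); an assumed initialization."""
    return [rng.standard_normal((m, n)) / np.sqrt(m)
            for m, n in zip(widths[:-1], widths[1:])]
```

With widths such as `[n0, 64, 64, n3, 1]`, the estimate `empirical_lipschitz(mlp_forward(X, weights, np.cos), X)` can be tracked as `n3` grows. The pairwise maximum is only a lower bound on the true Lipschitz constant of the network.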

Theorems & Definitions (25)

Theorem 3.1
Lemma 4.1
proof: Proof of Theorem 3.1
Theorem 5.1
proof
Theorem 6.1
Lemma 1.1
proof
Lemma 1.2
proof
...and 15 more
