
ReLUs Are Sufficient for Learning Implicit Neural Representations

Joseph Shenouda, Yamin Zhou, Robert D. Nowak

TL;DR

The paper shows that ReLU networks can learn implicit neural representations when their neurons are constrained to form second-order B-spline wavelets (BW-ReLU), which mitigates spectral bias and ill-conditioning. It links this construction to the function-space perspective via the variation norm, deriving explicit variation norms for ReLU and BW-ReLU networks and showing how a scaling parameter c modulates the regularity and generalization of the learned function. BW-ReLU networks exhibit well-conditioned training dynamics and competitive performance on CT reconstruction, signal representation, and super-resolution, and the variation-norm framework gives a principled way to tune INR regularity. Overall, the work offers a ReLU-only pathway for INR tasks and opens avenues for hyperparameter tuning and for extensions to broader imaging and physics-informed problems.

Abstract

Motivated by the growing theoretical understanding of neural networks that employ the Rectified Linear Unit (ReLU) as their activation function, we revisit the use of ReLU activation functions for learning implicit neural representations (INRs). Inspired by second-order B-spline wavelets, we impose a set of simple constraints on the ReLU neurons in each layer of a deep neural network (DNN) to remedy spectral bias, which in turn makes such networks usable for a variety of INR tasks. Empirically, we demonstrate that, contrary to popular belief, one can learn state-of-the-art INRs with a DNN composed solely of ReLU neurons. Next, by leveraging recent theoretical work that characterizes the kinds of functions ReLU neural networks learn, we provide a way to quantify the regularity of the learned function. This offers a principled approach to selecting hyperparameters in INR architectures. We substantiate our claims through experiments in signal representation, super-resolution, and computed tomography, demonstrating the versatility and effectiveness of our method. The code for all experiments can be found at https://github.com/joeshenouda/relu-inrs.
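To give a concrete sense of what "quantifying the regularity of the learned function" can mean, here is a minimal sketch in the simplest setting: a univariate shallow ReLU network f(x) = Σ_k v_k relu(w_k x - b_k). In this setting the variation norm reduces to the total variation of f', and a standard fact is that this equals Σ_k |v_k| |w_k| when the knots b_k / w_k are distinct. This is an illustrative assumption about the 1-D case only; the paper's norms for deep, multivariate, and BW-ReLU networks may contain additional terms. All function names and the toy weights are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def shallow_relu(x, v, w, b):
    """Evaluate f(x) = sum_k v_k * relu(w_k * x - b_k) at each point of a 1-D array x."""
    return relu(np.outer(x, w) - b) @ v


def variation_norm_formula(v, w):
    """Closed form (assumes distinct knots b_k / w_k): TV(f') = sum_k |v_k| |w_k|."""
    return float(np.sum(np.abs(v) * np.abs(w)))


def variation_norm_numeric(x, fx):
    """Total variation of the finite-difference slopes of samples fx on the grid x."""
    slopes = np.diff(fx) / np.diff(x)
    return float(np.sum(np.abs(np.diff(slopes))))


# Hypothetical toy network with knots b_k / w_k at -0.5, -0.1, 0.2 and 0.6.
v = np.array([1.0, -0.5, 2.0, 0.3])
w = np.array([1.0, -2.0, 0.5, 1.5])
b = np.array([-0.5, 0.2, 0.1, 0.9])
x = np.linspace(-2.0, 2.0, 4001)
fx = shallow_relu(x, v, w, b)
print(variation_norm_formula(v, w), variation_norm_numeric(x, fx))
```

Both printed values should agree (about 3.45): each knot contributes a jump of size |v_k||w_k| to the slope of f, so a network with larger weights, or more kinks, has a larger norm and is less regular.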

Paper Structure

This paper contains 21 sections, 4 theorems, 55 equations, and 9 figures.

Key Result

Theorem 3.1

Suppose $\{b_j\}_{j=1}^{K}$ are quasi-evenly spaced on $[-1,1]$, that is, $b_j = -1 + \frac{2(j-1)}{K} + o\left(\frac{1}{K}\right)$. Let $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_K \geq 0$ be the eigenvalues of the Gram matrix $\mathbf{G}_{\sigma}$. Then the condition number of $\mathbf{G}_{\sigma}$ satisfies
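The bound in Theorem 3.1 is not reproduced in this summary, but its qualitative message (the ReLU Gram matrix becomes increasingly ill-conditioned as K grows) can be checked numerically. A minimal sketch, assuming $\mathbf{G}_{\sigma}$ is the $L^2([-1,1])$ Gram matrix of the features $\sigma(x - b_j)$ with unit input weights and exactly evenly spaced biases (the $o(1/K)$ term set to zero); the paper's precise definition may differ. The function `relu_gram` is hypothetical.

```python
import numpy as np


def relu_gram(K):
    """Gram matrix G[i, j] = int_{-1}^{1} relu(x - b_i) relu(x - b_j) dx
    for evenly spaced biases b_j = -1 + 2 (j - 1) / K, j = 1..K."""
    b = -1.0 + 2.0 * np.arange(K) / K
    bi, bj = np.meshgrid(b, b, indexing="ij")
    m = np.maximum(bi, bj)  # both ReLUs are active only for x > max(b_i, b_j)

    def antiderivative(x):
        # Antiderivative of (x - b_i)(x - b_j) = x^2 - (b_i + b_j) x + b_i b_j
        return x**3 / 3.0 - (bi + bj) * x**2 / 2.0 + bi * bj * x

    return antiderivative(1.0) - antiderivative(m)


# The condition number grows quickly as more ReLU features are added.
for K in (8, 16, 32):
    print(K, np.linalg.cond(relu_gram(K)))
```

Because the bias grids for K = 8, 16, 32 are nested, eigenvalue interlacing guarantees the printed condition numbers cannot decrease; in practice they grow rapidly, which is the ill-conditioning that BW-ReLU is designed to avoid.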

Figures (9)

  • Figure 1: (Left) A plot of the second-order B-spline wavelet activation function. The red lines indicate the seven non-local ReLU functions that make up a single second-order B-spline wavelet, shown in black. (Right) A BW-ReLU neural network with two neurons, represented as a constrained ReLU network with 14 neurons. Within each group of seven, the orientation of each ReLU relative to the others is fixed. The input and output weights are learned and shared by all neurons within a group; shared input/output weights are denoted by the same color.
  • Figure 2: Condition number of the feature embedding matrix produced by ReLU vs. BW-ReLU neural networks during training. The ReLU network produces a severely ill-conditioned feature matrix at initialization and throughout training, whereas the BW-ReLU network's feature matrix remains well-conditioned throughout training. In both cases, the rate of convergence correlates with how well-conditioned the feature matrix is.
  • Figure 3: The variation norm of BW-ReLU neural networks $g_{\boldsymbol{\theta}}$ with various scales $c$, trained on univariate data. The red dots indicate samples from the ground-truth function $f^{*}$. Making $c$ too low leads to a poor fit to the data and a very high variation norm, while making $c$ too large results in a very oscillatory fit. The interpolator that generalizes best is the one with the lowest variation norm.
  • Figure 4: Experiments on computed tomography reconstruction with various INR architectures. We report average PSNR and standard error across five random trials.
  • Figure 5: Experiments on signal representation for various INR architectures. We report average PSNR and standard error across five random trials.
  • ...and 4 more figures
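Figure 1 describes a single BW-ReLU activation as a fixed combination of seven ReLUs. A minimal sketch of such a construction, assuming the activation is the standard semi-orthogonal (Chui-Wang) linear B-spline wavelet with support [0, 3], which is a combination of five dilated hat functions with coefficients (1, -6, 10, -6, 1)/12. Taking second differences of those coefficients gives seven ReLU weights with knots at 0, 0.5, ..., 3. The paper's exact normalization, shift, and placement of the scale c may differ, and the names below are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


# Assumed form: Chui-Wang linear B-spline wavelet, support [0, 3].
# The seven ReLU weights are second differences of the hat coefficients.
HAT_COEFFS = np.array([1.0, -6.0, 10.0, -6.0, 1.0]) / 12.0
RELU_WEIGHTS = np.array([1.0, -8.0, 23.0, -32.0, 23.0, -8.0, 1.0]) / 12.0
KNOTS = 0.5 * np.arange(7)  # 0, 0.5, ..., 3.0


def bw_relu(x):
    """Wavelet as a fixed combination of seven shifted ReLUs relu(2 (x - knot))."""
    x = np.asarray(x, dtype=float)
    return relu(2.0 * (x[..., None] - KNOTS)) @ RELU_WEIGHTS


def hat(t):
    """Linear B-spline on [0, 2] with peak value 1 at t = 1 (itself three ReLUs)."""
    return relu(t) - 2.0 * relu(t - 1.0) + relu(t - 2.0)


def bw_hat_form(x):
    """The same wavelet written as five dilated, shifted hat functions."""
    x = np.asarray(x, dtype=float)
    return sum(c * hat(2.0 * x - k) for k, c in enumerate(HAT_COEFFS))


x = np.linspace(-1.0, 4.0, 1001)
print(np.max(np.abs(bw_relu(x) - bw_hat_form(x))))
```

The printed discrepancy should be at floating-point round-off level. Because the relative weights and knots are fixed, a BW-ReLU neuron only exposes the shared input and output weights as trainable parameters, matching the grouping shown on the right of Figure 1. Unlike a single ReLU, the resulting activation is compactly supported and has zero mean.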

Theorems & Definitions (6)

  • Theorem 3.1 (zhang2023shallow)
  • Proposition 3.2
  • Theorem 3.3
  • Proof
  • Theorem B.1: Gershgorin circle theorem (horn2012matrix)
  • Proof of thm:bspline_w_cond
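The listing pairs the Gershgorin circle theorem with the proof of thm:bspline_w_cond. This suggests, as an assumption since the proof is not reproduced here, that the theorem is used to bound the eigenvalues, and hence the condition number, of a banded BW-ReLU Gram matrix. The sketch below shows only the generic argument. For a symmetric matrix, every eigenvalue lies in some interval [a_ii - R_i, a_ii + R_i], where R_i is the off-diagonal absolute row sum. If all such intervals are positive, the condition number is at most max(a_ii + R_i) / min(a_ii - R_i). The test matrix is illustrative, not the paper's Gram matrix, and the function name is hypothetical.

```python
import numpy as np


def gershgorin_cond_bound(A):
    """Upper bound on the 2-norm condition number of a symmetric matrix whose
    Gershgorin intervals all lie in the positive half-line."""
    A = np.asarray(A, dtype=float)
    diag = np.diag(A)
    radii = np.sum(np.abs(A), axis=1) - np.abs(diag)  # off-diagonal absolute row sums
    lower = np.min(diag - radii)  # every eigenvalue is >= lower
    upper = np.max(diag + radii)  # every eigenvalue is <= upper
    if lower <= 0.0:
        raise ValueError("a Gershgorin disc contains 0, so no bound follows")
    return upper / lower


# Illustrative example: tridiag(1, 4, 1), which is (up to a factor h / 6) the Gram
# matrix of interior hat functions on a uniform grid with spacing h.
n = 50
A = 4.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
print(gershgorin_cond_bound(A), np.linalg.cond(A))
```

Here the bound is 3 regardless of n, which illustrates how a localized, diagonally dominant Gram matrix can have a condition number bounded independently of the number of features. This contrasts with the growth established for the ReLU Gram matrix in Theorem 3.1.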