Nonparametric Teaching of Implicit Neural Representations

Chen Zhang, Steven Tin Sui Luo, Jason Chun Lok Li, Yik-Chung Wu, Ngai Wong

TL;DR

This work reframes implicit neural representation learning as a nonparametric teaching problem, enabling efficient example selection for overparameterized MLPs. It shows that parameter-based gradient descent dynamics of the MLP are consistent with functional gradient descent in RKHS through a dynamic neural tangent kernel, allowing the introduction of Implicit Neural Teaching (INT). INT uses a greedy, discrepancy-based strategy to select signal fragments that maximize gradient steepness, accelerating INR training while preserving reconstruction quality. Empirical results across 1D, 2D, 3D, and audio modalities demonstrate 30%+ training time savings and robust performance, signaling substantial data-efficiency gains and bridging nonparametric teaching with deep learning.

Abstract

We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of INRs, we propose a paradigm called Implicit Neural Teaching (INT) that treats INR learning as a nonparametric teaching problem, where the given signal being fitted serves as the target function. The teacher then selects signal fragments for iterative training of the MLP to achieve fast convergence. By establishing a connection between MLP evolution through parameter-based gradient descent and that of function evolution through functional gradient descent in nonparametric teaching, we show for the first time that teaching an overparameterized MLP is consistent with teaching a nonparametric learner. This new discovery readily permits a convenient drop-in of nonparametric teaching algorithms to broadly enhance INR training efficiency, demonstrating 30%+ training time savings across various input modalities.
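
The greedy, discrepancy-based selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_fragments`, the `rate` parameter, and the use of the absolute difference as the discrepancy measure for a scalar-valued signal are all assumptions made for this example.

```python
import numpy as np

def select_fragments(target, prediction, rate):
    """Return sorted flat indices of the samples with the largest discrepancy.

    Hypothetical helper: `rate` is the fraction of samples to keep (0 < rate <= 1).
    Assumes a scalar-valued signal, e.g. a grayscale image or a mono audio clip.
    """
    target = np.asarray(target, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    # Pointwise discrepancy between the given signal and the current MLP output.
    diff = np.abs(target - prediction).reshape(-1)
    # Number of samples to select, at least one.
    k = max(1, int(round(rate * diff.size)))
    # argpartition places the k largest entries in the last k slots (unordered).
    top = np.argpartition(diff, -k)[-k:]
    return np.sort(top)

# Toy 1D signal with five samples; the "MLP output" is made up for illustration.
signal = np.array([0.0, 1.0, 0.5, 0.2, 0.9])
output = np.array([0.0, 0.2, 0.5, 0.1, 0.1])
idx = select_fragments(signal, output, rate=0.4)
print(idx)  # indices of the two samples with the largest error
```

In a training loop, only the coordinates at `idx` would be fed to the learner in the current iteration, and the selection would be recomputed as the MLP output changes.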

Paper Structure

This paper contains 20 sections, 5 theorems, 65 equations, 13 figures, 3 tables, 1 algorithm.

Key Result

Lemma 3

(Chain rule for functional gradients) For differentiable functions $G(F): \mathbb{R}\mapsto\mathbb{R}$ that depend on functionals $F(f):\mathcal{H}\mapsto\mathbb{R}$, the formula $\nabla_f G(F(f))=\frac{\partial G(F(f))}{\partial F(f)}\cdot\nabla_f F(f)$ is commonly referred to as the chain rule.
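
As a worked instance of this chain rule, consider a single-example square loss. The choice of loss and the symbols $E_{\bm{x}}$, $y$ and $\mathcal{L}$ are assumptions made for illustration. The only other ingredient is the standard fact that the evaluation functional $E_{\bm{x}}(f)=f(\bm{x})$ has functional gradient $K(\bm{x},\cdot)$ in the RKHS, by the reproducing property.

```latex
% Assumed setup: G(F) = \tfrac{1}{2}(F - y)^2 and F(f) = E_{\bm{x}}(f) = f(\bm{x}).
\begin{align*}
\mathcal{L}(f) &= \tfrac{1}{2}\bigl(f(\bm{x}) - y\bigr)^2 = G\bigl(E_{\bm{x}}(f)\bigr), \\
\nabla_f \mathcal{L}(f)
  &= \frac{\partial G\bigl(E_{\bm{x}}(f)\bigr)}{\partial E_{\bm{x}}(f)} \cdot \nabla_f E_{\bm{x}}(f)
   = \bigl(f(\bm{x}) - y\bigr)\, K(\bm{x},\cdot).
\end{align*}
```

The functional gradient is the kernel section $K(\bm{x},\cdot)$ scaled by the pointwise residual, so examples with a larger residual produce a steeper update.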

Figures (13)

  • Figure 1: Fitting a 2D grayscale image signal with Implicit Neural Teaching (INT): By comparing the disparity between the given signal and the current MLP output (a), the nonparametric teacher (b) selectively chooses examples (pixels) of the greatest disparity (red boxes), instead of a raster scan, to feed to the MLP learner (c) who undergoes learning (d) and outputs the final (e).
  • Figure 2: An illustration of the spectral understanding in a 2D function coordinate system (i.e., RKHS) with the $\{K(\bm{x}_i,\cdot)\}_2$ basis. The basis can be non-orthogonal if $K(\bm{x}_i,\bm{x}_j)\neq0$ for $i\neq j$. The coordinate of $f_{\theta^t}-f^*$ represents its projection on each axis, which is given by $\langle\left(f_{\theta^t}-f^*\right),\left[K(\bm{x}_i,\cdot)\right]^T_2\rangle_{\mathcal{H}}=\left[f_{\theta^t}(\bm{x}_i)-f^*(\bm{x}_i)\right]^T_2$, and that of $K(\bm{x}_\dagger,\cdot)$ is $\langle K(\bm{x}_\dagger,\cdot),\left[K(\bm{x}_i,\cdot)\right]^T_2\rangle_{\mathcal{H}}=\left[K(\bm{x}_\dagger,\bm{x}_i)\right]^T_2$, which is stored in the $\dagger$-th row of $\bm{K}$. Assuming $\bar{\bm{K}}=\begin{bmatrix}0.5&0.25\\0.25&0.5\end{bmatrix}$, the eigenvalues and the respective eigenvectors can be computed as $\lambda_1=0.75,\lambda_2=0.25$ and $\bm{v}_1=(\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2})^T,\bm{v}_2=(-\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2})^T$, respectively. Assuming $[f_{\theta^t}(\bm{x}_i)-f^*(\bm{x}_i)]_2$ equals $(1,0.5)$, its first and second principal component projections are $\frac{3\sqrt{2}}{4}$ and $-\frac{\sqrt{2}}{4}$, respectively. Moreover, the discrepancy between $f_{\theta^t}$ and $f^*$ diminishes at a rate of $e^{-\frac{3\eta t}{4}}$ and $e^{-\frac{\eta t}{4}}$ for the first and second principal components, respectively.
  • Figure 3: Training dynamics of $f$ using PGD and FGD. Apparently, $f_{\text{PGD}}$ closely follows $f_{\text{FGD}}$, empirically showing the evolution consistency between PGD training and FGD training.
  • Figure 4: Reconstruction quality of SIREN. (b) trains SIREN without (w/o) INT using all pixels. (c) trains it w/o INT using 20% randomly selected pixels. (d) trains it using INT of 20% selection rate. (e) trains it using progressive INT (i.e., increasing selection rate progressively from 20% to 100%).
  • Figure 5: Progression of INT selected pixels (marked as black) at corresponding iterations when training with INT 20% (top) and 40% (bottom).
  • ...and 8 more figures
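
The numbers quoted in the Figure 2 caption can be checked numerically. The sketch below is illustrative only: it assumes the 2x2 matrix $\bar{\bm{K}}$ has rows (0.5, 0.25) and (0.25, 0.5), and the variable names and the `decay` helper are hypothetical.

```python
import numpy as np

# Kernel matrix and residual vector as given in the Figure 2 caption.
K_bar = np.array([[0.5, 0.25],
                  [0.25, 0.5]])
residual = np.array([1.0, 0.5])  # [f(x_i) - f*(x_i)] for i = 1, 2

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(K_bar)
order = np.argsort(eigvals)[::-1]          # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvector signs are arbitrary; flip each column so its second entry is
# positive, which matches the sign convention used in the caption.
eigvecs = eigvecs * np.sign(eigvecs[1, :])

# Projection of the residual onto each principal component.
projections = eigvecs.T @ residual

def decay(eta, t):
    """Per-component decay factor exp(-eta * lambda_i * t) (hypothetical helper)."""
    return np.exp(-eta * eigvals * t)

print(eigvals)       # eigenvalues, largest first
print(projections)   # first and second principal component projections
```

The component along the larger eigenvalue shrinks faster, which is the spectral reading of why the discrepancy decays at two different rates in the caption.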

Theorems & Definitions (7)

  • Definition 1
  • Definition 2
  • Lemma 3
  • Lemma 4
  • Theorem 5
  • Proposition 6
  • Lemma 7