
Conv-INR: Convolutional Implicit Neural Representation for Multimodal Visual Signals

Zhicheng Cai

TL;DR

This work introduces Conv-INR, the first implicit neural representation built entirely on convolution to represent visual signals. By operating on local patches, Conv-INR leverages the locality prior and naturally represents both low- and high-frequency components without primary function expansion, addressing spectral bias observed in MLP-based INRs. The authors present three reparameterization techniques—Structural, Static Weight, and Dynamic Weight Reparameterization—that boost expressive capacity without increasing inference cost. Across image fitting, CT/MRI reconstruction, and novel view synthesis, Conv-INR consistently outperforms competing MLP INRs, achieving significant PSNR gains (e.g., up to about $5.9$ dB on Kodak) and improved spectral representation. The approach offers practical benefits for signal representation and reconstruction, with reparameterizations providing additional performance gains while keeping inference efficiency unchanged.
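The difference between processing coordinates one at a time and processing them in local patches can be illustrated with a small NumPy sketch. This is not the authors' architecture: the grid size, channel counts, 3x3 kernel, zero padding, and the helper names `coord_grid`, `pointwise_layer`, and `conv3x3_layer` are all assumptions chosen for illustration. The sketch perturbs a single input location and counts how many output locations change under each kind of layer.

```python
import numpy as np

def coord_grid(h, w):
    """Return a (2, h, w) array of normalized (y, x) coordinates in [-1, 1]."""
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy, xx], axis=0)

def pointwise_layer(x, weight, bias):
    """MLP-style linear layer: every location is transformed independently.
    x: (c_in, h, w), weight: (c_out, c_in), bias: (c_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def conv3x3_layer(x, weight, bias):
    """Zero-padded 3x3 convolution: every output mixes a local 3x3 patch.
    x: (c_in, h, w), weight: (c_out, c_in, 3, 3), bias: (c_out,)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(3):
        for dx in range(3):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out + bias[:, None, None]

rng = np.random.default_rng(0)
grid = coord_grid(8, 8)
w_mlp = rng.standard_normal((3, 2))        # hypothetical layer sizes
w_conv = rng.standard_normal((3, 2, 3, 3))
bias = rng.standard_normal(3)

# Perturb one input location and see which outputs depend on it.
perturbed = grid.copy()
perturbed[:, 4, 4] += 0.5

def changed(a, b):
    """Boolean (h, w) mask of locations where any output channel differs."""
    return np.abs(a - b).max(axis=0) > 1e-9

mlp_changed = changed(pointwise_layer(grid, w_mlp, bias),
                      pointwise_layer(perturbed, w_mlp, bias))
conv_changed = changed(conv3x3_layer(grid, w_conv, bias),
                       conv3x3_layer(perturbed, w_conv, bias))

print(int(mlp_changed.sum()), int(conv_changed.sum()))
```

With the pointwise layer only the perturbed location changes, while the convolutional layer propagates the change to the surrounding 3x3 neighborhood. Stacking such layers enlarges the neighborhood further, which is the locality prior the summary refers to.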

Abstract

Implicit neural representation (INR) has recently emerged as a promising paradigm for signal representation. Typically, an INR is parameterized by a multilayer perceptron (MLP) that takes coordinates as input and outputs the corresponding attributes of a signal. However, MLP-based INRs face two critical issues: i) they consider each coordinate individually and ignore the connections between coordinates; ii) they suffer from spectral bias and thus fail to learn high-frequency components. Because target visual signals usually exhibit strong local structures and neighborhood dependencies, and high-frequency components are significant in these signals, these issues harm the representational capacity of INRs. This paper proposes Conv-INR, the first INR model fully based on convolution. Owing to the inherent attributes of convolution, Conv-INR can simultaneously consider adjacent coordinates and learn high-frequency components effectively. Compared to existing MLP-based INRs, Conv-INR has better representational capacity and trainability without requiring primary function expansion. We conduct extensive experiments on four tasks, including image fitting, CT/MRI reconstruction, and novel view synthesis. Conv-INR significantly surpasses existing MLP-based INRs on all of them, validating its effectiveness. Finally, we propose three reparameterization methods that further enhance the performance of the vanilla Conv-INR without introducing any extra inference cost.
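The claim that reparameterization adds no inference cost rests on the linearity of convolution: parallel branches used during training can be folded into a single kernel afterwards. The sketch below shows the generic form of structural reparameterization, in the style popularized by RepVGG, by merging a 3x3 branch and a 1x1 branch. The specific branches used by Conv-INR are not described in this summary, so the branch choice, tensor shapes, and the helper names `conv2d_same` and `merge_branches` are assumptions for illustration only.

```python
import numpy as np

def conv2d_same(x, weight):
    """Zero-padded 'same' convolution (cross-correlation form).
    x: (c_in, h, w), weight: (c_out, c_in, k, k) with odd k."""
    k = weight.shape[-1]
    p = k // 2
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (p, p), (p, p)))
    out = np.zeros((weight.shape[0], h, w))
    for dy in range(k):
        for dx in range(k):
            patch = padded[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], patch)
    return out

def merge_branches(w3, w1):
    """Fold a 1x1 kernel into the center tap of a 3x3 kernel."""
    merged = w3.copy()
    merged[:, :, 1, 1] += w1[:, :, 0, 0]
    return merged

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 16))      # hypothetical feature map
w3 = rng.standard_normal((8, 4, 3, 3))    # 3x3 training branch
w1 = rng.standard_normal((8, 4, 1, 1))    # parallel 1x1 training branch

# Training-time form: two branches summed.
train_out = conv2d_same(x, w3) + conv2d_same(x, w1)
# Inference-time form: one merged 3x3 convolution.
infer_out = conv2d_same(x, merge_branches(w3, w1))

max_err = np.abs(train_out - infer_out).max()
print(max_err < 1e-9)
```

The merged kernel has the same shape as the original 3x3 kernel, so the inference-time layer performs a single convolution regardless of how many linear branches were trained.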

Paper Structure (12 sections, 5 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Pipelines of MLP-based INR and Conv-INR, using the image fitting task as the example.
  • Figure 2: Comparisons of the Conv-INR and various MLP-based INRs for representing the 2D image Lena. The corresponding Fourier spectra are also visualized.
  • Figure 3: Reparameterization methods specifically tailored for Conv-INR.
  • Figure 4: 3D intensity of the feature maps obtained by Conv-INR, SIREN and PE-MLP.