Explaining the Implicit Neural Canvas: Connecting Pixels to Neurons by Tracing their Contributions

Namitha Padmanabhan, Matthew Gwilliam, Pulkit Kumar, Shishira R Maiya, Max Ehrlich, Abhinav Shrivastava

TL;DR

INRs offer compact continuous representations of signals but their internal mechanisms are poorly understood. XINC introduces the implicit neural canvas, a neuron-to-pixel contribution framework, and demonstrates it on FFN and NeRV INRs to reveal how pixels are composed from neuron activities. Key findings include distributed, color- and edge-biased representations, object- and motion-related dynamics, and the ability to cluster neurons by contribution profiles, with implications for INR explainability and compression. The approach provides a practical tool for diagnosing INR behavior in image and video tasks and can guide future improvements in INR-based workflows.

Abstract

The many variations of Implicit Neural Representations (INRs), where a neural network is trained as a continuous representation of a signal, have tremendous practical utility for downstream tasks including novel view synthesis, video compression, and image super-resolution. Unfortunately, the inner workings of these networks are seriously under-studied. Our work, eXplaining the Implicit Neural Canvas (XINC), is a unified framework for explaining properties of INRs by examining the strength of each neuron's contribution to each output pixel. We call the aggregate of these contribution maps the Implicit Neural Canvas and we use this concept to demonstrate that the INRs we study learn to "see" the frames they represent in surprising ways. For example, INRs tend to have highly distributed representations. While lacking high-level object semantics, they have a significant bias for color and edges, and are almost entirely space-agnostic. We arrive at our conclusions by examining how objects are represented across time in video INRs, using clustering to visualize similar neurons across layers and architectures, and show that this is dominated by motion. These insights demonstrate the general usefulness of our analysis framework. Our project page is available at https://namithap10.github.io/xinc.

Paper Structure (24 sections, 2 equations, 19 figures, 1 table)

Figures (19)

  • Figure 1: How do INRs "see" the images they represent? We propose XINC, which we use to show what parts of a learned visual signal are important to each neuron of an INR. Here, we take a sample from the neural canvas, with contribution maps for 5 neurons sampled from the last layer of a NeRV trained on a Cityscapes [cordts2016cityscapes] video. Some neurons attend to colors and textures, while others focus on low-level features like edges.
  • Figure 2: (left) We dissect MLP-based INRs by aggregating their activations (weights multiplied by previous layer outputs) for each pixel at each neuron. (right) We extend this core idea of pixel-to-neuron mapping for the CNN-based INR, NeRV, by computing intermediate feature maps that are not yet summed on the input dimension. For layers prior to the head, we also account for the PixelShuffle and apply an aggregation filter to account for the subsequent layer's kernels, which propagate that neuron's contribution to neighboring pixels. To compute contributions for a given layer, we simply perform the shown steps in sequence for that layer and each subsequent layer.
  • Figure 3: The implicit neural canvas. We show representative example contribution maps for various layers of FFN [tancik2020fourfeat] and NeRV [chen2021nerv]. Notice how early-layer FFN neurons manifest strong Fourier patterns, and how the last layers of NeRV tend to resemble the image, with NeRV head-layer neurons being reminiscent of classical image processing filters.
  • Figure 4: Grouping contributions. We compute the variance of the difference from expected contribution for different groupings of contributions: instances of objects and background, RGB color-based clusters, Gabor filter-based clusters, and regular grid cells. These results suggest that INRs ignore space while preferring instances, color, and edge features.
  • Figure 5: Contribution vs. intensity. We compare contribution and intensity in alternating rows. In the top row, we sum all contribution maps for the indicated layer. In the next row, we show the difference between this sum and the raw image intensity (sum of all color channels) to show where contribution does and does not correlate with intensity. The next two rows repeat this for a frame from another video.
  • ...and 14 more figures
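
The MLP case described in the Figure 2 caption (weights multiplied by previous-layer outputs, kept separate per pixel and per neuron) can be illustrated with a toy sketch. This is not the authors' implementation: the network sizes, the weight values, and the function names `hidden_activations`, `contribution_maps`, and `reconstruct` are all assumptions made up for illustration, and only the last hidden layer feeding a linear head is covered.

```python
# Toy sketch of per-pixel, per-neuron contributions for an MLP-style INR.
# All names and numbers here are hypothetical; they are not from the paper.

def hidden_activations(coords, w_in, b_in):
    """ReLU hidden layer: one activation vector per pixel coordinate."""
    acts = []
    for x, y in coords:
        acts.append([max(0.0, wx * x + wy * y + b)
                     for (wx, wy), b in zip(w_in, b_in)])
    return acts

def contribution_maps(acts, w_head):
    """contrib[p][i][c] = w_head[c][i] * acts[p][i].

    The products are left un-summed over the neuron index i, so each
    hidden neuron gets its own map over pixels p and output channels c.
    """
    return [[[w_head[c][i] * h[i] for c in range(len(w_head))]
             for i in range(len(h))]
            for h in acts]

def reconstruct(contrib, b_head):
    """Summing contributions over neurons (plus bias) gives the head output."""
    return [[sum(neuron[c] for neuron in pixel) + b_head[c]
             for c in range(len(b_head))]
            for pixel in contrib]

# A 2x2 "image" addressed by pixel coordinates, 3 hidden neurons, 1 channel.
coords = [(0, 0), (0, 1), (1, 0), (1, 1)]
w_in = [(1.0, -1.0), (0.5, 0.5), (-1.0, 2.0)]
b_in = [0.0, -0.25, 0.5]
w_head = [[1.0, -2.0, 0.5]]
b_head = [0.1]

acts = hidden_activations(coords, w_in, b_in)
contrib = contribution_maps(acts, w_head)
pixels = reconstruct(contrib, b_head)

# Contribution map of hidden neuron 2 over the four pixels (channel 0).
neuron2_map = [contrib[p][2][0] for p in range(len(coords))]
print(neuron2_map)
```

Because the per-neuron products sum back to the ordinary forward pass, each map shows how much of a pixel's value one neuron supplies. Extending this to earlier layers, or to NeRV's convolutions and PixelShuffle, needs the extra propagation steps the caption mentions and is not attempted here.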