Canonical Factors for Hybrid Neural Fields

Brent Yi, Weijia Zeng, Sam Buchanan, Yi Ma

TL;DR

This work tackles axis-aligned biases in factored feature volumes used by hybrid neural fields. It introduces TILTED, a transform-invariant latent decomposition that learns canonical factors together with domain transformations, coupled with coarse-to-fine optimization to handle high-frequency content. Theoretical results in a 2D model justify joint learning of alignment and representation, and extensive experiments across 2D images, SDFs, and NeRFs demonstrate improved quality, robustness, and memory/runtime efficiency, including real-world scene gains such as halved parameter counts and 25% faster training. The findings also reveal evaluation biases in radiance-field pipelines and suggest broader applicability to more transformations and overparameterized settings.

Abstract

Factored feature volumes offer a simple way to build more compact, efficient, and interpretable neural fields, but also introduce biases that are not necessarily beneficial for real-world data. In this work, we (1) characterize the undesirable biases that these architectures have for axis-aligned signals -- they can lead to radiance field reconstruction differences as high as 2 dB in PSNR -- and (2) explore how learning a set of canonicalizing transformations can improve representations by removing these biases. We prove in a two-dimensional model problem that simultaneously learning these transformations together with scene appearance succeeds with drastically improved efficiency. We validate the resulting architectures, which we call TILTED, using image, signed distance, and radiance field reconstruction tasks, where we observe improvements across quality, robustness, compactness, and runtime. Results demonstrate that TILTED can enable capabilities comparable to baselines that are 2x larger, while highlighting weaknesses of neural field evaluation procedures.
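
The core idea of pairing a factored feature grid with a learnable transformation can be illustrated with a minimal 2D sketch. This is not the authors' implementation: the function names (`rotation`, `interp1d`, `factored_features`), the use of a single rotation angle `theta` as the transform parameter, and the choice of a product of two 1D feature lines as the factored grid are all assumptions made for illustration.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def interp1d(line, u):
    """Linearly interpolate a feature line of shape (res, C) at coords u in [-1, 1]."""
    res = line.shape[0]
    x = (np.clip(u, -1.0, 1.0) + 1.0) * 0.5 * (res - 1)
    i0 = np.clip(np.floor(x).astype(int), 0, res - 2)
    w = (x - i0)[:, None]
    return (1.0 - w) * line[i0] + w * line[i0 + 1]

def factored_features(points, theta, line_u, line_v):
    """Hypothetical transform-then-factor lookup.

    points: (N, 2) query coordinates.
    theta:  rotation angle; in a learned setting this would be optimized
            jointly with the feature lines (assumption for illustration).
    line_u, line_v: (res, C) feature lines along the two transformed axes.
    Returns (N, C) features formed as a per-channel product of the two lines.
    """
    p = points @ rotation(theta).T  # apply the transform before projecting
    return interp1d(line_u, p[:, 0]) * interp1d(line_v, p[:, 1])
```

With `theta = 0` this reduces to a fixed, axis-aligned factorization. If the query points are rotated by `-theta` and the model uses `theta`, the returned features match the unrotated case, which is the sense in which a learned transform can undo a misalignment between the signal and the grid axes.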

Paper Structure

This paper contains 60 sections, 23 theorems, 483 equations, 8 figures, 11 tables.

Key Result

Theorem 1

There exist absolute constants $c_0, c_1 > 0$ such that for any target channel count $F \leq c_0 n^{1/9.5}$, every rank-$F$ approximation $\hat{\bm{X}}$ to $\bm{X}_{\pi/4}$ satisfies

Figures (8)

  • Figure 1: Learned transforms for factored feature volumes. Latent decompositions with fixed, axis-aligned projections (left) introduce biases for axis-aligned signals. A more robust, transform-invariant latent decomposition (TILTED) is obtained by treating projections to feature grids as learnable functions, here parameterized by $\bm{\tau}_t$.
  • Figure 2: Tensor decompositions for 3D feature volumes studied by prior work \cite{chen2022tensorf, chan2021eg3d, fridovich2023kplanes}. Note that all assume a fixed, axis-aligned structure; TILTED instead proposes to learn transformations of this structure.
  • Figure 3: Limitations of low-rank feature grids. (a): The square template $\bm{X}_{\natural}$ is axis-aligned, and has a maximally-compact (rank one) representation. (b): After a rotation by $\pi/4$ radians, the square template (in red) only changes its orientation, but its approximability by a low-rank grid deteriorates dramatically. We draw the scaled eigenvectors and approximation for $F=3$. (c): By optimizing over transformations, a rank-one grid can be used to represent all rotations of $\bm{X}_{\natural}$. (d): We plot the number of components needed to achieve varying PSNR levels as a function of image resolution for $\nu = \pi/4$. The number of components is always significantly larger than is necessary when transform optimization is used.
  • Figure 4: Two-phase optimization. Two TILTED neural fields are trained: the first using a rank-constrained bottleneck representation (left); all parameters are discarded except for the projection parameters $\bm{\tau}_\text{bneck}$, which are used for initialization of the final representation (right).
  • Figure 5: Evaluation images and results for 2D image reconstruction. We apply rotations to each input image, and plot holdout PSNR for a model trained at each angle. Axis-aligned feature decompositions are sensitive to transformations of the input, while TILTED retains a constant PSNR across angles.
  • ...and 3 more figures
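
The effect described in the Figure 3 caption can be reproduced numerically with a small sketch. The code below is an illustration under stated assumptions, not the paper's experimental setup: the helper names `square_image` and `low_rank_psnr`, the resolution, and the square's half-width are all hypothetical choices, and the low-rank approximation is a plain truncated SVD of the pixel grid.

```python
import numpy as np

def square_image(n, angle, half_width=0.5):
    """Rasterize a centered square, rotated by `angle` radians, on an n x n grid."""
    coords = np.linspace(-1.0, 1.0, n)
    x, y = np.meshgrid(coords, coords, indexing="xy")
    c, s = np.cos(angle), np.sin(angle)
    u = c * x + s * y    # pixel coordinates expressed in the square's own frame
    v = -s * x + c * y
    return ((np.abs(u) <= half_width) & (np.abs(v) <= half_width)).astype(float)

def low_rank_psnr(image, rank):
    """PSNR (peak value 1) of the best rank-`rank` approximation via truncated SVD."""
    U, S, Vt = np.linalg.svd(image, full_matrices=False)
    approx = (U[:, :rank] * S[:rank]) @ Vt[:rank]
    mse = np.mean((image - approx) ** 2)
    # Treat round-off-level error as exact reconstruction.
    return np.inf if mse < 1e-20 else 10.0 * np.log10(1.0 / mse)

aligned = square_image(64, 0.0)          # axis-aligned template
rotated = square_image(64, np.pi / 4)    # same square, rotated by pi/4

print("rank (aligned):", np.linalg.matrix_rank(aligned))
print("rank (rotated):", np.linalg.matrix_rank(rotated))
print("PSNR of rank-3 fit to rotated square:", low_rank_psnr(rotated, 3))
```

The axis-aligned square is an outer product of two 1D indicator vectors, so it is exactly rank one, while the rotated square needs many more components for an exact fit. Sweeping `rank` and `n` in `low_rank_psnr` gives a rough analogue of the trend plotted in panel (d).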

Theorems & Definitions (48)

  • Theorem 1: informal version of \ref{thm:inapproximability}
  • Theorem 2: informal version of \ref{thm:tilt-infinite}
  • Theorem D.1
  • proof
  • Remark D.1
  • Remark D.2
  • Lemma D.1
  • proof
  • Lemma D.2
  • proof
  • ...and 38 more