Canonical Factors for Hybrid Neural Fields

Brent Yi; Weijia Zeng; Sam Buchanan; Yi Ma

TL;DR

This work tackles the axis-aligned biases of factored feature volumes used by hybrid neural fields. It introduces TILTED, a transform-invariant latent decomposition that learns canonical factors jointly with domain transformations, paired with coarse-to-fine optimization to handle high-frequency content. Theoretical results in a 2D model problem justify learning alignment and representation jointly. Experiments on 2D images, SDFs, and NeRFs show improved quality, robustness, and memory/runtime efficiency; on real-world scenes, gains include halved parameter counts and 25% faster training. The findings also reveal evaluation biases in radiance-field pipelines and suggest applicability to broader classes of transformations and to overparameterized settings.

Abstract

Factored feature volumes offer a simple way to build more compact, efficient, and interpretable neural fields, but also introduce biases that are not necessarily beneficial for real-world data. In this work, we (1) characterize the undesirable biases that these architectures have for axis-aligned signals -- they can lead to radiance field reconstruction differences as large as 2 PSNR -- and (2) explore how learning a set of canonicalizing transformations can improve representations by removing these biases. We prove in a two-dimensional model problem that simultaneously learning these transformations together with scene appearance succeeds with drastically improved efficiency. We validate the resulting architectures, which we call TILTED, using image, signed distance, and radiance field reconstruction tasks, where we observe improvements across quality, robustness, compactness, and runtime. Results demonstrate that TILTED can enable capabilities comparable to baselines that are 2x larger, while highlighting weaknesses of neural field evaluation procedures.
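The core idea of learning a transformation together with a factored representation can be sketched in two dimensions. The snippet below is a minimal illustration, not the authors' implementation: the names `interp1d`, `tilted_lookup`, and `box`, the single rotation angle `tau`, and the linear interpolation scheme are all assumptions made for this example. It evaluates a rank-one factorization in rotated coordinates, so the same pair of 1D factors can represent a square at any orientation.

```python
import math


def interp1d(grid, t):
    """Linearly interpolate a 1D grid whose samples span t in [-1, 1]."""
    n = len(grid)
    # Clamp t to the grid's domain and map it to a fractional index.
    pos = (min(max(t, -1.0), 1.0) + 1.0) / 2.0 * (n - 1)
    lo = min(int(pos), n - 2)
    w = pos - lo
    return (1.0 - w) * grid[lo] + w * grid[lo + 1]


def tilted_lookup(x, y, tau, u_factors, v_factors):
    """Rotate (x, y) by the angle tau, then evaluate sum_f u_f(x') * v_f(y').

    In a trained model tau would be a learnable parameter (hypothetical here);
    with tau = 0 this reduces to an ordinary axis-aligned factored lookup.
    """
    xr = math.cos(tau) * x + math.sin(tau) * y
    yr = -math.sin(tau) * x + math.cos(tau) * y
    return sum(
        interp1d(u, xr) * interp1d(v, yr)
        for u, v in zip(u_factors, v_factors)
    )


# Rank-one factors for a box of half-width 0.5, sampled at 41 points in [-1, 1].
box = [1.0 if abs(-1.0 + 2.0 * i / 40) <= 0.5 else 0.0 for i in range(41)]

# The point (0.6, 0) lies outside the axis-aligned square but inside the
# same square rotated by pi/4, using the same rank-one factors.
axis_aligned = tilted_lookup(0.6, 0.0, 0.0, [box], [box])
rotated = tilted_lookup(0.6, 0.0, math.pi / 4, [box], [box])
```

Only the angle changes between the two lookups; the factors themselves are untouched, which is the sense in which the factors are "canonical" in this toy setting.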

Paper Structure (60 sections, 23 theorems, 483 equations, 8 figures, 11 tables)

Introduction
Related Work
Neural Fields
Hybrid Neural Fields
Learning With Transformations of Domain
Low-Rank Grids Are Delicate Creatures
TILTED
Applying Transformations
Coarse-to-Fine Optimization
Experiments
2D Image Reconstruction
Signed Distance Field Reconstruction
Neural Radiance Fields
Synthetic Study
Real-World Study
...and 45 more sections

Key Result

Theorem 1

There exist absolute constants $c_0, c_1 > 0$ such that for any target channel count $F \leq c_0 n^{1/9.5}$, every rank-$F$ approximation $\hat{\bm{X}}$ to $\bm{X}_{\pi/4}$ satisfies

Figures (8)

Figure 1: Learned transforms for factored feature volumes. Latent decompositions with fixed, axis-aligned projections (left) introduce biases for axis-aligned signals. A more robust, transform-invariant latent decomposition (TILTED) is obtained by treating projections to feature grids as learnable functions, here parameterized by $\bm{\tau}_t$.
Figure 2: Tensor decompositions for 3D feature volumes studied by prior work [chen2022tensorf, chan2021eg3d, fridovich2023kplanes]. Note that all assume a fixed, axis-aligned structure; TILTED instead proposes to learn transformations of this structure.
Figure 3: Limitations of low-rank feature grids. (a): The square template $\bm{X}_{\natural}$ is axis-aligned, and has a maximally-compact (rank one) representation. (b): After a rotation by $\pi/4$ radians, the square template (in red) only changes its orientation, but its approximability by a low-rank grid deteriorates dramatically. We draw the scaled eigenvectors and approximation for $F=3$. (c): By optimizing over transformations, a rank-one grid can be used to represent all rotations of $\bm{X}_{\natural}$. (d): We plot the number of components needed to achieve varying PSNR levels as a function of image resolution for $\nu = \pi/4$. The number of components is always significantly larger than is necessary when transform optimization is used.
Figure 4: Two-phase optimization. Two TILTED neural fields are trained: the first using a rank-constrained bottleneck representation (left); all parameters are discarded except for the projection parameters $\bm{\tau}_\text{bneck}$, which are used for initialization of the final representation (right).
Figure 5: Evaluation images and results for 2D image reconstruction. We apply rotations to each input image, and plot holdout PSNR for a model trained at each angle. Axis-aligned feature decompositions are sensitive to transformations of the input, while TILTED retains a constant PSNR across angles.
...and 3 more figures
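
The rank gap described in the Figure 3 caption can be checked numerically on a small grid. The sketch below is illustrative only and makes several assumptions: it measures exact matrix rank rather than the approximation error that the paper's theorem concerns, it discretizes the rotated square as a diamond $|x| + |y| \leq r$, and the helper names `matrix_rank` and `square_image` are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Exact rank via Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot.
        for r in range(rank + 1, len(m)):
            if m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def square_image(n, radius, rotated):
    """Binary n x n image of a centered square, or of a diamond if rotated."""
    c = n // 2
    image = []
    for i in range(n):
        row = []
        for j in range(n):
            y, x = i - c, j - c
            if rotated:
                inside = abs(x) + abs(y) <= radius  # square turned by pi/4
            else:
                inside = max(abs(x), abs(y)) <= radius  # axis-aligned square
            row.append(1 if inside else 0)
        image.append(row)
    return image


aligned_rank = matrix_rank(square_image(33, 8, rotated=False))
rotated_rank = matrix_rank(square_image(33, 8, rotated=True))
```

In this toy setting the axis-aligned square is an outer product of two indicator vectors, so its rank is one, while each distinct row of the diamond is an interval of a different width, so the rank grows with the diamond's size.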

Theorems & Definitions (48)

Theorem 1: informal version of \ref{thm:inapproximability}
Theorem 2: informal version of \ref{thm:tilt-infinite}
Theorem D.1
proof
Remark D.1
Remark D.2
Lemma D.1
proof
Lemma D.2
proof
...and 38 more
