Learned Initializations for Optimizing Coordinate-Based Neural Representations

Matthew Tancik, Ben Mildenhall, Terrance Wang, Divi Schmidt, Pratul P. Srinivasan, Jonathan T. Barron, Ren Ng

TL;DR

This work tackles the inefficiency of optimizing a coordinate-based neural representation from scratch for each new signal by applying optimization-based meta-learning to learn favorable initial weights $\theta_0^*$ for a given signal class. Using MAML or Reptile, the method learns a data-driven prior that enables faster test-time convergence and improved generalization when observations are limited. Empirical results across 2D image regression, CT reconstruction, and 3D view synthesis (ShapeNet and Phototourism) show substantial speedups and better reconstructions, including single-view 3D recovery and appearance transfer. The approach is simple to implement within existing test-time optimization pipelines and provides a principled way to inject class-specific priors into neural representations without changing the network architecture.
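
As a rough sketch of what learning $\theta_0^*$ means, assume each signal in the class defines a task $T$ with reconstruction loss $L$, and that test-time optimization runs $m$ gradient steps of size $\alpha$ starting from $\theta_0$. The symbols $T$, $L$, $m$, $\alpha$, and $\beta$ are notation assumed here for illustration and may differ from the paper's own.

$$
\theta_{i+1} = \theta_i - \alpha \nabla_\theta L(\theta_i), \qquad
\theta_0^* = \arg\min_{\theta_0} \; \mathbb{E}_{T}\left[ L\big(\theta_m(\theta_0, T)\big) \right]
$$

Under these assumptions, the two outer-loop updates with meta-learning rate $\beta$ take the following standard forms. MAML differentiates through the inner loop, while Reptile only moves $\theta_0$ toward the adapted weights:

$$
\text{MAML:}\;\; \theta_0 \leftarrow \theta_0 - \beta \nabla_{\theta_0} L\big(\theta_m(\theta_0, T)\big), \qquad
\text{Reptile:}\;\; \theta_0 \leftarrow \theta_0 + \beta \big(\theta_m(\theta_0, T) - \theta_0\big)
$$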

Abstract

Coordinate-based neural representations have shown significant promise as an alternative to discrete, array-based representations for complex low-dimensional signals. However, optimizing a coordinate-based network from randomly initialized weights for each new signal is inefficient. We propose applying standard meta-learning algorithms to learn the initial weight parameters for these fully-connected networks based on the underlying class of signals being represented (e.g., images of faces or 3D models of chairs). Despite requiring only a minor change in implementation, using these learned initial weights enables faster convergence during optimization and can serve as a strong prior over the signal class being modeled, resulting in better generalization when only partial observations of a given signal are available. We explore these benefits across a variety of tasks, including representing 2D images, reconstructing CT scans, and recovering 3D shapes and scenes from 2D image observations.
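
The "minor change in implementation" can be illustrated with a minimal NumPy sketch of a Reptile-style outer loop wrapped around an ordinary test-time optimizer. It is not the authors' code. The plain ReLU MLP (no positional encoding or sine activations), the mean squared error loss, the plain gradient-descent inner loop, and all function and parameter names (`init_params`, `sgd`, `reptile`, `sample_task`, and so on) are assumptions made for illustration.

```python
import numpy as np

def init_params(rng, sizes):
    """Standard random initialization (the baseline a learned init replaces)."""
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, coords):
    """Map coordinates to signal values; also return per-layer activations."""
    acts = [coords]
    h = coords
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
        acts.append(h)
    return h, acts

def loss_and_grads(params, coords, target):
    """Mean squared error and its gradients, via manual backpropagation."""
    pred, acts = forward(params, coords)
    diff = pred - target
    loss = np.mean(diff ** 2)
    delta = 2.0 * diff / diff.size  # dLoss / dPrediction
    grads = []
    for i in reversed(range(len(params))):
        grads.append((acts[i].T @ delta, delta.sum(axis=0)))
        if i > 0:
            # Backpropagate through layer i, then through the ReLU before it.
            delta = (delta @ params[i][0].T) * (acts[i] > 0)
    return loss, grads[::-1]

def sgd(params, coords, target, steps, lr):
    """Test-time optimization: plain gradient descent on a single signal."""
    for _ in range(steps):
        _, grads = loss_and_grads(params, coords, target)
        params = [(W - lr * gW, b - lr * gb)
                  for (W, b), (gW, gb) in zip(params, grads)]
    return params

def reptile(sample_task, params, meta_steps, inner_steps, inner_lr, meta_lr):
    """Reptile outer loop: move the initialization toward each task's
    adapted weights. `sample_task` is a hypothetical callable returning
    (coords, target) for one signal drawn from the class."""
    for _ in range(meta_steps):
        coords, target = sample_task()
        adapted = sgd(params, coords, target, inner_steps, inner_lr)
        params = [(W + meta_lr * (Wa - W), b + meta_lr * (ba - b))
                  for (W, b), (Wa, ba) in zip(params, adapted)]
    return params
```

In this sketch the test-time pipeline is unchanged: a new signal is still fit with `sgd`, only starting from the output of `reptile` rather than from `init_params`. For a 2D image, `coords` would hold one `(x, y)` pair per pixel and `target` the corresponding `(R, G, B)` values, so `sizes` would begin with 2 and end with 3.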

Paper Structure

This paper contains 32 sections, 7 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: A coordinate-based MLP, illustrated on the left, takes a coordinate as input and outputs a value at that location. For example, the network could take in a pixel coordinate $(x,y)$ and emit the $(R, G, B)$ color at that pixel as output, thereby representing a 2D image. The network weights $\theta$ are typically optimized via gradient descent to produce the desired image, as depicted on the right. However, finding good parameters can be computationally expensive, and the full optimization process must be repeated for each new target. We propose using meta-learning to find initial network weights $\theta_0^*$ that allow for faster convergence and better generalization.
  • Figure 2: Faster convergence: Examples of optimizing a network to represent a 2D image from different initial weight settings. The meta-learned initialization (Meta) is specialized for the class of human face images but still helps speed up convergence on other natural images (right). Non-meta-initialized networks take $10$ to $20$ times as many iterations to reach the same quality as the meta-initialized network does after only 2 gradient steps (see Table \ref{tab:2d_image_mem}).
  • Figure 3: Sparse Recovery: Examples of CT reconstructions of a Shepp-Logan phantom from a sparse set of views. The meta-learned initial weights encode a data-dependent prior that improves reconstruction in the limited data regime.
  • Figure 4: Single view reconstructions of ShapeNet \cite{shapenet2015} objects. The simple-NeRF formulation relies on multi-view consistency for supervision and therefore fails if naively applied to the task of single view reconstruction, as seen in the Standard column. However, if the model is trained starting from meta-learned initial weights, it is able to recover 3D geometry. The MV Meta initialization has access to multiple views per object during meta-learning, whereas the SV Meta initialization only has access to a single view per object during meta-learning. All methods only receive a single input view during test-time optimization.
  • Figure 5: Reconstruction quality over the course of training for models optimized to reconstruct ShapeNet chairs from a set of 25 reference images. The model starting from the meta-learned initial weights outperforms the network using a standard random initialization throughout training.
  • ...and 3 more figures