How to Train Neural Field Representations: A Comprehensive Study and Benchmark

Samuele Papa, Riccardo Valperga, David Knigge, Miltiadis Kofinas, Phillip Lippe, Jan-Jakob Sonke, Efstratios Gavves

TL;DR

The paper examines how neural fields (NeFs) can serve as effective data representations and why fitting them at scale has been a bottleneck. It introduces Fit-a-NeF, a JAX-based library for fast, parallel fitting of millions of NeFs, which makes possible a comprehensive study of how NeF hyperparameters affect downstream tasks such as classification. Two findings stand out: constraining NeFs to stay close in parameter space via a shared initialization improves downstream performance, and better reconstruction quality does not necessarily yield better representations, since overtraining and excess architectural expressivity can harm downstream accuracy. To promote standardized research, the authors propose Neural Field Arena, a benchmark suite of NeF variants of standard vision datasets, and provide open-source tooling for the community. The work offers practical guidelines for training NeFs and lays the groundwork for systematic benchmarking of neural-field representations.

Abstract

Neural fields (NeFs) have recently emerged as a versatile method for modeling signals of various modalities, including images, shapes, and scenes. Subsequently, a number of works have explored the use of NeFs as representations for downstream tasks, e.g. classifying an image based on the parameters of a NeF that has been fit to it. However, the impact of the NeF hyperparameters on their quality as downstream representation is scarcely understood and remains largely unexplored. This is in part caused by the large amount of time required to fit datasets of neural fields. In this work, we propose a JAX-based library that leverages parallelization to enable fast optimization of large-scale NeF datasets, resulting in a significant speed-up. With this library, we perform a comprehensive study that investigates the effects of different hyperparameters on fitting NeFs for downstream tasks. In particular, we explore the use of a shared initialization, the effects of overtraining, and the expressiveness of the network architectures used. Our study provides valuable insights on how to train NeFs and offers guidance for optimizing their effectiveness in downstream applications. Finally, based on the proposed library and our analysis, we propose Neural Field Arena, a benchmark consisting of neural field variants of popular vision datasets, including MNIST, CIFAR, variants of ImageNet, and ShapeNetv2. Our library and the Neural Field Arena will be open-sourced to introduce standardized benchmarking and promote further research on neural fields.
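The parallel fitting with a shared initialization described in the abstract can be illustrated with a minimal JAX sketch. This is not the Fit-a-NeF API: the function names (`init_mlp`, `nef`, `sgd_step`, `fit_dataset`), the tiny sine-activated MLP, the plain gradient-descent update, and all hyperparameter values are assumptions chosen for illustration. The idea shown is only that `jax.vmap` lets one compiled update step fit many NeFs at once, with every NeF starting from the same initial parameters.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Initialize a small MLP as a list of (weight, bias) pairs."""
    keys = jax.random.split(key, len(sizes) - 1)
    params = []
    for k, n_in, n_out in zip(keys, sizes[:-1], sizes[1:]):
        w = jax.random.normal(k, (n_in, n_out)) * jnp.sqrt(2.0 / n_in)
        params.append((w, jnp.zeros(n_out)))
    return params

def nef(params, coords):
    """Map coordinates of shape (N, d_in) to signal values of shape (N, d_out)."""
    h = coords
    for w, b in params[:-1]:
        h = jnp.sin(h @ w + b)  # sine activation (illustrative choice)
    w, b = params[-1]
    return h @ w + b

def loss_fn(params, coords, target):
    """Mean squared reconstruction error of one NeF on one signal."""
    return jnp.mean((nef(params, coords) - target) ** 2)

def sgd_step(params, coords, target, lr):
    """One gradient-descent step for a single NeF."""
    loss, grads = jax.value_and_grad(loss_fn)(params, coords, target)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

# Vectorize over the leading axis of params and targets;
# the coordinate grid and learning rate are shared by all NeFs.
parallel_step = jax.jit(jax.vmap(sgd_step, in_axes=(0, None, 0, None)))

def fit_dataset(key, signals, coords, sizes=(2, 16, 1), steps=50, lr=0.05):
    """Fit one NeF per signal in parallel, all from one shared initialization.

    Assumes steps >= 1. Returns the stacked parameters and the per-NeF loss
    measured just before the final update.
    """
    n = signals.shape[0]
    shared = init_mlp(key, sizes)
    # Replicate the shared initialization once per signal.
    params = jax.tree_util.tree_map(
        lambda p: jnp.broadcast_to(p, (n,) + p.shape), shared)
    for _ in range(steps):
        params, losses = parallel_step(params, coords, signals, lr)
    return params, losses
```

Here `signals` is assumed to have shape `(n_signals, n_points, channels)` and `coords` shape `(n_points, 2)`. Replacing `jax.random.split` per signal for the broadcast would give the random-initialization baseline that the study contrasts with the shared one.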

Paper Structure (42 sections, 6 equations, 12 figures, 7 tables)

This paper contains 42 sections, 6 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: To investigate the use of Neural Fields (NeFs) as representations, we propose Fit-a-NeF (Fig. 1a), a JAX-based library for efficient parallelized fitting of NeFs to datasets of signals, obtaining a $100$-$1{,}300\times$ speed-up. Owing to this efficiency, we are able to uncover the impact of NeF hyperparameter choices on their usability as representations, evaluated in downstream classification (Fig. 1b), obtaining two important insights. First (Fig. 1b, left), it is vital to group NeFs in parameter-space, which we propose to enforce by sharing their network initializations. Second (Fig. 1b, right), improved reconstruction quality does not necessarily result in improved representation quality, implying an optimal combination of NeF expressivity and optimization for learning on NeFs. Incorporating these insights, we create a suite of NeF-based variants of classical CV datasets (Fig. 1c). We bundle these Neural Datasets into a benchmark for learning on Neural Fields, which we name Neural Field Arena, hoping to enable standardized comparison in order to promote further research into this field.
  • Figure 2: From left to right, samples from neural datasets with increasing reconstruction quality. The right-most column shows the ground truth used for fitting.
  • Figure 3: Speedup obtained with Fit-a-NeF over a naive sequential approach, for SIRENs with different hidden dimensions and numbers of layers. The evaluation was performed using 30k samples from MNIST on an A100 GPU. Smaller networks show the biggest speedup, as the parallelization is more effective.
  • Figure 4: The histograms show the distribution of pairwise distances of the NeF representations. We fit 10 ShapeNet-10 Neural Datasets varying initialization and total number of steps. Shared initialization produces more grouped representations, and pairwise distances increase with the number of steps.
  • Figure 5: Test accuracy ($\uparrow$) vs. NMI ($\uparrow$) for different initializations on 220 Neural Datasets created using different hidden dimensions and numbers of steps. Different datasets are shown with different markers. Shared initialization leads to semantically structured NeF representations and, generally, to better performance. The NMI values of CIFAR10 and MicroImageNet are lower than those of ShapeNet and MNIST; however, they are still clearly separated from their random-initialization counterparts.
  • ...and 7 more figures
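
The pairwise-distance analysis mentioned in the Figure 4 caption can be sketched in JAX as follows. This is a hedged illustration rather than the authors' code: the helper names `flatten_nefs` and `pairwise_distances` are hypothetical, the Euclidean metric is an assumption (the caption does not name the distance used), and `params` is assumed to be a pytree whose leaves all share a leading axis that indexes the NeFs.

```python
import jax
import jax.numpy as jnp

def flatten_nefs(params):
    """Concatenate each NeF's parameters into one vector: (n_nefs, n_params)."""
    leaves = jax.tree_util.tree_leaves(params)
    return jnp.concatenate([p.reshape(p.shape[0], -1) for p in leaves], axis=1)

def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of rows (assumed metric)."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    dist = jnp.sqrt(jnp.sum(diff ** 2, axis=-1))
    i, j = jnp.triu_indices(vectors.shape[0], k=1)  # each pair counted once
    return dist[i, j]
```

The returned vector has one entry per pair of NeFs and can be passed to `jnp.histogram` to obtain the kind of distribution the caption describes.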