ResFields: Residual Neural Fields for Spatiotemporal Signals
Marko Mihajlovic, Sergey Prokudin, Marc Pollefeys, Siyu Tang
TL;DR
ResFields tackles the capacity bottleneck of MLP-based neural fields for complex spatiotemporal signals by adding time-dependent residuals to the weights of MLP layers and factorizing those residuals with a low-rank scheme. The method increases expressive power without widening the base MLP, preserves the network's implicit regularization, and remains compatible with existing MLP-based neural-field architectures. Across 2D video, temporal SDF, dynamic NeRF, and scene-flow tasks, ResFields yields consistent gains in reconstruction quality and efficiency, including faster training and reduced memory usage. The work also shows that the approach generalizes well when modeling dynamic scenes from sparse data, and open-source resources are provided to support reproducibility and further development.
Abstract
Neural fields, a category of neural networks trained to represent high-frequency signals, have gained significant attention in recent years due to their impressive performance in modeling complex 3D data, such as signed distance fields (SDFs) or neural radiance fields (NeRFs), via a single multi-layer perceptron (MLP). However, despite the power and simplicity of representing signals with an MLP, these methods still face challenges when modeling large and complex temporal signals due to the limited capacity of MLPs. In this paper, we propose an effective approach to address this limitation by incorporating temporal residual layers into neural fields. The resulting networks, dubbed ResFields, form a novel class designed specifically to represent complex temporal signals. We conduct a comprehensive analysis of the properties of ResFields and propose a matrix factorization technique to reduce the number of trainable parameters and enhance generalization capabilities. Importantly, our formulation seamlessly integrates with existing MLP-based neural fields and consistently improves results across various challenging tasks: 2D video approximation, dynamic shape modeling via temporal SDFs, and dynamic NeRF reconstruction. Lastly, we demonstrate the practical utility of ResFields by showcasing its effectiveness in capturing dynamic 3D scenes from sparse RGBD cameras of a lightweight capture system.
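To make the idea of a temporal residual layer with factorized weights concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the residual for frame t is a weighted sum of a small set of shared basis matrices, which is one common way to realize a low-rank factorization; the abstract does not spell out the exact form. The names `resfield_layer`, `coeffs`, and `basis` are hypothetical and chosen only for illustration.

```python
import numpy as np


def resfield_layer(x, t_idx, W, b, coeffs, basis):
    """Linear layer whose weights receive a time-dependent residual.

    Assumed (illustrative) form: W(t) = W + sum_r coeffs[t, r] * basis[r]

    x:      (batch, d_in) input features
    t_idx:  integer frame index in [0, T)
    W:      (d_out, d_in) base weights shared across all frames
    b:      (d_out,) bias
    coeffs: (T, R) per-frame mixing coefficients (hypothetical name)
    basis:  (R, d_out, d_in) residual basis matrices shared across frames
    """
    # Contract the R coefficients of this frame with the R basis matrices.
    delta_W = np.tensordot(coeffs[t_idx], basis, axes=1)  # (d_out, d_in)
    return x @ (W + delta_W).T + b


def residual_param_counts(T, R, d_in, d_out):
    """Trainable residual parameters: one full matrix per frame vs. factorized."""
    full = T * d_out * d_in
    factorized = T * R + R * d_out * d_in
    return full, factorized
```

With all coefficients set to zero, the layer reduces to an ordinary linear layer, so the residual can be added to an existing MLP without changing its width. Under these assumptions the factorized form stores `T * R + R * d_out * d_in` residual parameters instead of `T * d_out * d_in`, which is a saving whenever the number of basis matrices `R` is much smaller than the number of frames `T`.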
