ReFiNe: Recursive Field Networks for Cross-modal Multi-scene Representation
Sergey Zakharov, Katherine Liu, Adrien Gaidon, Rares Ambrus
TL;DR
ReFiNe introduces a Recursive Field Network that encodes multiple 3D assets as continuous implicit fields within a single lightweight network by recursively expanding a per-object latent through an octree with pruning. The method unifies global and local conditioning via multiscale feature fusion and decodes into flexible outputs (SDF, SDF+RGB, NeRF) suitable for ray tracing and differentiable rendering. Across Thingi32, ShapeNet150, SRN Cars, GSO, and RTMV, ReFiNe achieves high fidelity with dramatically reduced memory usage, enabling scalable multi-object representation and cross-modal rendering with a single network per dataset. The resulting compact models retain high-frequency geometry and texture details, which makes them practical for compression and rendering, and their latent space is coherently structured, supporting smooth latent interpolation.
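To make the idea of "recursively expanding a per-object latent through an octree with pruning" concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the subdivision network (`W_split`), the occupancy head (`w_occ`), the latent size, the pruning threshold, and the choice of concatenation as the multiscale fusion operator are all hypothetical stand-ins chosen for illustration. Each cell splits its latent into eight child latents, children with low predicted occupancy are pruned, and every surviving leaf keeps the chain of latents from the root (global) down to itself (local).

```python
import numpy as np

# Hypothetical sizes and weights; random matrices stand in for learned networks.
LATENT_DIM = 16
rng = np.random.default_rng(0)

# Stand-in subdivision network: maps one parent latent to 8 child latents.
W_split = rng.standard_normal((8, LATENT_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)
# Stand-in occupancy head: scores a child latent to decide whether to prune it.
w_occ = rng.standard_normal(LATENT_DIM)

# Offsets of the 8 child octants within a parent cell, in units of the child size.
OCTANTS = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])


def expand(latent, origin, size, depth, max_depth, path, leaves, threshold=0.5):
    """Recursively expand `latent` through an octree, pruning unoccupied cells.

    `path` holds the latents from the root down to the parent cell, so each
    leaf can fuse global (coarse) and local (fine) features.
    """
    path = path + [latent]
    if depth == max_depth:
        # Multiscale fusion, assumed here to be simple concatenation.
        leaves.append((origin, size, np.concatenate(path)))
        return
    children = np.tanh(W_split @ latent)  # shape (8, LATENT_DIM)
    half = size / 2
    for octant, child in zip(OCTANTS, children):
        occupancy = 1.0 / (1.0 + np.exp(-(w_occ @ child)))  # sigmoid score
        if occupancy < threshold:
            continue  # prune this branch: no further expansion
        expand(child, origin + octant * half, half, depth + 1,
               max_depth, path, leaves, threshold)


# Usage: expand one per-object latent over the unit cube to depth 2.
root_latent = rng.standard_normal(LATENT_DIM)
leaves = []
expand(root_latent, np.zeros(3), 1.0, 0, 2, [], leaves)
print(len(leaves), "leaf cells kept out of", 8 ** 2)
```

Pruning is what keeps memory low in this sketch: only occupied branches are expanded, so the number of leaves grows with the surface area of the object rather than with the full volume.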
Abstract
State-of-the-art methods for multi-shape representation (a single model "packing" multiple objects) commonly trade modeling accuracy against memory and storage. We show how to encode multiple shapes represented as continuous neural fields with a higher degree of precision than previously possible and with low memory usage. Key to our approach is a recursive hierarchical formulation that exploits object self-similarity, leading to a highly compressed and efficient shape latent space. Thanks to the recursive formulation, our method supports spatial and global-to-local latent feature fusion without needing to initialize and maintain auxiliary data structures, while still allowing continuous field queries that enable applications such as ray tracing. In experiments on a set of diverse datasets, we provide compelling qualitative results and demonstrate state-of-the-art multi-scene reconstruction and compression results with a single network per dataset.
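Continuous field queries are what make ray tracing of an SDF possible. A standard technique is sphere tracing: because the SDF value at a point bounds the distance to the nearest surface, a ray can safely advance by that value at each step. The sketch below is illustrative only and is not the authors' renderer; the analytic `sphere_sdf` is a hypothetical stand-in for an SDF decoded by the network, and the step count, tolerance, and maximum distance are arbitrary choices.

```python
import numpy as np


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=0.5):
    """Analytic SDF of a sphere, standing in for a decoded neural field."""
    return float(np.linalg.norm(np.asarray(p) - np.asarray(center)) - radius)


def sphere_trace(sdf, origin, direction, max_steps=128, eps=1e-4, t_max=10.0):
    """Return the hit distance along the ray, or None if the ray misses."""
    origin = np.asarray(origin, dtype=float)
    direction = np.asarray(direction, dtype=float)
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    for _ in range(max_steps):
        d = sdf(origin + t * direction)  # continuous field query
        if d < eps:
            return t  # close enough to the surface: report a hit
        t += d  # safe step: no surface lies within distance d
        if t > t_max:
            break
    return None


# Usage: a ray starting at z = -2 and pointing at the origin hits the sphere.
print(sphere_trace(sphere_sdf, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0)))
```

Any field that can be queried at arbitrary 3D points can be plugged in as `sdf`, which is why continuous queries matter more here than a fixed-resolution grid.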
