Exploring 3D-aware Latent Spaces for Efficiently Learning Numerous Scenes
Antoine Schnepf, Karim Kassab, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valérie Gouet-Brunet
TL;DR
The paper tackles scaling neural scene representations to a large number of semantically similar scenes by combining a 3D-aware latent space, learned by a 3D-aware autoencoder (3Da-AE), with cross-scene information sharing. It combines Encode-Scene, Decode-Scene, and Encode-Decode-Scene strategies with a Tri-Plane representation and a Micro-Macro decomposition to reduce per-scene memory and training time while maintaining rendering quality. The approach has two stages: it first trains the 3D-aware autoencoder to shape the latent space, then exploits that latent space to learn many scenes efficiently. When training 1000 scenes, it reduces per-scene training time by 86% and per-scene memory by 44%, with PSNR comparable to RGB-based Tri-Planes and a 53% reduction in rendering time. The work offers a practical pathway toward a foundation 3D-aware latent space for scalable 3D scene learning and rendering.
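To make the memory argument behind cross-scene sharing concrete, here is a minimal accounting sketch. It assumes that each scene keeps a small private ("micro") Tri-Plane and builds its remaining ("macro") feature channels as a weighted mix of base planes shared by all scenes; this summary does not spell out the decomposition, so that structure, the class `TriPlaneConfig`, the helper names, and every number below are illustrative assumptions and are not meant to reproduce the reported 44% figure.

```python
from dataclasses import dataclass


@dataclass
class TriPlaneConfig:
    # All fields are hypothetical values chosen for illustration only.
    resolution: int      # side length of each square feature plane
    micro_features: int  # channels stored privately by each scene
    macro_features: int  # channels built from the shared base planes
    num_bases: int       # number of base Tri-Planes shared by all scenes


def plane_params(resolution: int, features: int) -> int:
    """Parameter count of one Tri-Plane: three square feature grids."""
    return 3 * resolution * resolution * features


def baseline_params(cfg: TriPlaneConfig) -> int:
    """Per-scene cost when every scene stores all channels privately."""
    return plane_params(cfg.resolution, cfg.micro_features + cfg.macro_features)


def per_scene_params(cfg: TriPlaneConfig, num_scenes: int) -> float:
    """Effective per-scene cost when the macro bases are shared.

    Assumption: each scene owns its micro planes plus one mixing weight
    per shared base; the bases themselves are amortized over all scenes.
    """
    micro = plane_params(cfg.resolution, cfg.micro_features)
    weights = cfg.num_bases
    shared = cfg.num_bases * plane_params(cfg.resolution, cfg.macro_features)
    return micro + weights + shared / num_scenes


cfg = TriPlaneConfig(resolution=64, micro_features=10, macro_features=22, num_bases=50)
for n in (1, 1000):
    ratio = per_scene_params(cfg, n) / baseline_params(cfg)
    print(f"{n} scene(s): shared / baseline parameter ratio = {ratio:.3f}")
```

With these illustrative numbers, sharing costs more than the baseline for a single scene and less for 1000 scenes, because the shared bases are paid for once and amortized across the whole collection.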
Abstract
We present a method enabling the scaling of NeRFs to learn a large number of semantically similar scenes. We combine two techniques to reduce the training time and memory cost required per scene. First, we learn a 3D-aware latent space in which we train Tri-Plane scene representations, hence reducing the resolution at which scenes are learned. Second, we present a way to share common information across scenes, hence reducing the model complexity needed to learn a particular scene. Our method reduces effective per-scene memory costs by 44% and per-scene time costs by 86% when training 1000 scenes. Our project page can be found at https://3da-ae.github.io.
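For readers unfamiliar with Tri-Planes, the following sketch shows how a feature vector is typically looked up for a 3D point: the point is projected onto three axis-aligned feature planes, each plane is sampled with bilinear interpolation, and the three results are aggregated. Aggregation by summation, the `[0, 1]` coordinate convention, and the function names `bilinear` and `query_triplane` are assumptions made for illustration, not details taken from the paper. In a latent-space setting such as the one described above, the planes would hold latent features rather than RGB-space features.

```python
import math
from typing import Dict, List, Sequence

# A plane is indexed as plane[row][col][channel].
Plane = List[List[List[float]]]


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature vector at (u, v) in [0, 1]^2.

    u selects the column and v selects the row.
    """
    rows, cols = len(plane), len(plane[0])
    # Clamp to the unit square, then map to continuous grid coordinates.
    x = min(max(u, 0.0), 1.0) * (cols - 1)
    y = min(max(v, 0.0), 1.0) * (rows - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    tx, ty = x - x0, y - y0
    channels = len(plane[0][0])
    return [
        (1 - ty) * ((1 - tx) * plane[y0][x0][c] + tx * plane[y0][x1][c])
        + ty * ((1 - tx) * plane[y1][x0][c] + tx * plane[y1][x1][c])
        for c in range(channels)
    ]


def query_triplane(planes: Dict[str, Plane], point: Sequence[float]) -> List[float]:
    """Return the aggregated feature for a point in [0, 1]^3.

    Assumption: the three per-plane features are summed.
    """
    x, y, z = point
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


def constant_plane(value: float, size: int = 2, channels: int = 1) -> Plane:
    """Build a small plane filled with one value (toy data for the demo)."""
    return [[[value] * channels for _ in range(size)] for _ in range(size)]


planes = {
    "xy": constant_plane(1.0),
    "xz": constant_plane(2.0),
    "yz": constant_plane(3.0),
}
print(query_triplane(planes, (0.3, 0.6, 0.9)))
```

The resulting feature vector would then be passed to a small decoder network and volume-rendered along camera rays. The cost of storing and sampling the planes grows with the square of their resolution, which is why learning scenes at a reduced resolution lowers per-scene cost.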
