Weight Space Representation Learning on Diverse NeRF Architectures
Francesco Ballerini, Pierluigi Zama Ramirez, Luigi Di Stefano, Samuele Salti
TL;DR
The paper tackles the challenge of performing downstream tasks on NeRFs whose weights come from many different architectures. It uses a Graph Meta-Network to embed NeRF parameter graphs into a common latent space, and it couples a rendering-based objective with a SigLIP contrastive loss so that embeddings reflect object content rather than the architecture used to encode it. This enables robust classification, retrieval, and language tasks across MLP, tri-plane, and hash-table NeRFs, including architectures unseen during training. The approach matches or outperforms single-architecture baselines and generalizes to a new dataset (Objaverse) and to multimodal language tasks, suggesting a path toward a foundation model for the NeRF weight space. Its main limitation is that evaluation relies primarily on a single dataset (ShapeNetRender); extending the approach to larger-scale NeRF collections is left to future work. Overall, the work offers a scalable, architecture-agnostic paradigm for NeRF weight-space processing with broad downstream applicability.
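The following is a minimal sketch of a sigmoid (SigLIP-style) contrastive loss, included to make the contrastive term concrete. It assumes that `emb_a[i]` and `emb_b[i]` embed the same object encoded by two different NeRF architectures (a positive pair) and that every other pairing in the batch is a negative. The function names, the pure-Python implementation, and the fixed temperature `t = 10` and bias `b = -10` are illustrative assumptions, not the paper's code. The values of `t` and `b` mirror the initialization proposed for SigLIP, where both are learned during training.

```python
import math


def l2_normalize(v):
    # Project an embedding onto the unit sphere so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)) for both signs of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def siglip_loss(emb_a, emb_b, t=10.0, b=-10.0):
    """Sigmoid contrastive loss over a batch of paired embeddings.

    Hypothetical setup: emb_a[i] and emb_b[i] are embeddings of the same object
    from two NeRF architectures; all pairs (i, j) with i != j are negatives.
    """
    assert emb_a and len(emb_a) == len(emb_b)
    xs = [l2_normalize(v) for v in emb_a]
    ys = [l2_normalize(v) for v in emb_b]
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0  # +1 for positives, -1 for negatives
            logit = t * sum(p * q for p, q in zip(xs[i], ys[j])) + b
            total += log_sigmoid(z * logit)
    # Average over the number of positive pairs, as in the SigLIP formulation.
    return -total / n
```

Unlike a softmax-based InfoNCE loss, each (i, j) pair contributes an independent binary term. No normalization over the whole batch is needed, which keeps the loss simple to compute for large batches.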
Abstract
Neural Radiance Fields (NeRFs) have emerged as a groundbreaking paradigm for representing 3D objects and scenes by encoding shape and appearance information into the weights of a neural network. Recent studies have demonstrated that these weights can be used as input for frameworks designed to address deep learning tasks; however, such frameworks require NeRFs to adhere to a specific, predefined architecture. In this paper, we introduce the first framework capable of processing NeRFs with diverse architectures and performing inference on architectures unseen at training time. We achieve this by training a Graph Meta-Network within an unsupervised representation learning framework, and show that a contrastive objective is conducive to obtaining an architecture-agnostic latent space. In experiments conducted across 13 NeRF architectures belonging to three families (MLPs, tri-planes, and, for the first time, hash tables), our approach demonstrates robust performance in classification, retrieval, and language tasks involving multiple architectures, including those unseen at training time, while also matching or exceeding the results of existing frameworks limited to single architectures.
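A Graph Meta-Network consumes a network's parameters as a graph rather than as a flat vector. The sketch below shows the standard construction for the MLP family: one node per neuron (with its bias as the node feature) and one edge per weight. It is illustrative only. The function name and output format are hypothetical, and tri-plane and hash-table NeRFs would need additional nodes for their feature grids, which are not shown. The paper's exact graph construction may also differ, for example in its edge features or positional encodings.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def mlp_to_parameter_graph(
    weights: List[Matrix], biases: List[List[float]]
) -> Tuple[List[float], List[Tuple[int, int, float]]]:
    """Hypothetical MLP -> parameter-graph conversion.

    weights[l] has shape (out_l, in_l), as in y = W x + b.
    Returns (node_features, edges): node_features[k] is the bias of neuron k
    (0.0 for input neurons); each edge is (src_node, dst_node, weight).
    """
    layer_sizes = [len(weights[0][0])] + [len(w) for w in weights]
    # offsets[l] is the index of the first node belonging to layer l.
    offsets = [0]
    for size in layer_sizes[:-1]:
        offsets.append(offsets[-1] + size)
    # Input neurons have no bias, so they get a zero feature.
    node_features = [0.0] * layer_sizes[0]
    for b in biases:
        node_features.extend(b)
    # Weight W[l][i][j] connects neuron j of layer l to neuron i of layer l + 1.
    edges = []
    for l, w in enumerate(weights):
        for i, row in enumerate(w):
            for j, value in enumerate(row):
                edges.append((offsets[l] + j, offsets[l + 1] + i, value))
    return node_features, edges
```

Because neurons become nodes and weights become edges, MLPs of different depths and widths all map to graphs that the same message-passing network can process. This is what lets a single model accept more than one architecture.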
