Neural Processing of Tri-Plane Hybrid Neural Fields
Adriano Cardace, Pierluigi Zama Ramirez, Francesco Ballerini, Allan Zhou, Samuele Salti, Luigi Di Stefano
TL;DR
This work tackles the challenge of directly processing neural fields for 3D tasks by exploiting tri-plane hybrid neural fields, in which a compact discrete feature map $T=(\mathbf{F}_{xy},\mathbf{F}_{xz},\mathbf{F}_{yz})$ and a small MLP $M$ jointly represent a field. By processing only the discrete tri-plane features with Transformer-based architectures that are invariant to channel order, the approach preserves the reconstruction quality of the neural fields while delivering state-of-the-art performance, close to that of explicit representations, on tasks such as classification and 3D part segmentation across signed distance (SDF), unsigned distance (UDF), occupancy (OF), and radiance (RF) fields. A universal tri-plane classifier demonstrates cross-field generalization, and extensive ablations show the benefits of permutation-invariant processing and the superiority of Transformer-based tri-plane processing over MLP and CNN baselines. These results establish a practical, memory-efficient path for storing and analyzing 3D data via neural fields, including NeRF-type signals that can be classified without rendering images.
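To make the roles of $T$ and $M$ concrete, the following is a minimal NumPy sketch of how a tri-plane hybrid field is commonly queried: a 3D point is projected onto the three axis-aligned planes, a feature vector is bilinearly interpolated from each plane, and the combined feature is decoded by a small MLP. It is an illustration under stated assumptions, not the authors' implementation: the function names (`bilinear_sample`, `query_field`), the summation of the three plane features, the two-layer ReLU decoder, and all sizes are hypothetical choices. The last lines also illustrate why invariance to channel order is a natural requirement: permuting the feature channels consistently across the three planes, together with the matching rows of the first MLP layer, leaves the represented field unchanged.

```python
import numpy as np

def bilinear_sample(plane, uv):
    """Bilinearly interpolate features from one plane.

    plane: (C, R, R) feature grid; uv: (N, 2) coordinates in [-1, 1].
    Returns an (N, C) array of interpolated features.
    """
    C, R, _ = plane.shape
    # Map [-1, 1] to continuous grid coordinates in [0, R - 1].
    xy = (np.clip(uv, -1.0, 1.0) + 1.0) * 0.5 * (R - 1)
    lo = np.clip(np.floor(xy).astype(int), 0, R - 2)  # lower-corner indices
    t = xy - lo                                       # fractional offsets
    i0, j0 = lo[:, 0], lo[:, 1]
    ti, tj = t[:, 0:1], t[:, 1:2]
    f00 = plane[:, i0, j0].T
    f01 = plane[:, i0, j0 + 1].T
    f10 = plane[:, i0 + 1, j0].T
    f11 = plane[:, i0 + 1, j0 + 1].T
    return (f00 * (1 - ti) * (1 - tj) + f01 * (1 - ti) * tj
            + f10 * ti * (1 - tj) + f11 * ti * tj)

def query_field(planes, W1, b1, W2, b2, pts):
    """Evaluate the field at pts (N, 3) from tri-plane features and a small MLP."""
    F_xy, F_xz, F_yz = planes
    # Assumption: per-plane features are summed (concatenation is another option).
    feat = (bilinear_sample(F_xy, pts[:, [0, 1]])
            + bilinear_sample(F_xz, pts[:, [0, 2]])
            + bilinear_sample(F_yz, pts[:, [1, 2]]))
    hidden = np.maximum(feat @ W1 + b1, 0.0)  # ReLU hidden layer
    return hidden @ W2 + b2                   # one scalar per point, e.g. an SDF value

# Hypothetical sizes: C channels, R x R planes, H hidden units.
rng = np.random.default_rng(0)
C, R, H = 8, 16, 32
planes = [rng.normal(size=(C, R, R)) for _ in range(3)]
W1, b1 = rng.normal(size=(C, H)), rng.normal(size=H)
W2, b2 = rng.normal(size=(H, 1)), rng.normal(size=1)
pts = rng.uniform(-1.0, 1.0, size=(5, 3))

out = query_field(planes, W1, b1, W2, b2, pts)

# Permute channels in all planes and the matching rows of W1:
# the tri-plane tensor changes, but the field it represents does not.
perm = rng.permutation(C)
planes_perm = [p[perm] for p in planes]
out_perm = query_field(planes_perm, W1[perm], b1, W2, b2, pts)
print(np.allclose(out, out_perm))
```

Since two tri-planes that differ only by such a channel permutation encode the same field, an architecture that reads $T$ alone has no canonical channel order to rely on, which is one way to motivate processing the channels as an unordered set.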
Abstract
Driven by the appealing properties of neural fields for storing and communicating 3D data, the problem of directly processing them to address tasks such as classification and part segmentation has emerged and has been investigated in recent works. Early approaches employ neural fields parameterized by shared networks trained on the whole dataset, achieving good task performance but sacrificing reconstruction quality. To improve the latter, later methods focus on individual neural fields parameterized as large Multi-Layer Perceptrons (MLPs), which are, however, challenging to process due to the high dimensionality of the weight space, intrinsic weight space symmetries, and sensitivity to random initialization. Hence, results turn out significantly inferior to those achieved by processing explicit representations, e.g., point clouds or meshes. In the meantime, hybrid representations, in particular based on tri-planes, have emerged as a more effective and efficient alternative to realize neural fields, but their direct processing has not been investigated yet. In this paper, we show that the tri-plane discrete data structure encodes rich information, which can be effectively processed by standard deep-learning machinery. We define an extensive benchmark covering a diverse set of fields such as occupancy, signed/unsigned distance, and, for the first time, radiance fields. While processing a field with the same reconstruction quality, we achieve task performance far superior to frameworks that process large MLPs and, for the first time, almost on par with architectures handling explicit representations.
