Neural Processing of Tri-Plane Hybrid Neural Fields

Adriano Cardace, Pierluigi Zama Ramirez, Francesco Ballerini, Allan Zhou, Samuele Salti, Luigi Di Stefano

TL;DR

This work tackles the challenge of directly processing neural fields for 3D tasks by exploiting tri-plane hybrid neural fields, where a compact discrete feature map $T=(\mathbf{F}_{xy},\mathbf{F}_{xz},\mathbf{F}_{yz})$ and a small MLP $M$ jointly represent a field. By processing only the discrete tri-plane features with Transformer-based architectures that are invariant to channel order, the approach preserves the reconstruction quality of the neural fields while delivering performance that is state of the art among neural-field processing methods and close to that of methods operating on explicit representations. This holds for tasks such as classification and 3D part segmentation across signed distance (SDF), unsigned distance (UDF), occupancy (OF), and radiance (RF) fields. A universal tri-plane classifier demonstrates cross-field generalization, and extensive ablations show the benefits of permutation-invariant processing and the superiority of Transformer-based tri-plane processing over MLP or CNN baselines. These results establish a practical, memory-efficient path for storing and analyzing 3D data via neural fields, including NeRF-type signals that can be classified without rendering images.
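
To make the tri-plane representation described above concrete, the following is a minimal NumPy sketch of how a field value could be queried from three feature planes and a small MLP. It is an illustration under stated assumptions, not the authors' implementation: the helper names (`bilinear_sample`, `query_field`, `make_mlp`), the choice to sum the three per-plane features, the plane resolution, and the randomly initialized two-layer MLP are all hypothetical, and any positional encoding of the query point is omitted.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly interpolate a (C, H, W) feature plane at coords u, v in [-1, 1]."""
    C, H, W = plane.shape
    # Map [-1, 1] to continuous pixel indices in [0, W-1] and [0, H-1].
    x = (u + 1.0) * 0.5 * (W - 1)
    y = (v + 1.0) * 0.5 * (H - 1)
    # Index of the top-left neighbour, clipped so that the +1 neighbour exists.
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx = x - x0
    wy = y - y0
    top = plane[:, y0, x0] * (1 - wx) + plane[:, y0, x0 + 1] * wx
    bot = plane[:, y0 + 1, x0] * (1 - wx) + plane[:, y0 + 1, x0 + 1] * wx
    return (top * (1 - wy) + bot * wy).T  # shape (N, C)

def make_mlp(rng, c_in, hidden, c_out):
    """Hypothetical two-layer ReLU MLP with random weights (stands in for M)."""
    w1 = rng.normal(size=(c_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(size=(hidden, c_out))
    b2 = np.zeros(c_out)
    def mlp(h):
        return np.maximum(h @ w1 + b1, 0.0) @ w2 + b2
    return mlp

def query_field(triplane, mlp, points):
    """Project each 3D point onto the xy, xz, and yz planes, combine the
    interpolated features (summed here, as an assumption), and decode with the MLP."""
    f_xy, f_xz, f_yz = triplane
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    feat = (bilinear_sample(f_xy, x, y)
            + bilinear_sample(f_xz, x, z)
            + bilinear_sample(f_yz, y, z))
    return mlp(feat)

rng = np.random.default_rng(0)
C, R = 8, 16  # channels and plane resolution, chosen arbitrarily for the demo
triplane = tuple(rng.normal(size=(C, R, R)) for _ in range(3))
mlp = make_mlp(rng, c_in=C, hidden=32, c_out=1)
points = rng.uniform(-1.0, 1.0, size=(5, 3))
values = query_field(triplane, mlp, points)  # one scalar per query point
print(values.shape)
```

In this sketch only `triplane` would be handed to a downstream classifier or segmentation network; the MLP is needed solely to decode field values.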

Abstract

Driven by the appealing properties of neural fields for storing and communicating 3D data, the problem of directly processing them to address tasks such as classification and part segmentation has emerged and has been investigated in recent works. Early approaches employ neural fields parameterized by shared networks trained on the whole dataset, achieving good task performance but sacrificing reconstruction quality. To improve the latter, later methods focus on individual neural fields parameterized as large Multi-Layer Perceptrons (MLPs), which are, however, challenging to process due to the high dimensionality of the weight space, intrinsic weight space symmetries, and sensitivity to random initialization. Hence, results turn out significantly inferior to those achieved by processing explicit representations, e.g., point clouds or meshes. In the meantime, hybrid representations, in particular based on tri-planes, have emerged as a more effective and efficient alternative to realize neural fields, but their direct processing has not been investigated yet. In this paper, we show that the tri-plane discrete data structure encodes rich information, which can be effectively processed by standard deep-learning machinery. We define an extensive benchmark covering a diverse set of fields such as occupancy, signed/unsigned distance, and, for the first time, radiance fields. While processing a field with the same reconstruction quality, we achieve task performance far superior to frameworks that process large MLPs and, for the first time, almost on par with architectures handling explicit representations.

Paper Structure

This paper contains 33 sections, 4 equations, 12 figures, and 15 tables.

Figures (12)

  • Figure 1: Left: Neural processing of hybrid neural fields allows us to employ well-established architectures to tackle deep learning tasks while avoiding problems related to processing MLPs, such as the high-dimensional weight space and the sensitivity to random initialization. Right: We achieve better performance than other works on this topic, close to that of methods that operate directly on explicit representations, without sacrificing the reconstruction quality of neural fields.
  • Figure 2: Left: Tri-plane representation and learning of each neural field. Right: Datasets are composed of many independent tri-plane hybrid neural fields, each representing a 3D object.
  • Figure 3: Left: For three different hybrid neural fields (from top to bottom: SDF, UDF, RF) we render a view of the reconstructed 3D object alongside the corresponding tri-plane feature map. Right: from left to right, reconstructions of two $(\text{tri-plane},\text{MLP})$ pairs with different initialization, namely $(T_A, M_A)$ and $(T_B, M_B)$; the mixed pair $(T_A, M_B)$; a channel permutation of $T_A$ and $M_B$.
  • Figure 4: Tri-plane reconstruction examples of point clouds from UDF, meshes from SDF, voxels from OF, and images from RF (from top to bottom)
  • Figure 5: Reconstruction comparison for Manifold40 meshes obtained from SDF
  • ...and 7 more figures
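
The channel permutation mentioned in the Figure 3 caption can be illustrated with a small NumPy check. This is a sketch under assumptions rather than the paper's experiment: it models only the first linear layer of a hypothetical MLP acting on per-point tri-plane features, and the sizes and the cyclic permutation are arbitrary. It shows the general linear-algebra fact that reordering feature channels leaves the output unchanged only when the matching rows of the first-layer weights are reordered in the same way, which is one reason a processing architecture might be designed to be invariant to channel order.

```python
import numpy as np

rng = np.random.default_rng(0)
C, hidden, n = 16, 32, 5               # arbitrary sizes for the demo

feats = rng.normal(size=(n, C))        # per-point features read from a tri-plane
w1 = rng.normal(size=(C, hidden))      # first linear layer of a hypothetical MLP
b1 = rng.normal(size=hidden)

perm = np.roll(np.arange(C), 1)        # a fixed, non-identity channel permutation

out_ref = feats @ w1 + b1                        # original features and weights
out_both = feats[:, perm] @ w1[perm, :] + b1     # channels and weight rows permuted together
out_mismatch = feats[:, perm] @ w1 + b1          # channels permuted, weights left as-is

same_when_consistent = np.allclose(out_ref, out_both)
same_when_mismatched = np.allclose(out_ref, out_mismatch)
print(same_when_consistent, same_when_mismatched)
```

Permuting channels and weight rows together gives an equally valid (tri-plane, MLP) pair for the same field, so two tri-planes that differ only by channel order describe the same object.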