Semantic Is Enough: Only Semantic Information For NeRF Reconstruction

Ruibo Wang, Song Zhang, Ping Huang, Donghai Zhang, Wei Yan

TL;DR

This research extends the Semantic Neural Radiance Fields (Semantic-NeRF) model by keeping only its semantic output and removing the RGB output component, and offers insights into this semantic-only way of rendering scenes.

Abstract

Recent research that combines implicit 3D representations with semantic information, such as Semantic-NeRF, has shown that NeRF models can perform very well at rendering 3D structures with semantic labels. This research extends the Semantic Neural Radiance Fields (Semantic-NeRF) model by focusing solely on the semantic output and removing the RGB output component. We reformulate the model and its training procedure to use only the cross-entropy loss between the model's semantic output and the ground-truth semantic images, removing the colour data used in the original Semantic-NeRF approach. We then run the same series of experiments on both the original and the modified Semantic-NeRF model. Our primary objective is to observe the impact of this modification on model performance, focusing on tasks such as scene understanding, object detection, and segmentation. The results offer insights into this new way of rendering scenes and provide an avenue for further research and development in semantic-focused 3D scene understanding.
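To make the semantic-only objective concrete, here is a minimal sketch of a cross-entropy loss over per-pixel semantic logits, with no RGB term. It is an illustration under stated assumptions, not the authors' implementation: the names `semantic_cross_entropy`, `pixel_logits`, and `ignore_label` are hypothetical, `pixel_logits` stands in for the logits the model has already rendered for each pixel, and the optional `ignore_label` reflects the label-ignoring behaviour mentioned in the Figure 2 caption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_cross_entropy(pixel_logits, labels, ignore_label=None):
    """Mean cross-entropy over pixels; there is no colour term.

    pixel_logits: list of per-pixel logit lists (assumed already rendered).
    labels: list of ground-truth class indices, one per pixel.
    ignore_label: hypothetical index for pixels excluded from the loss.
    """
    losses = []
    for logits, label in zip(pixel_logits, labels):
        if label == ignore_label:
            continue  # this pixel contributes nothing to the loss
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    # Return 0.0 when every pixel was ignored, to avoid dividing by zero.
    return sum(losses) / len(losses) if losses else 0.0
```

For example, `semantic_cross_entropy([[2.0, 0.1, -1.0]], [0])` scores a single pixel whose ground-truth class is 0. In a framework such as PyTorch, the same effect is usually obtained with the built-in cross-entropy loss and its `ignore_index` parameter.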

Paper Structure (16 sections, 6 equations, 7 figures, 5 tables)


Figures (7)

  • Figure 1: The network architecture of our method. The network takes the positional encoding of the world coordinates $X = (x, y, z)$ as input, and this input is concatenated into the network again at the $5^{th}$ blue layer. Each blue layer uses 256 channels except the last one, which uses 128 channels, and the output layer has as many channels as there are semantic labels. The network outputs the volume density $\sigma$ (yellow block) and the semantic logits (brown block).
  • Figure 2: During training, we set the loss function to ignore unnecessary labels, such as the black regions inside the red circles in the training data. As a result, both Semantic-NeRF and our method fill those regions with nearby semantic labels; the red arrows show the differences that result from ignoring the black regions.
  • Figure 3: Qualitative comparison of semantic view synthesis. We compare our semantic-only method with Semantic-NeRF on rendering semantic information from a novel view. The results show almost no difference; in the red rectangles, which highlight specific items, our method reaches rendering quality similar to Semantic-NeRF.
  • Figure 4: Qualitative comparison with sparse labels. When only 10% of the semantic images are used as training data, our method, like Semantic-NeRF, can still render the semantic maps from this small amount of data.
  • Figure 5: Qualitative comparison under pixel-wise noise. Across different levels of pixel corruption: with 50% noisy labels (first row), our method denoises the labels back to their original form, as Semantic-NeRF does. Even with 90% noisy labels (second row), our method produces output similar to Semantic-NeRF.
  • ...and 2 more figures
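
The architecture described in the Figure 1 caption can be sketched roughly as follows, using only the Python standard library. Several details are assumptions, because the caption does not state them: the number of 256-channel layers (8 here), the number of positional-encoding frequencies (10 here), the placement of the density head, and the use of ReLU activations. The names `SemanticOnlyMLP`, `positional_encoding`, `init_linear`, and `linear` are hypothetical, and the weights are random, so this shows only the data flow, not a trained model.

```python
import math
import random


def positional_encoding(point, num_freqs=10):
    """Encode (x, y, z) as sin/cos of each coordinate at growing frequencies."""
    encoded = []
    for p in point:
        for k in range(num_freqs):
            encoded.append(math.sin((2.0 ** k) * math.pi * p))
            encoded.append(math.cos((2.0 ** k) * math.pi * p))
    return encoded


def init_linear(rng, in_dim, out_dim):
    """Random weights and zero bias for one fully connected layer."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear(layer, x):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class SemanticOnlyMLP:
    """Density and semantic-logit heads only; there is no RGB branch."""

    def __init__(self, num_classes, width=256, depth=8, skip_layer=5,
                 num_freqs=10, seed=0):
        rng = random.Random(seed)
        self.num_freqs = num_freqs
        self.skip_layer = skip_layer
        enc_dim = 3 * 2 * num_freqs
        self.trunk = []
        in_dim = enc_dim
        for i in range(1, depth + 1):
            if i == skip_layer:
                in_dim += enc_dim  # the encoded input is concatenated here
            self.trunk.append(init_linear(rng, in_dim, width))
            in_dim = width
        self.sigma_head = init_linear(rng, width, 1)        # assumed placement
        self.feature = init_linear(rng, width, width // 2)  # 128-channel layer
        self.semantic_head = init_linear(rng, width // 2, num_classes)

    def forward(self, point):
        enc = positional_encoding(point, self.num_freqs)
        h = enc
        for i, layer in enumerate(self.trunk, start=1):
            if i == self.skip_layer:
                h = h + enc  # list concatenation, i.e. the skip connection
            h = relu(linear(layer, h))
        sigma = relu(linear(self.sigma_head, h))[0]  # non-negative density
        logits = linear(self.semantic_head, relu(linear(self.feature, h)))
        return sigma, logits
```

Calling `SemanticOnlyMLP(num_classes=5).forward((0.1, -0.2, 0.3))` returns one density value and five semantic logits for that 3D point. A practical implementation would use a tensor library and batch many sample points per ray.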