Towards Balanced RGB-TSDF Fusion for Consistent Semantic Scene Completion by 3D RGB Feature Completion and a Classwise Entropy Loss Function

Laiyan Ding, Panwen Hu, Jie Li, Rui Huang

TL;DR

The paper proposes a two-stage network with a 3D RGB feature completion module, which fills occluded areas with meaningful RGB features, and a classwise entropy loss function that penalizes inconsistent predictions.

Abstract

Semantic Scene Completion (SSC) aims to jointly infer the semantics and occupancy of 3D scenes. The Truncated Signed Distance Function (TSDF), a 3D encoding of depth, has been a common input for SSC. RGB-TSDF fusion seems promising because the two modalities provide color and geometry information, respectively. Nevertheless, RGB-TSDF fusion has been considered nontrivial, and the commonly used naive addition produces inconsistent results. We argue that the inconsistency comes from the sparsity of RGB features after projection into 3D space: TSDF features are dense, so summing the two yields imbalanced feature maps. To address this RGB-TSDF distribution difference, we propose a two-stage network with a 3D RGB feature completion module that completes RGB features with meaningful values for occluded areas. Moreover, we propose an effective classwise entropy loss function to penalize inconsistency. Extensive experiments on public datasets verify that our method achieves state-of-the-art performance among methods that do not adopt extra data.
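The imbalance the abstract describes can be illustrated with a toy example. The sketch below is not the authors' code: the grid layout (one row per camera ray, one column per depth step), the `surface_index` values, and all function names are assumptions made purely for illustration. It projects a constant RGB feature onto the visible surface voxel of each ray, builds a simple truncated signed distance along the same ray, and adds the two naively.

```python
# Toy illustration only (hypothetical names and values, not the paper's implementation).
# Each row is one camera ray; each column is a voxel at increasing depth along that ray.
DEPTH = 8
surface_index = [2, 3, 5, 4]  # assumed depth of the first visible surface on each ray


def project_rgb(surface_index, depth, feature=1.0):
    """RGB features land only on the visible surface voxel of each ray."""
    return [[feature if d == s else 0.0 for d in range(depth)] for s in surface_index]


def toy_tsdf(surface_index, depth, trunc=3):
    """Signed distance to the surface, clipped to [-1, 1]; defined at every voxel."""
    return [
        [max(-1.0, min(1.0, (s - d) / trunc)) for d in range(depth)]
        for s in surface_index
    ]


def naive_add(a, b):
    """Element-wise sum of two equally shaped grids."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def nonzero_fraction(grid):
    """Share of voxels that hold a nonzero value."""
    cells = [v for row in grid for v in row]
    return sum(1 for v in cells if v != 0.0) / len(cells)


rgb = project_rgb(surface_index, DEPTH)
tsdf = toy_tsdf(surface_index, DEPTH)
fused = naive_add(rgb, tsdf)

rgb_density = nonzero_fraction(rgb)    # one surface voxel per ray carries RGB
tsdf_density = nonzero_fraction(tsdf)  # every voxel except the zero crossing at the surface

# Behind the surface (occluded voxels), the fused feature is just the TSDF value.
occluded_is_tsdf_only = all(
    fused[r][d] == tsdf[r][d]
    for r, s in enumerate(surface_index)
    for d in range(s + 1, DEPTH)
)
```

In this toy grid the RGB map is nonzero in only a small share of voxels, while the TSDF map is nonzero almost everywhere, so the sum in occluded voxels carries no color information at all. A completion module of the kind the abstract proposes would aim to fill those occluded RGB entries before fusion.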

Paper Structure (22 sections, 4 equations, 8 figures, 4 tables)


Figures (8)

  • Figure 1: Visualization of semantic scene completion results on the NYUCAD dataset. From left to right: (a) RGB input, (b) depth map, (c) results of SSCNet song2017semantic, (d) results of SketchNet chen20203d, (e) results of our proposed method, (f) ground truth. Compared with SSCNet song2017semantic and SketchNet chen20203d, our method achieves better instance consistency on the sofa and wall, which lie in occluded areas. Best viewed in color and zoomed in.
  • Figure 2: Addition of 3D RGB and TSDF features. We visualize RGB and TSDF in (a) and (b), respectively, for better illustration. In the resulting features (c), we visualize RGB on the visible surfaces and TSDF in occluded areas.
  • Figure 3: Overview of the proposed network. In the 3D RGB feature completion stage, we generate useful TSDF features (TF1, TF2, TF3) and completed 3D RGB features (RRF1) with the proposed 3D RGB Feature Completion Module (FCM). The refined semantic scene completion stage uses the features from the previous stage to produce the refined result.
  • Figure 4: Multi-scale fusion module. This module performs addition and deconvolution sequentially.
  • Figure 5: Example of applying FCM on the class chair. The 3D RGB feature maps are transformed from sparse to dense.
  • ...and 3 more figures