uSF: Learning Neural Semantic Field with Uncertainty

Vsevolod Skorokhodov, Darya Drozdova, Dmitry Yudin

TL;DR

This work tackles the lack of confidence estimates in NeRF-based 3D scene reconstruction by introducing uSF, a neural field that jointly learns color, semantic labels, and their aleatoric uncertainties. The method adds separate uncertainty heads for RGB and semantic predictions and uses a reparameterization-based Monte Carlo approach to estimate semantic probabilities, all while employing a hash-based positional encoding for efficiency. The learning objective combines color accuracy, semantic classification, and uncertainty regularization via a weighted sum $L = \omega L_{rgb} + \lambda L_{semantic} + (1 - \omega) L_{uncert}$, enabling robust performance with limited training data. Experiments on the Replica dataset show that incorporating uncertainty can improve semantic reconstruction quality and that hash encoding significantly speeds up training compared to baseline NeRF variants, highlighting practical benefits for real-world 3D understanding tasks.
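The two mechanisms named above can be illustrated with a minimal sketch. It assumes the semantic head outputs a mean and a standard deviation per class logit, and that class probabilities are estimated by averaging the softmax over logits sampled as mean + std * eps with eps ~ N(0, 1). The function names (`mc_semantic_probs`, `total_loss`), the number of samples, and the numeric inputs are hypothetical and not taken from the paper or its code. Only the weighted-sum form of the objective follows the formula in the TL;DR.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mc_semantic_probs(mean_logits, std_logits, num_samples=100, rng=None):
    """Monte Carlo estimate of class probabilities (illustrative assumption).

    Each sample uses the reparameterization logit = mean + std * eps,
    eps ~ N(0, 1); the softmax outputs are averaged over all samples.
    """
    rng = rng or random.Random(0)
    acc = [0.0] * len(mean_logits)
    for _ in range(num_samples):
        sampled = [mu + sigma * rng.gauss(0.0, 1.0)
                   for mu, sigma in zip(mean_logits, std_logits)]
        for k, p in enumerate(softmax(sampled)):
            acc[k] += p
    return [a / num_samples for a in acc]


def total_loss(l_rgb, l_semantic, l_uncert, omega, lam):
    """Weighted sum from the TL;DR: omega*L_rgb + lam*L_semantic + (1-omega)*L_uncert."""
    return omega * l_rgb + lam * l_semantic + (1.0 - omega) * l_uncert


# Hypothetical inputs: three classes, with the second logit highly uncertain.
probs = mc_semantic_probs([2.0, 0.5, -1.0], [0.1, 1.5, 0.1], num_samples=200)
loss = total_loss(l_rgb=0.02, l_semantic=0.7, l_uncert=0.1, omega=0.9, lam=0.05)
```

With all standard deviations set to zero, the estimate reduces to a plain softmax of the mean logits, which is a convenient sanity check for the sampling step.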

Abstract

Recently, there has been increased interest in NeRF methods, which reconstruct a differentiable representation of three-dimensional scenes. One of the main limitations of such methods is their inability to assess the model's confidence in its predictions. In this paper, we propose a new neural network model for forming extended vector representations, called uSF, which predicts not only the color and semantic label of each point but also the corresponding uncertainty values. We show that when only a small number of training images is available, a model that quantifies uncertainty performs better than one without this capability. The code for uSF is publicly available at https://github.com/sevashasla/usf/.

Paper Structure (20 sections, 8 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: We propose a model named uSF that predicts both color and semantic labels and estimates the corresponding uncertainties.
  • Figure 2: From left to right: the predicted color, the predicted semantic labels, the RGB uncertainty, and the semantic uncertainty during the training process.
  • Figure 3: uSF architecture. Orange rectangles are Linear + ReLU layers. Yellow rectangles are outputs of our model. We study different variants of our architecture in terms of the color (c0-c3) and the semantic (s0-s1) branches.
  • Figure 4: Mean mIoU and PSNR as functions of $\omega$ (first row) and $\lambda$ (second row) for selected scenes from the Replica dataset.