Uncertainty modeling for fine-tuned implicit functions

Anna Susmelj, Mael Macuglia, Nataša Tagasovska, Reto Sutter, Sebastiano Caprara, Jean-Philippe Thiran, Ender Konukoglu

TL;DR

This work tackles unreliable reconstructions when fine-tuning neural implicit representations (e.g., occupancy networks) from dense synthetic priors to sparse, corrupted real data. It introduces Dropsembles, a dropout-based ensemble approach that, combined with Elastic Weight Consolidation (EWC), captures epistemic uncertainty in the decoder during fine-tuning while keeping computational costs far below full ensembles. Across toy data, MNIST-based reconstruction, ShapeNet, and a medical imaging spine dataset, Dropsembles achieve ensemble-level accuracy and calibration with substantial efficiency gains, particularly under distribution shifts. The results demonstrate a practical path toward trustworthy 3D reconstruction in settings with limited or noisy data, with potential impact in medical imaging and other sparse-view applications.

Abstract

Implicit functions such as Neural Radiance Fields (NeRFs), occupancy networks, and signed distance functions (SDFs) have become pivotal in computer vision for reconstructing detailed object shapes from sparse views. Achieving optimal performance with these models can be challenging due to the extreme sparsity of inputs and distribution shifts induced by data corruptions. To this end, large, noise-free synthetic datasets can serve as shape priors to help models fill in gaps, but the resulting reconstructions must be approached with caution. Uncertainty estimation is crucial for assessing the quality of these reconstructions, particularly in identifying areas where the model is uncertain about the parts it has inferred from the prior. In this paper, we introduce Dropsembles, a novel method for uncertainty estimation in fine-tuned implicit functions. We demonstrate the efficacy of our approach through a series of experiments, starting with toy examples and progressing to a real-world scenario. Specifically, we train a Convolutional Occupancy Network on synthetic anatomical data and test it on low-resolution MRI segmentations of the lumbar spine. Our results show that Dropsembles achieve the accuracy and calibration levels of deep ensembles at significantly lower computational cost.
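The TL;DR describes Dropsembles as a dropout-based ensemble combined with Elastic Weight Consolidation (EWC). The sketch below is one possible reading of that idea on a toy NumPy decoder, not the authors' implementation. It assumes that each ensemble member is a sub-network defined by a fixed dropout mask, that uncertainty is the binary entropy of the averaged occupancy probability, and that the EWC term takes its standard quadratic form with a diagonal Fisher estimate. All function names, shapes, and the per-member weights (`w2_members`, standing in for separately fine-tuned members) are hypothetical.

```python
import numpy as np


def sample_masks(n_members, n_hidden, keep_prob, rng):
    """Draw one fixed binary dropout mask per ensemble member (hypothetical helper)."""
    return (rng.random((n_members, n_hidden)) < keep_prob).astype(float)


def member_predict(x, w1, w2, mask, keep_prob):
    """Occupancy probability from one thinned sub-network of a toy two-layer MLP."""
    # ReLU hidden layer, masked and rescaled as in inverted dropout.
    hidden = np.maximum(x @ w1, 0.0) * mask / keep_prob
    logits = hidden @ w2
    return 1.0 / (1.0 + np.exp(-logits))


def ensemble_predict(x, w1, w2_members, masks, keep_prob):
    """Average member probabilities and return the binary entropy (in bits) of the mean."""
    probs = np.stack([
        member_predict(x, w1, w2, mask, keep_prob)
        for w2, mask in zip(w2_members, masks)
    ])
    mean_p = np.clip(probs.mean(axis=0), 1e-12, 1.0 - 1e-12)
    entropy = -(mean_p * np.log2(mean_p) + (1.0 - mean_p) * np.log2(1.0 - mean_p))
    return mean_p, entropy


def ewc_penalty(theta, theta_star, fisher_diag, lam):
    """Standard EWC term: (lam / 2) * sum_i F_i * (theta_i - theta_star_i)^2."""
    return 0.5 * lam * np.sum(fisher_diag * (theta - theta_star) ** 2)


# Toy usage: 5 query points in 3D, 8 hidden units, 4 ensemble members.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
w1 = rng.normal(size=(3, 8))
masks = sample_masks(4, 8, keep_prob=0.8, rng=rng)
# Assumption: each member ends up with its own output weights after fine-tuning.
w2_members = [rng.normal(size=8) for _ in range(4)]
mean_p, entropy = ensemble_predict(x, w1, w2_members, masks, keep_prob=0.8)
```

In a fine-tuning loop under these assumptions, `ewc_penalty` would be added to each member's data loss on the sparse dataset, pulling its weights toward the values learned on the dense prior in proportion to their estimated importance.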

Paper Structure

This paper contains 30 sections, 5 equations, 13 figures, 7 tables, 1 algorithm.

Figures (13)

  • Figure 1: Occupancy network training with a dense prior and fine-tuning on a sparse dataset.
  • Figure 2: Toy classification example. a) Training data for a binary classification task ("red" vs "blue") from datasets A (points, light) and B (crosses, dark). b) MC dropout trained only on dataset A. c) Comparison of methods fine-tuned on dataset B. Points are colored by the predicted class. EWC consistently improves both accuracy and uncertainty estimates on both datasets A and B.
  • Figure 3: Corrupted MNIST reconstruction example. a) Example of training images. b) Comparison of fine-tuned methods on dataset B. A "perfectly calibrated" method would have reliability diagrams aligned on the diagonal. A "good conservative" method would have all bars above the diagonal.
  • Figure 4: Lumbar spine reconstruction example on Subject 2. a) 3D-rendered views of sparse inputs (bicubic upsampling), dense ground truth (GT), and predictions by our method. b) Histograms of uncertainty (entropy) values, truncated between 0.1 and 1 for visibility. c) Examples of uncertainty estimates for two different sagittal slices of the 3D volume. d) For each method, network predictions are randomly sampled and the corresponding reconstructions are depicted.
  • Figure 5: Example of the performance difference between Dropsembles w/ and w/o EWC. Grey segmentations are produced by Dropsembles w/ or w/o EWC regularization, overlaid with colored ground-truth segmentations. EWC better captures subtle details when modeling thin structures, such as vertebral processes.
  • ...and 8 more figures
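
The Figure 3 caption refers to reliability diagrams, where a perfectly calibrated method lies on the diagonal. As a rough illustration of how such a diagram can be computed for binary occupancy predictions, the sketch below bins predictions by confidence and compares per-bin accuracy with per-bin mean confidence. The binning scheme, the function names, and the use of expected calibration error (ECE) as a summary are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def reliability_bins(probs, labels, n_bins=10):
    """Per-bin mean confidence, accuracy, and counts for binary predictions."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    preds = (probs >= 0.5).astype(int)
    # Confidence in the predicted class, which lies in [0.5, 1].
    conf = np.where(preds == 1, probs, 1.0 - probs)
    correct = (preds == labels).astype(float)
    # Equal-width bins over [0.5, 1]; interior edges assign each sample a bin index.
    edges = np.linspace(0.5, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(conf, edges[1:-1]), 0, n_bins - 1)
    bin_conf = np.full(n_bins, np.nan)
    bin_acc = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = idx == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            bin_conf[b] = conf[in_bin].mean()
            bin_acc[b] = correct[in_bin].mean()
    return bin_conf, bin_acc, counts


def expected_calibration_error(probs, labels, n_bins=10):
    """Count-weighted mean absolute gap between accuracy and confidence."""
    bin_conf, bin_acc, counts = reliability_bins(probs, labels, n_bins)
    valid = counts > 0
    weights = counts[valid] / counts.sum()
    return float(np.sum(weights * np.abs(bin_acc[valid] - bin_conf[valid])))


# Toy usage: ten predictions at confidence 0.9, nine of them correct.
ece_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0], n_bins=5)
# Same confidence, but every prediction is wrong (overconfident).
ece_overconfident = expected_calibration_error([0.9] * 10, [0] * 10, n_bins=5)
```

Plotting `bin_acc` against `bin_conf` gives the bars of a reliability diagram. In the caption's terms, bars above the diagonal correspond to bins where accuracy exceeds confidence, that is, a conservative model.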