Towards Sharper Object Boundaries in Self-Supervised Depth Estimation

Aurélien Cecille, Stefan Duffner, Franck Davoine, Rémi Agier, Thibault Neveu

TL;DR

This work tackles boundary blur in self-supervised monocular depth estimation by modeling per-pixel disparity as a two-component Gaussian mixture, enabling sharp depth discontinuities without ground-truth depth. The authors derive principled propagation of each mixture component through reprojection, color interpolation, and a distribution-aware loss; at inference, the most likely component is selected for each pixel. An edge-entropy-based sharpness measure isolates boundary quality from overall depth accuracy, and experiments on KITTI and VKITTIv2 show improved boundary sharpness with competitive depth metrics. Ablation studies confirm that propagating uncertainty is essential for effective component specialization, and the mixture weights provide an interpretable cue for scene structure and boundaries. The approach opens avenues for improved self-supervised depth and optical flow, and potentially for instance segmentation using the learned mixture weights.
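The difference between selecting the most likely component and averaging the mixture can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the array layout, and the convention that `weight` is the probability of component 0 are all assumptions made for illustration, and the per-component variances are left out.

```python
import numpy as np

def select_most_likely(mu, weight):
    """Pick, per pixel, the mean of the component with the larger weight.

    mu:     (..., 2) array of component means (hypothetical layout).
    weight: (...) array, assumed to be the weight of component 0 in [0, 1].
    """
    mu = np.asarray(mu, dtype=float)
    weight = np.asarray(weight, dtype=float)
    idx = (weight < 0.5).astype(int)  # 0 where component 0 is more likely, else 1
    return np.take_along_axis(mu, idx[..., None], axis=-1)[..., 0]

def mixture_mean(mu, weight):
    """Expected value of the mixture, shown only for comparison."""
    mu = np.asarray(mu, dtype=float)
    weight = np.asarray(weight, dtype=float)
    return weight * mu[..., 0] + (1.0 - weight) * mu[..., 1]

# Toy row crossing an object boundary: component 0 tracks the foreground
# (disparity 10), component 1 the background (disparity 2).
mu = np.array([[10.0, 2.0]] * 5)
weight = np.array([0.95, 0.8, 0.55, 0.3, 0.05])

selected = select_most_likely(mu, weight)  # jumps directly from 10 to 2
blended = mixture_mean(mu, weight)         # passes through intermediate values
```

In this toy row the selected disparity switches in a single step, while the mixture mean drifts through intermediate values, which is the kind of blur that produces spurious points between foreground and background.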

Abstract

Accurate monocular depth estimation is crucial for 3D scene understanding, but existing methods often blur depth at object boundaries, introducing spurious intermediate 3D points. While achieving sharp edges usually requires very fine-grained supervision, our method produces crisp depth discontinuities using only self-supervision. Specifically, we model per-pixel depth as a mixture distribution, capturing multiple plausible depths and shifting uncertainty from direct regression to the mixture weights. This formulation integrates seamlessly into existing pipelines via variance-aware loss functions and uncertainty propagation. Extensive evaluations on KITTI and VKITTIv2 show that our method achieves up to 35% higher boundary sharpness and improves point cloud quality compared to state-of-the-art baselines.

Paper Structure

This paper contains 18 sections, 11 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Reprojection pipeline with mixture distributions. ① Predicted depth distribution for each pixel. ② The depth distribution is projected into the support view, resulting in a distribution of positions. ③ These positions are sampled using bilinear interpolation to obtain a color distribution which is compared to the target color to compute the loss.
  • Figure 2: Other methods typically produce floating artifacts at object boundaries, resulting in spurious points between foreground and background. In contrast, our method generates clean point clouds with sharp object boundaries, effectively eliminating these artifacts.
  • Figure 3: (1) Input image. (2) Predicted depth. (3) Mixture weight (blue = 0, white = 0.5, red = 1). (4) Relative difference between the component means. At object boundaries, our model correctly identifies both depth modes and creates sharp discontinuities.
  • Figure 4: Depth distribution along a single row. Bold points indicate the component selected by the mixture weight. Each component varies smoothly, with one overestimating and the other underestimating the obstacle width; the discontinuity comes from the selection step.
  • Figure 5: Entropy visualization at object borders: low entropy indicates sharp transitions and high entropy indicates blurry ones.
  • ...and 1 more figure
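Steps ② and ③ of the Figure 1 pipeline can be sketched in one dimension. The sketch below rests on several assumptions not stated in the source: a rectified horizontal stereo pair, a sign convention in which a target pixel at column x maps to column x - d in the support view, and linear interpolation along a single row standing in for bilinear interpolation. It handles only the component means; the variance propagation and distribution-aware loss that the paper derives are not reproduced, and all function names are hypothetical.

```python
import numpy as np

def sample_row_linear(row, x):
    """Linearly interpolate a 1-D intensity row at fractional positions x.

    Positions outside the row are clamped to its ends.
    """
    row = np.asarray(row, dtype=float)
    x = np.clip(np.asarray(x, dtype=float), 0.0, len(row) - 1.0)
    x0 = np.floor(x).astype(int)
    x1 = np.minimum(x0 + 1, len(row) - 1)
    frac = x - x0
    return (1.0 - frac) * row[x0] + frac * row[x1]

def reproject_components(support_row, mu):
    """Sample the support row once per mixture component.

    mu: (W, 2) array of per-pixel disparity means (hypothetical layout).
    Returns a (W, 2) array of sampled intensities, one column per component.
    """
    mu = np.asarray(mu, dtype=float)
    xs = np.arange(mu.shape[0], dtype=float)[:, None]
    positions = xs - mu  # assumed sign convention for a rectified pair
    return sample_row_linear(support_row, positions)

support_row = np.arange(5) * 10.0  # intensities 0, 10, 20, 30, 40
mu = np.array([[0.0, 0.0],
               [0.5, 1.0],
               [0.5, 1.5],
               [1.0, 2.0],
               [0.0, 4.0]])
colors = reproject_components(support_row, mu)
```

Each target pixel ends up with one candidate color per component, and these candidates are what would then be compared to the target color when computing a loss.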