Towards Sharper Object Boundaries in Self-Supervised Depth Estimation
Aurélien Cecille, Stefan Duffner, Franck Davoine, Rémi Agier, Thibault Neveu
TL;DR
This work tackles boundary blur in self-supervised monocular depth estimation by modeling per-pixel disparity as a two-component Gaussian mixture, which allows sharp depth discontinuities without ground-truth depth. The authors derive a principled propagation of each mixture component through reprojection, color interpolation, and a distribution-aware loss; at inference time, they select the most likely component for each pixel. An edge-entropy-based sharpness measure isolates boundary quality from overall depth accuracy, and experiments on KITTI and VKITTIv2 show improved boundary sharpness with competitive depth metrics. Ablation studies confirm that uncertainty propagation is essential for the components to specialize effectively, and the mixture weights provide an interpretable cue for scene structure and boundaries. The approach opens avenues for improved self-supervised depth and optical flow estimation, and potentially for instance segmentation using the learned mixture weights.
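The effect of selecting the most likely component, rather than averaging, can be illustrated with a minimal NumPy sketch. This is not the authors' code: the array layout `(2, H, W)`, the function names `select_most_likely` and `mixture_mean`, and the toy values are all assumptions made for illustration.

```python
import numpy as np

def select_most_likely(mu, weights):
    """Pick, per pixel, the mean of the component with the largest weight.

    mu, weights: arrays of shape (2, H, W) (assumed layout, hypothetical).
    Returns an (H, W) disparity map.
    """
    idx = np.argmax(weights, axis=0)  # dominant component index per pixel
    return np.take_along_axis(mu, idx[None], axis=0)[0]

def mixture_mean(mu, weights):
    """Expected disparity under the mixture, shown only for comparison."""
    return (weights * mu).sum(axis=0)

# Toy 1x5 scanline crossing an object boundary (made-up numbers).
mu = np.array([[[10.0, 10.0, 10.0, 10.0, 10.0]],   # component 0: foreground
               [[2.0, 2.0, 2.0, 2.0, 2.0]]])        # component 1: background
w0 = np.array([[0.95, 0.9, 0.6, 0.1, 0.05]])        # weight of component 0
weights = np.stack([w0, 1.0 - w0])

sharp = select_most_likely(mu, weights)   # jumps directly from 10 to 2
blurred = mixture_mean(mu, weights)       # passes through intermediate values
```

In this toy case the selected disparity takes only the two component values, while the weighted mean produces intermediate disparities near the boundary, which is the kind of spurious in-between 3D point the mixture formulation is meant to avoid.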
Abstract
Accurate monocular depth estimation is crucial for 3D scene understanding, but existing methods often blur depth at object boundaries, introducing spurious intermediate 3D points. While achieving sharp edges usually requires very fine-grained supervision, our method produces crisp depth discontinuities using only self-supervision. Specifically, we model per-pixel depth as a mixture distribution, capturing multiple plausible depths and shifting uncertainty from direct regression to the mixture weights. This formulation integrates seamlessly into existing pipelines via variance-aware loss functions and uncertainty propagation. Extensive evaluations on KITTI and VKITTIv2 show that our method achieves up to 35% higher boundary sharpness and improves point cloud quality compared to state-of-the-art baselines.
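As a rough illustration of what a variance-aware loss over a two-component mixture can look like, the sketch below computes a generic Gaussian-mixture negative log-likelihood of per-component residuals. It is an assumption, not the loss used in the paper: the function names, the choice of a zero-mean Gaussian on the residual, and the `eps` stabilizer are all hypothetical.

```python
import numpy as np

def gaussian_log_pdf(residual, var):
    """Log density of a zero-mean Gaussian with variance `var` at `residual`."""
    return -0.5 * (np.log(2.0 * np.pi * var) + residual**2 / var)

def mixture_nll(residuals, variances, weights, eps=1e-12):
    """Negative log-likelihood of a two-component Gaussian mixture.

    residuals, variances, weights: arrays whose first axis has size 2,
    one entry per component (assumed layout, hypothetical).
    """
    log_terms = np.log(weights + eps) + gaussian_log_pdf(residuals, variances)
    # log(exp(a) + exp(b)) computed stably
    return -np.logaddexp(log_terms[0], log_terms[1])

# A large residual is penalized less when the propagated variance is large.
w = np.array([0.5, 0.5])
r = np.array([5.0, 5.0])
loss_confident = mixture_nll(r, np.array([1.0, 1.0]), w)
loss_uncertain = mixture_nll(r, np.array([25.0, 25.0]), w)
```

The point of the sketch is only that the variance enters the loss explicitly, so a component that is uncertain about a pixel is not forced to explain it exactly.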
