Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting
Ananta R. Bhattarai, Helge Rhodin
TL;DR
This work tackles the domain gap in monocular depth estimation by introducing Re-Depth Anything, a test-time refinement framework that re-lights DA-V2 depth predictions and leverages a 2D diffusion prior via Score Distillation Sampling to self-supervise geometry without extra labels. The method optimizes only the intermediate embeddings and decoder weights while keeping the encoder frozen, and ensembles the depth maps from multiple re-lighting runs to stabilize the final output. Across CO3Dv2, KITTI, and ETH3D, it yields consistent quantitative improvements and richer visual detail over DA-V2, and qualitative analyses show reduced noise and better structural fidelity. Limitations include occasional oversmoothing and sky artifacts, which point to improved shading cues and adaptive regularization as directions for future work.
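The ensembling step mentioned above can be illustrated with a minimal sketch. The summary does not say how the per-run depth maps are combined, so the min-max normalization and the pixel-wise mean used here are assumptions, and the names `normalize_depth` and `ensemble_depths` are hypothetical.

```python
import numpy as np

def normalize_depth(depth, eps=1e-8):
    """Rescale a depth map to [0, 1] (assumed here so that runs are comparable)."""
    d = np.asarray(depth, dtype=np.float64)
    lo, hi = d.min(), d.max()
    return (d - lo) / max(hi - lo, eps)

def ensemble_depths(depth_maps):
    """Combine refined depth maps from several runs by a pixel-wise mean.

    Assumption: a plain mean over normalized maps; the source does not
    specify the aggregation rule.
    """
    stack = np.stack([normalize_depth(d) for d in depth_maps], axis=0)
    return stack.mean(axis=0)
```

Normalizing first makes the average insensitive to a per-run scale and offset, which matters if the refined outputs are relative rather than metric depth.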
Abstract
Monocular depth estimation remains challenging as recent foundation models, such as Depth Anything V2 (DA-V2), struggle with real-world images that are far from the training distribution. We introduce Re-Depth Anything, a test-time self-supervision framework that bridges this domain gap by fusing DA-V2 with the powerful priors of large-scale 2D diffusion models. Our method performs label-free refinement directly on the input image by re-lighting predicted depth maps and augmenting the input. This re-synthesis method replaces classical photometric reconstruction by leveraging shape from shading (SfS) cues in a new, generative context with Score Distillation Sampling (SDS). To prevent optimization collapse, our framework employs a targeted optimization strategy: rather than optimizing depth directly or fine-tuning the full model, we freeze the encoder and update only the intermediate embeddings and the decoder weights. Across diverse benchmarks, Re-Depth Anything yields substantial gains in depth accuracy and realism over DA-V2, showcasing new avenues for self-supervision by augmenting geometric reasoning.
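To make the re-lighting idea concrete, the sketch below shades a depth map under a chosen light direction. It assumes finite-difference surface normals and a simple Lambertian model with an ambient term; the abstract does not specify the shading model, so these choices and the names `normals_from_depth` and `relight` are illustrative only.

```python
import numpy as np

def normals_from_depth(depth):
    """Estimate unit surface normals from a depth map via finite differences."""
    dz_dy, dz_dx = np.gradient(np.asarray(depth, dtype=np.float64))
    # For a surface z = depth(x, y), an unnormalized normal is (-dz/dx, -dz/dy, 1).
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def relight(depth, light_dir, albedo=1.0, ambient=0.1):
    """Render a shading image from depth (assumed Lambertian model)."""
    light = np.asarray(light_dir, dtype=np.float64)
    light = light / np.linalg.norm(light)
    normals = normals_from_depth(depth)
    # Diffuse term: cosine between normal and light, clamped at zero.
    diffuse = np.clip(normals @ light, 0.0, None)
    return albedo * (ambient + (1.0 - ambient) * diffuse)
```

Because the shading depends on depth only through its gradients, errors in predicted surface orientation show up directly in the rendered image, which is what lets an image-space prior such as SDS provide a signal about geometry.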
