Meta-Optimization for Higher Model Generalizability in Single-Image Depth Prediction
Cho-Ying Wu, Yiqi Zhong, Junying Wang, Ulrich Neumann
TL;DR
This work tackles the problem of generalizing indoor monocular depth prediction to unseen environments. It introduces gradient-based meta-learning that treats each RGB-D pair as a fine-grained task and learns a depth prior $\theta^{prior}$ through bilevel optimization with a base optimizer and a meta-optimizer. $\theta^{prior}$ then initializes conventional supervised training, which yields $\theta^*$. The approach improves zero-shot cross-dataset transfer and depth accuracy without additional data or pretrained networks, with notable gains under cross-dataset protocols and better 3D representations for NeRF-style rendering. Overall, the method is a simple, plug-in meta-initialization that improves generalization in single-image depth prediction and brings it closer to practical deployment across diverse indoor scenes.
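The two-stage scheme above (meta-learn $\theta^{prior}$, then train supervised to reach $\theta^*$) can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: a linear per-pixel model on synthetic features stands in for the depth network, MSE stands in for the depth loss, and the meta-update is a first-order, Reptile-style rule swapped in for illustration, which may differ from the paper's exact update. Each `(X, d)` pair plays the role of one RGB-D pair, i.e. one fine-grained task, and every pair gets its own image-to-depth mapping to mimic varied environments. All function names (`inner_adapt`, `meta_train`, `supervised_train`, `make_tasks`) are hypothetical.

```python
import numpy as np

# ASSUMPTIONS (illustrative only): a linear per-pixel model d_hat = X @ w stands in
# for the depth network, MSE stands in for the depth loss, and the meta-update is a
# first-order Reptile-style rule, not necessarily the paper's exact update.


def loss_and_grad(w, X, d):
    """MSE loss and gradient for one RGB-D pair (X: per-pixel features, d: depth)."""
    residual = X @ w - d
    loss = float(np.mean(residual ** 2))
    grad = 2.0 * X.T @ residual / len(d)
    return loss, grad


def mean_loss(w, tasks):
    """Average loss over a list of (X, d) pairs."""
    return float(np.mean([loss_and_grad(w, X, d)[0] for X, d in tasks]))


def inner_adapt(theta, X, d, base_lr, inner_steps):
    """Base optimizer: a few SGD steps on a single RGB-D pair (one fine-grained task)."""
    phi = theta.copy()
    for _ in range(inner_steps):
        _, g = loss_and_grad(phi, X, d)
        phi -= base_lr * g
    return phi


def meta_train(tasks, theta0, meta_lr=0.5, base_lr=0.05, inner_steps=5,
               epochs=20, batch=4, seed=0):
    """Meta-optimizer: move theta toward the mean of the per-pair adapted weights."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for _ in range(epochs):
        order = rng.permutation(len(tasks))
        for start in range(0, len(tasks), batch):
            deltas = [inner_adapt(theta, *tasks[i], base_lr, inner_steps) - theta
                      for i in order[start:start + batch]]
            theta += meta_lr * np.mean(deltas, axis=0)
    return theta


def supervised_train(tasks, theta_init, lr=0.05, epochs=50):
    """Second stage: conventional supervised SGD initialized from the meta-learned prior."""
    theta = theta_init.copy()
    for _ in range(epochs):
        for X, d in tasks:
            _, g = loss_and_grad(theta, X, d)
            theta -= lr * g
    return theta


def make_tasks(num_tasks, num_pixels=64, num_features=8, env_spread=0.5, seed=1):
    """Hypothetical synthetic data: each pair has its own image-to-depth mapping."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(size=num_features)
    tasks = []
    for _ in range(num_tasks):
        w_env = shared + env_spread * rng.normal(size=num_features)
        X = rng.normal(size=(num_pixels, num_features))
        tasks.append((X, X @ w_env))
    return tasks


tasks = make_tasks(32)
theta0 = np.zeros(8)
theta_prior = meta_train(tasks, theta0)            # stage 1: meta-initialization
theta_star = supervised_train(tasks, theta_prior)  # stage 2: supervised training
print(mean_loss(theta0, tasks), mean_loss(theta_prior, tasks), mean_loss(theta_star, tasks))
```

The design point the sketch tries to convey is that the second stage is ordinary supervised training; only its starting point changes, which is what makes a meta-initialization "plug-in".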
Abstract
Model generalizability to unseen datasets, which bears directly on in-the-wild robustness, is little studied for indoor single-image depth prediction. We leverage gradient-based meta-learning to achieve higher generalizability in zero-shot cross-dataset inference. Unlike image classification, the most-studied problem in meta-learning, depth prediction outputs pixel-level continuous range values, and the mapping from image to depth varies widely across environments. Thus no explicit task boundaries exist. We instead propose fine-grained tasks, treating each RGB-D pair as a task in our meta-optimization. We first show that meta-learning on limited data induces a much better prior (up to +29.4\%). Using the meta-learned weights as initialization for subsequent supervised learning, without extra data or information, consistently outperforms baselines trained without the method. Whereas most indoor-depth methods train and test on a single dataset, we propose zero-shot cross-dataset protocols, closely evaluate robustness, and show that our meta-initialization consistently yields higher generalizability and accuracy. This work at the intersection of depth prediction and meta-learning may help move both research streams closer to practical use.
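A zero-shot cross-dataset protocol freezes a model trained on one dataset and scores it on a dataset it never saw, with no fine-tuning. The sketch below illustrates that evaluation step under loud assumptions: the abstract does not list its metrics, so the common single-image depth metrics AbsRel, RMSE, and $\delta<1.25$ accuracy are assumed, and averaging them per image is also an assumption. `depth_metrics`, `zero_shot_eval`, and `constant_predictor` are hypothetical names, and the toy target set is synthetic.

```python
import numpy as np

# ASSUMPTION: AbsRel, RMSE, and delta < 1.25 are stand-in metrics, averaged per
# image; the abstract does not specify the exact metric set or averaging rule.


def depth_metrics(pred, gt, min_depth=1e-3):
    """Single-image depth metrics over pixels with valid ground truth (gt > min_depth)."""
    valid = gt > min_depth
    p = np.clip(pred[valid], min_depth, None)  # avoid division by zero in g / p
    g = gt[valid]
    ratio = np.maximum(p / g, g / p)
    return {
        "abs_rel": float(np.mean(np.abs(p - g) / g)),
        "rmse": float(np.sqrt(np.mean((p - g) ** 2))),
        "delta1": float(np.mean(ratio < 1.25)),
    }


def zero_shot_eval(predict_fn, target_pairs):
    """Score frozen weights on an unseen dataset without any fine-tuning.

    predict_fn: maps an RGB image (H, W, 3) to a depth map (H, W)
    target_pairs: list of (rgb, gt_depth) pairs from the held-out dataset
    """
    per_image = [depth_metrics(predict_fn(rgb), gt) for rgb, gt in target_pairs]
    return {k: float(np.mean([m[k] for m in per_image])) for k in per_image[0]}


def constant_predictor(rgb):
    """Hypothetical stand-in for a trained network: predicts 2.2 m everywhere."""
    return np.full(rgb.shape[:2], 2.2)


rng = np.random.default_rng(0)
# Toy "unseen" target set: random images whose ground-truth depth is 2.0 m everywhere.
target = [(rng.random((4, 4, 3)), np.full((4, 4), 2.0)) for _ in range(3)]
print(zero_shot_eval(constant_predictor, target))
```

In this toy case every prediction is 10% too deep (2.2 / 2.0 = 1.1), so AbsRel is about 0.1, RMSE about 0.2, and the $\delta<1.25$ accuracy is 1.0.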
