Test-Time Adaptation for Depth Completion
Hyoungseob Park, Anjali Gupta, Alex Wong
TL;DR
This paper tackles the domain gap in depth completion by introducing ProxyTTA, a test-time adaptation approach that uses a sparse-depth proxy to guide online, single-pass adaptation. By analyzing modality sensitivity, it shows that sparse depth is more robust to photometric shifts than RGB, and uses this insight to learn proxy embeddings that align target-domain features with source-domain representations via a lightweight adaptation layer. The method comprises a source-domain preparation stage and a target-domain deployment stage, where a proxy-consistency objective is combined with sparse-depth and smoothness losses. Empirically, ProxyTTA improves performance across indoor and outdoor datasets and multiple architectures, with an average improvement of 21.1% over baselines, and enables low-cost online adaptation.
Abstract
It is common to observe performance degradation when transferring models trained on some (source) datasets to target testing data due to a domain gap between them. Existing methods for bridging this gap, such as domain adaptation (DA), may require the source data on which the model was trained (often not available), while others, such as source-free DA, require many passes through the testing data. We propose an online test-time adaptation method for depth completion, the task of inferring a dense depth map from a single image and associated sparse depth map, that closes the performance gap in a single pass. We first present a study on how the domain shift in each data modality affects model performance. Based on our observation that the sparse depth modality exhibits a much smaller covariate shift than the image, we design an embedding module trained in the source domain that preserves a mapping from features encoding only sparse depth to those encoding image and sparse depth. During test time, sparse depth features are projected using this map as a proxy for source domain features and are used as guidance to train a set of auxiliary parameters (i.e., adaptation layer) to align image and sparse depth features from the target test domain to those of the source domain. We evaluate our method on indoor and outdoor scenarios and show that it improves over baselines by an average of 21.1%.
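The two stages described above can be illustrated with a toy linear sketch in NumPy. It is not the authors' implementation: the feature dimensions, the synthetic data, the least-squares proxy map, the residual linear adaptation layer, and the mean-squared-error objective are all assumptions made for illustration, and the names `fit_proxy_map` and `adapt_to_proxy` are hypothetical. The sparse-depth and smoothness losses mentioned in the TL;DR are omitted.

```python
import numpy as np

def fit_proxy_map(z_sparse, z_fused):
    """Source stage (assumed linear): least-squares W with z_sparse @ W ~= z_fused."""
    W, *_ = np.linalg.lstsq(z_sparse, z_fused, rcond=None)
    return W

def adapt_to_proxy(z_target, proxy, steps=200):
    """Test-time stage: fit a residual linear adaptation layer A by gradient
    descent so that z_target + z_target @ A moves toward the fixed proxy."""
    n, d = z_target.shape
    A = np.zeros((d, d))  # zero init: the adapted features start equal to z_target
    # Step size 1/L, where L is the Lipschitz constant of the MSE gradient.
    lr = z_target.size / (2.0 * np.linalg.norm(z_target, 2) ** 2)
    losses = []
    for _ in range(steps):
        err = z_target + z_target @ A - proxy
        losses.append(float(np.mean(err ** 2)))
        A -= lr * (2.0 / err.size) * (z_target.T @ err)
    return A, losses

# Synthetic stand-ins for encoder features (all values are hypothetical).
rng = np.random.default_rng(0)
mix = rng.normal(size=(4, 8))                      # source relation: sparse -> fused
shift = np.eye(8) + 0.3 * rng.normal(size=(8, 8))  # distortion of target fused features

# Source domain: learn the map from sparse-depth features to fused features.
src_sparse = rng.normal(size=(256, 4))
src_fused = src_sparse @ mix + 0.01 * rng.normal(size=(256, 8))
W = fit_proxy_map(src_sparse, src_fused)

# Target domain: sparse-depth features are assumed unshifted, fused features are not.
tgt_sparse = rng.normal(size=(256, 4))
tgt_fused = tgt_sparse @ mix @ shift
proxy = tgt_sparse @ W  # proxy for source-domain fused features

A, losses = adapt_to_proxy(tgt_fused, proxy)
print(f"proxy-consistency loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In this sketch only `A` is updated at test time while `W` stays frozen, mirroring the idea that the proxy acts as fixed guidance for a small set of auxiliary parameters.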
