Test-Time Adaptation for Depth Completion

Hyoungseob Park, Anjali Gupta, Alex Wong

TL;DR

This paper addresses the domain gap in depth completion with ProxyTTA, a test-time adaptation method that uses sparse depth as a proxy to guide online adaptation in a single pass over the test data. A study of per-modality sensitivity shows that sparse depth exhibits a much smaller covariate shift than the RGB image. Building on this, the method learns, in the source domain, a mapping from features encoding only sparse depth to features encoding both image and sparse depth. At test time, the resulting proxy embeddings guide a lightweight adaptation layer that aligns target-domain image and sparse depth features with source-domain representations. The method comprises a source-domain preparation stage and a target-domain deployment stage, where adaptation minimizes a proxy-consistency objective alongside sparse-depth and smoothness losses. Across indoor and outdoor datasets and multiple architectures, ProxyTTA improves over baselines by an average of 21.1% while keeping adaptation online and low-cost.
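The three loss terms named above can be sketched as follows. This is a minimal, dependency-free illustration rather than the authors' implementation: the function names, the L1 form of the sparse-depth term, the unweighted (not edge-aware) smoothness term, the cosine form of the proxy-consistency term, and the weights `w_sparse`, `w_smooth`, and `w_proxy` are all assumptions made for the example.

```python
import math

def sparse_depth_loss(pred, sparse):
    """Mean absolute error against sparse depth, over valid (> 0) points only."""
    errors = [abs(p - s)
              for row_p, row_s in zip(pred, sparse)
              for p, s in zip(row_p, row_s) if s > 0]
    return sum(errors) / len(errors) if errors else 0.0

def smoothness_loss(pred):
    """Mean absolute difference between horizontally and vertically adjacent depths."""
    h, w = len(pred), len(pred[0])
    diffs = [abs(pred[y][x + 1] - pred[y][x]) for y in range(h) for x in range(w - 1)]
    diffs += [abs(pred[y + 1][x] - pred[y][x]) for y in range(h - 1) for x in range(w)]
    return sum(diffs) / len(diffs) if diffs else 0.0

def proxy_consistency_loss(features, proxy):
    """One minus the cosine similarity between adapted features and the proxy embedding."""
    dot = sum(f * p for f, p in zip(features, proxy))
    norm = math.sqrt(sum(f * f for f in features)) * math.sqrt(sum(p * p for p in proxy))
    return 1.0 - dot / norm if norm > 0 else 1.0

def total_loss(pred, sparse, features, proxy,
               w_sparse=1.0, w_smooth=0.1, w_proxy=1.0):
    """Weighted sum of the three terms; the weights are hypothetical placeholders."""
    return (w_sparse * sparse_depth_loss(pred, sparse)
            + w_smooth * smoothness_loss(pred)
            + w_proxy * proxy_consistency_loss(features, proxy))
```

Only the sparse-depth term needs measurements from the target domain, and it uses them only where a depth point exists. The other two terms act as regularizers: one on the predicted depth map, the other on the feature space.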

Abstract

It is common to observe performance degradation when transferring models trained on some (source) datasets to target testing data due to a domain gap between them. Existing methods for bridging this gap, such as domain adaptation (DA), may require the source data on which the model was trained (often not available), while others, i.e., source-free DA, require many passes through the testing data. We propose an online test-time adaptation method for depth completion, the task of inferring a dense depth map from a single image and associated sparse depth map, that closes the performance gap in a single pass. We first present a study on how the domain shift in each data modality affects model performance. Based on our observations that the sparse depth modality exhibits a much smaller covariate shift than the image, we design an embedding module trained in the source domain that preserves a mapping from features encoding only sparse depth to those encoding image and sparse depth. During test time, sparse depth features are projected using this map as a proxy for source domain features and are used as guidance to train a set of auxiliary parameters (i.e., adaptation layer) to align image and sparse depth features from the target test domain to that of the source domain. We evaluate our method on indoor and outdoor scenarios and show that it improves over baselines by an average of 21.1%.
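The single-pass adaptation described in the abstract can be illustrated with a toy loop. The sketch below assumes a per-channel affine adaptation layer and a mean-squared-error alignment to the proxy; both are stand-ins chosen for brevity, and `AdaptationLayer` and `adapt_online` are hypothetical names, not identifiers from the paper. The encoder and the proxy mapping are treated as frozen, so only the layer's `scale` and `shift` change at test time.

```python
class AdaptationLayer:
    """Hypothetical per-channel affine layer; the only parameters updated at test time."""

    def __init__(self, channels):
        self.scale = [1.0] * channels  # identity initialization
        self.shift = [0.0] * channels

    def forward(self, features):
        return [g * f + b for g, f, b in zip(self.scale, features, self.shift)]

    def step(self, features, proxy, lr):
        """One gradient step on the mean squared error between adapted features and proxy."""
        adapted = self.forward(features)
        n = len(features)
        loss = sum((a - p) ** 2 for a, p in zip(adapted, proxy)) / n
        for c in range(n):
            grad_out = 2.0 * (adapted[c] - proxy[c]) / n  # d(loss) / d(adapted[c])
            self.scale[c] -= lr * grad_out * features[c]
            self.shift[c] -= lr * grad_out
        return loss  # loss measured before this update

def adapt_online(layer, stream, lr=0.1):
    """Single pass: each (features, proxy) pair is seen once, in arrival order."""
    losses = []
    for features, proxy in stream:
        losses.append(layer.step(features, proxy, lr))
    return losses
```

In this toy setting, `features` stands for target-domain features computed from image and sparse depth, and `proxy` stands for the sparse-depth features projected through the frozen source-domain mapping. Because no sample is revisited, the cost per test sample is one forward pass plus one update of the small layer.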

Paper Structure

This paper contains 8 sections, 7 equations, 7 figures, and 9 tables.

Figures (7)

  • Figure 1: Model sensitivity to input modalities. Using both sparse depth and image as input yields the best performance in the source domain (VOID). Yet forgoing the image in the test domain (NYUv2) often yields lower error than using both as input.
  • Figure 2: Model sensitivity to input modalities. Depth completion networks rely heavily on the sparse depth modality. Performing inference in a novel domain without the RGB image, i.e., using just sparse depth as input, can improve over using both data modalities.
  • Figure 3: Overview. (a) The pretraining stage integrates an adaptation layer into a pretrained encoder and pretrains the adaptation layer on the source dataset. (b) The preparation stage learns the proxy mapping of features encoding sparse depth to those encoding both inputs. (c) The adaptation stage deploys the model to the target domain and updates the adaptation layer by leveraging proxy embeddings as guidance.
  • Figure 4: Qualitative results on NYUv2. For indoor scenarios, ProxyTTA performs better in boundary regions with depth discontinuities (e.g., curtains, (a)), as well as in homogeneous regions (e.g., blackboard, (d)). Boxes highlight detailed comparisons.
  • Figure 5: Qualitative results on NuScenes. For outdoor adaptation scenarios, ProxyTTA improves over BN Adapt and CoTTA, notably in both depth-discontinuous regions (e.g., car in (b)) and homogeneous regions (e.g., road in (a) and (b)). Boxes highlight detailed comparisons.
  • ...and 2 more figures