Efficient Test-Time Optimization for Depth Completion via Low-Rank Decoder Adaptation

Minseok Seo, Wonjun Lee, Jaehyuk Jang, Changick Kim

TL;DR

This work shows that adapting only the decoder, guided by sparse depth supervision, is sufficient for effective test-time optimization, because depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace.

Abstract

Zero-shot depth completion has gained attention for its ability to generalize across environments without sensor-specific datasets or retraining. However, most existing approaches rely on diffusion-based test-time optimization, which is computationally expensive due to iterative denoising. Recent visual-prompt-based methods reduce training cost but still require repeated forward-backward passes through the full frozen network to optimize input-level prompts, resulting in slow inference. In this work, we show that adapting only the decoder is sufficient for effective test-time optimization, as depth foundation models concentrate depth-relevant information within a low-dimensional decoder subspace. Based on this insight, we propose a lightweight test-time adaptation method that updates only this low-dimensional subspace using sparse depth supervision. Our approach achieves state-of-the-art performance, establishing a new Pareto frontier between accuracy and efficiency for test-time adaptation. Extensive experiments on five indoor and outdoor datasets demonstrate consistent improvements over prior methods, highlighting the practicality of fast zero-shot depth completion.
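To make the idea of updating only a low-dimensional decoder subspace concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy linear decoder with a frozen weight `W` and frozen scalar readout `v`, precomputed frozen encoder features, and a masked mean-squared-error loss on the sparse depth points. The function name `adapt_decoder_lowrank`, all shapes, the learning rate, and the synthetic data are hypothetical choices made for illustration.

```python
import numpy as np

def adapt_decoder_lowrank(feats, W, v, sparse_depth, mask,
                          rank=4, lr=0.02, steps=300, seed=0):
    """Fit a rank-`rank` update B @ A to a frozen decoder weight at test time.

    feats:        (N, d_in) frozen encoder features, one row per pixel
    W:            (d_out, d_in) frozen decoder weight (never modified)
    v:            (d_out,) frozen readout mapping decoder features to depth
    sparse_depth: (N,) depth values, trusted only where `mask` is True
    """
    rng = np.random.default_rng(seed)
    d_out, d_in = W.shape
    A = rng.normal(scale=1.0 / np.sqrt(d_in), size=(rank, d_in))
    B = np.zeros((d_out, rank))  # zero init: start from the pretrained output
    n_valid = mask.sum()
    losses = []
    for _ in range(steps):
        pred = feats @ (W + B @ A).T @ v
        resid = np.where(mask, pred - sparse_depth, 0.0)  # sparse supervision only
        losses.append(float((resid ** 2).sum() / n_valid))
        # Gradient of the masked MSE w.r.t. the effective weight W + B @ A.
        grad_M = np.outer(v, feats.T @ (2.0 * resid / n_valid))
        grad_B = grad_M @ A.T
        grad_A = B.T @ grad_M
        B -= lr * grad_B
        A -= lr * grad_A
    return A, B, feats @ (W + B @ A).T @ v, losses

# Synthetic stand-in data (assumption): a shifted decoder plays the domain gap.
rng = np.random.default_rng(0)
n_pix, d_in, d_out = 1024, 16, 16
feats = rng.normal(size=(n_pix, d_in))
W = rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)
v = rng.normal(size=d_out) / np.sqrt(d_out)
W_shift = W + 0.5 * rng.normal(size=(d_out, d_in)) / np.sqrt(d_in)
true_depth = feats @ W_shift.T @ v
mask = rng.random(n_pix) < 0.1  # roughly 10% of pixels carry a depth sample

W_before = W.copy()
A, B, pred, losses = adapt_decoder_lowrank(feats, W, v, true_depth, mask)
print(f"sparse-point MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Because the encoder features are computed once and only `A` and `B` receive gradients, each step in this sketch avoids a backward pass through the encoder, which is the efficiency argument the abstract makes. A real depth foundation model would have a nonlinear, multi-layer decoder, so this toy should be read only as an illustration of the update structure.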

Paper Structure (18 sections, 7 equations, 7 figures, 5 tables, 1 algorithm)

Figures (7)

  • Figure 1: We compare a training-based method (PromptDA [lin2025prompting]) with test-time optimization-based depth completion approaches [viola2025marigold, jeong2025test]. PromptDA requires sensor-specific training and achieves real-time inference, but suffers from large reconstruction error. Existing test-time optimization-based methods improve accuracy at the cost of several seconds of inference per image. In contrast, our method establishes a new Pareto frontier by simultaneously achieving the lowest error and highly efficient inference among test-time optimization-based depth completion methods.
  • Figure 2: (a) Training-based depth completion relies on offline training with paired RGB–depth data. (b) Test-time optimization methods adapt either latent variables or visual prompts at inference time, incurring significant computational cost. (c) In contrast, our method adapts only a low-dimensional subspace of the decoder, which already encodes depth structure highly correlated with the final output, enabling fast test-time adaptation.
  • Figure 3: (a) Layer-wise correlation with the final depth output shows low correlation in the encoder and a sharp increase in the decoder. (b) PCA (PC1) visualizations indicate that decoder features already align closely with the final depth map, revealing strong depth information in a low-dimensional decoder subspace.
  • Figure 4: Efficiency and performance comparison of test-time adaptation strategies. Decoder-only LoRA minimizes trainable parameters and adaptation time, while achieving a favorable speed-accuracy trade-off.
  • Figure 5: Energy fraction captured by low-rank components of decoder weight updates. Most layers exhibit strongly low-rank structures, where rank $r=8$ explains over 90% of the total energy.
  • ...and 2 more figures
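
The energy fraction reported in Figure 5 can be illustrated with a short sketch. The caption does not define "energy", so this example assumes the common convention of squared singular values of a weight update, that is, the fraction of the squared Frobenius norm retained by the best rank-r approximation. The function name `low_rank_energy_fraction` and the synthetic update matrix are hypothetical.

```python
import numpy as np

def low_rank_energy_fraction(delta_w, rank):
    """Fraction of squared Frobenius norm captured by the top-`rank` singular values."""
    singular_values = np.linalg.svd(delta_w, compute_uv=False)  # sorted descending
    energy = singular_values ** 2
    return float(energy[:rank].sum() / energy.sum())

# Synthetic weight update (assumption): a rank-8 signal plus small dense noise.
rng = np.random.default_rng(0)
signal = rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))
delta_w = signal + 0.1 * rng.normal(size=(64, 64))

for r in (1, 2, 4, 8, 16):
    print(f"rank {r:2d}: energy fraction = {low_rank_energy_fraction(delta_w, r):.4f}")
```

Applied per layer to the difference between adapted and pretrained decoder weights, a curve of this kind would show how quickly the fraction saturates as the rank grows. The numbers printed here come from synthetic data and say nothing about the paper's actual measurements.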