Test-Time Adaptation for Height Completion via Self-Supervised ViT Features and Monocular Foundation Models

Osher Rafaeli; Tal Svoray; Ariel Nahlieli

Abstract

Accurate digital surface models (DSMs) are essential for many geospatial applications, including urban monitoring, environmental analysis, infrastructure management, and change detection. However, large-scale DSMs frequently contain incomplete or outdated regions due to acquisition limitations, reconstruction artifacts, or changes in the built environment. Traditional height completion approaches rely primarily on spatial interpolation, which assumes spatial continuity and therefore fails when entire objects are missing. Recent learning-based approaches improve reconstruction quality but typically require supervised training on sensor-specific datasets, limiting their generalization across domains and sensing conditions. We propose Prior2DSM, a training-free framework for metric DSM completion that operates entirely at test time by leveraging foundation models. Unlike previous height completion approaches that require task-specific training, the proposed method combines self-supervised Vision Transformer (ViT) features from DINOv3 with monocular depth foundation models to propagate metric information from incomplete height priors through semantic feature-space correspondence. Test-time adaptation (TTA) is performed using parameter-efficient low-rank adaptation (LoRA) together with a lightweight multilayer perceptron (MLP), which predicts spatially varying scale and shift parameters to convert relative depth estimates into metric heights. Experiments demonstrate consistent improvements over interpolation-based methods, prior-based height rescaling approaches, and state-of-the-art monocular depth estimation (MDE) models. Prior2DSM reduces reconstruction error while preserving structural fidelity, achieving up to a 46% reduction in RMSE compared to linear fitting of MDE outputs, and further enables DSM updating and coupled RGB-DSM generation.
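To make the scale-and-shift idea concrete, the following is a minimal sketch of the simplest variant mentioned above: a single global linear fit that maps relative depth to metric height using only the pixels where a height prior is available. It is an illustration under stated assumptions, not the Prior2DSM implementation, which predicts spatially varying parameters. The function names `fit_global_scale_shift` and `complete_heights`, the boolean `valid_mask` convention, and the assumption that height is linear in the relative depth values are all hypothetical choices made for this example.

```python
import numpy as np


def fit_global_scale_shift(rel_depth, height_prior, valid_mask):
    """Least-squares fit of h ~ s * d + t over pixels with a valid height prior.

    Hypothetical helper: one global (s, t) pair, i.e. a plain linear fit,
    not spatially varying parameters.
    """
    d = rel_depth[valid_mask].astype(np.float64)
    h = height_prior[valid_mask].astype(np.float64)
    # Design matrix [d, 1] so that A @ [s, t] approximates h.
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, h, rcond=None)
    return s, t


def complete_heights(rel_depth, height_prior, valid_mask):
    """Keep known heights and fill missing pixels with rescaled relative depth."""
    s, t = fit_global_scale_shift(rel_depth, height_prior, valid_mask)
    metric = s * rel_depth + t
    return np.where(valid_mask, height_prior, metric)
```

A global fit of this kind applies the same scale and shift everywhere in the tile, so it cannot correct errors that vary across the scene; predicting the two parameters per location is what the spatially varying formulation described above is meant to address.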
