Seamless High-Resolution Terrain Reconstruction: A Prior-Based Vision Transformer Approach

Osher Rafaeli, Tal Svoray, Ariel Nahlieli

Abstract

High-resolution elevation data is essential for hydrological modeling, hazard assessment, and environmental monitoring; however, globally consistent, fine-scale Digital Elevation Models (DEMs) remain unavailable. Very high-resolution single-view imagery enables the extraction of topographic information at the pixel level, allowing the reconstruction of fine terrain details over large spatial extents. In this paper, we present single-view-based DEM reconstruction shown to support practical analysis in GIS environments across multiple sub-national jurisdictions. Specifically, we produce high-resolution DEMs for large-scale basins, representing a substantial improvement over the 30 m resolution of globally available Shuttle Radar Topography Mission (SRTM) data. The DEMs are generated using a prior-based monocular depth estimation (MDE) foundation model, extended in this work to the remote sensing height domain for high-resolution, globally consistent elevation reconstruction. We fine-tune the model by integrating low-resolution SRTM data as a global prior with high-resolution RGB imagery from the National Agriculture Imagery Program (NAIP), producing DEMs with near LiDAR-level accuracy. Our method achieves a 100× resolution enhancement (from 30 m to 30 cm), exceeding existing super-resolution approaches by an order of magnitude. Across two diverse landscapes, the model generalizes robustly, resolving fine-scale terrain features with a mean absolute error of less than 5 m relative to LiDAR and improving upon SRTM by up to 18%. Hydrological analyses at both catchment and hillslope scales confirm the method's utility for hazard assessment and environmental monitoring, demonstrating improved streamflow representation and catchment delineation. Finally, we demonstrate the scalability of the framework by applying it across large geographic regions.
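
To make the two headline metrics concrete, the following is a minimal sketch of how a mean absolute error (MAE) against LiDAR and a relative improvement over SRTM could be computed. It assumes all grids have already been resampled to the same shape; the function names and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_absolute_error(dem, reference):
    """MAE in metres between two same-shaped elevation grids, ignoring NaN cells."""
    dem = np.asarray(dem, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if dem.shape != reference.shape:
        raise ValueError("grids must have the same shape")
    return float(np.nanmean(np.abs(dem - reference)))


def relative_improvement(mae_baseline, mae_model):
    """Percentage reduction in MAE of the model relative to the baseline."""
    return 100.0 * (mae_baseline - mae_model) / mae_baseline


# Toy 2x2 grids in metres; the values are made up for illustration only.
lidar = np.array([[100.0, 102.0], [104.0, 106.0]])
srtm_upsampled = np.array([[104.0, 106.0], [100.0, 110.0]])  # every cell off by 4 m
predicted = np.array([[103.0, 105.0], [101.0, 109.0]])       # every cell off by 3 m

mae_srtm = mean_absolute_error(srtm_upsampled, lidar)
mae_pred = mean_absolute_error(predicted, lidar)
improvement = relative_improvement(mae_srtm, mae_pred)
```

In this toy case the baseline MAE is 4 m, the predicted MAE is 3 m, and the relative improvement is 25%. The paper's reported figures come from its own evaluation protocol, which may differ from this sketch.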

Paper Structure

This paper contains 17 sections, 11 equations, 12 figures, 6 tables.

Figures (12)

  • Figure 1: Our study sites for training and evaluation are located in four different states in the United States. For natural vegetated areas, we used Kelsey Peak and Diamond Fork Canyon in Utah. For natural, bare sites, we used Chaco Canyon in New Mexico and Casa Diablo Mountain in California. For model application, we used the Escalante River in Utah, and the Dead Sea in Israel.
  • Figure 2: Histograms of elevation and slope across the four study sites indicate that naturally vegetated regions generally have steeper slopes than bare terrain.
  • Figure 3: Sample patches from the training dataset include NAIP RGB imagery, SRTM data as priors, and high-resolution LiDAR DEMs as ground truth.
  • Figure 4: Proposed framework, dataset, and landscape categories. The model was trained and evaluated on natural vegetated and natural bare lands. All model components were fine-tuned using NAIP 3-channel (RGB) aerial images with LR-SRTM and HR-DEM extracted from the HR LiDAR-based DEM. We applied a linear weighted mask to overlapping patches, which mitigates edge artifacts and ensures smooth transitions between them.
  • Figure 5: Visual Evaluation - DEM estimation in natural vegetated landscape, elevation in meters, slope in degrees, aspect in radial direction, and hillshade visualization. The global SRTM general terrain was preserved for elevation prediction, while RGB information added detailed terrain features, such as small streams that are clearly visible in the slope, aspect, and hillshade maps.
  • ...and 7 more figures
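
The Figure 4 caption mentions a linear weighted mask for merging overlapping patches. The exact form of that mask is not given here, so the sketch below assumes a separable ramp that rises linearly from the patch edges toward the centre, followed by a weighted average wherever patches overlap. The names `linear_weight_mask` and `blend_patches` are hypothetical.

```python
import numpy as np


def linear_weight_mask(size):
    """2D mask whose weights rise linearly from the patch edges to its centre."""
    # 1D ramp, e.g. size=4 -> [1, 2, 2, 1] / 2 = [0.5, 1.0, 1.0, 0.5]
    ramp = np.minimum(np.arange(1, size + 1), np.arange(size, 0, -1)).astype(float)
    ramp /= ramp.max()
    return np.outer(ramp, ramp)


def blend_patches(patches, origins, out_shape):
    """Weighted average of square patches placed at (row, col) origins."""
    acc = np.zeros(out_shape, dtype=float)      # sum of weight * elevation
    weights = np.zeros(out_shape, dtype=float)  # sum of weights
    for patch, (r, c) in zip(patches, origins):
        size = patch.shape[0]
        mask = linear_weight_mask(size)
        acc[r:r + size, c:c + size] += patch * mask
        weights[r:r + size, c:c + size] += mask
    out = np.full(out_shape, np.nan)            # cells with no patch stay NaN
    covered = weights > 0
    out[covered] = acc[covered] / weights[covered]
    return out


# Two constant 4x4 patches (10 m and 20 m) overlapping by two columns.
patches = [np.full((4, 4), 10.0), np.full((4, 4), 20.0)]
origins = [(0, 0), (0, 2)]
mosaic = blend_patches(patches, origins, out_shape=(4, 6))
```

In the overlap, each patch contributes less as its own edge approaches, so the mosaic moves gradually from 10 m to 20 m instead of jumping at a seam.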