GrounDiff: Diffusion-Based Ground Surface Generation from Digital Surface Models

Oussema Dhaouadi, Johannes Meier, Jacques Kaiser, Daniel Cremers

TL;DR

GrounDiff introduces a diffusion-based DSM-to-DTM framework that iteratively removes non-ground structures while preserving topography. A gated denoiser with a confidence head guides selective filtering, and Prior-Guided Stitching (PrioStitch) enables scalable, tile-based generation by conditioning high-resolution tiles on a learned low-resolution global prior. Across diverse benchmarks (ALS2DTM, USGS OpenTopography, GeRoD), GrounDiff achieves state-of-the-art RMSE reductions (up to 93% on ALS2DTM) and substantial improvements in road surface reconstruction, with GrounDiff+ offering smoother surfaces. The approach demonstrates strong cross-region generalization and practical potential for large-area terrain modeling, while identifying limitations in abrupt alpine terrain and vegetation occlusion, and pointing to future work in point-based diffusion and broader domain adaptation.

Abstract

Digital Terrain Models (DTMs) represent the bare-earth elevation and are important in numerous geospatial applications. Such data models cannot be directly measured by sensors and are typically generated from Digital Surface Models (DSMs) derived from LiDAR or photogrammetry. Traditional filtering approaches rely on manually tuned parameters, while learning-based methods require well-designed architectures, often combined with post-processing. To address these challenges, we introduce Ground Diffusion (GrounDiff), the first diffusion-based framework that iteratively removes non-ground structures by formulating the problem as a denoising task. We incorporate a gated design with confidence-guided generation that enables selective filtering. To increase scalability, we further propose Prior-Guided Stitching (PrioStitch), which employs a downsampled global prior automatically generated using GrounDiff to guide local high-resolution predictions. We evaluate our method on the DSM-to-DTM translation task across diverse datasets, showing that GrounDiff consistently outperforms deep learning-based state-of-the-art methods, reducing RMSE by up to 93% on ALS2DTM and up to 47% on USGS benchmarks. In the task of road reconstruction, which requires both high precision and smoothness, our method achieves up to 81% lower distance error compared to specialized techniques on the GeRoD benchmark, while maintaining competitive surface smoothness using only DSM inputs, without task-specific optimization. Our variant for road reconstruction, GrounDiff+, is specifically designed to produce even smoother surfaces, further surpassing state-of-the-art methods. The project page is available at https://deepscenario.github.io/GrounDiff/.

Paper Structure

This paper contains 51 sections, 18 equations, 20 figures, 7 tables.

Figures (20)

  • Figure 1: Geospatial surface models. Comparison between DSM, DTM, and nDSM.
  • Figure 2: DTM applications in autonomous driving: object detection refinement using geospatial data. Left: Textured 3D mesh with surface noise and artifacts affecting DSM quality. Right: 3D bounding box height refinement via raycasting. The red box (using the noisy DSM) shows incorrect vertical positioning due to surface artifacts, while the green box (using the clean DTM) achieves accurate ground-level placement, which is essential for safe navigation.
  • Figure 3: Method overview (on 1D terrain for clarity). (1) Training: forward diffusion process where (a) the ground-truth DTM $g_0$ is corrupted with noise to obtain $g_t$, and (b) the denoiser takes the noisy terrain $g_t$ and the DSM $s$ to predict the nDSM $\hat{r}$ and classification logits $\ell$. The predicted nDSM is subtracted from the DSM to generate an initial DTM, which is then (c) refined using ground probabilities to produce the final estimate $\hat{g}_0$. (2) Inference: the reverse process starts with (d) a prior (e.g., Gaussian noise, a noisy DSM, or a low-resolution DTM) and iteratively applies the gated denoiser (e) conditioned on the DSM $s$, progressively denoising from $g_T$ to $g_0$ to recover the final DTM.
  • Figure 4: Our denoiser architecture. (a) The network follows a U-Net encoder–decoder based on (c) residual blocks with FiLM (Perez et al., 2018) timestep conditioning and skip connections. Dual outputs produce residual corrections $\hat{r}$ and confidence logits $\ell$, which are (b) combined via the gating function given in the paper's gating equation.
  • Figure 5: PrioStitch stitching strategy. Our approach scales GrounDiff to large DSMs through: low-resolution prior generation by (a) downsampling the input DSM and (b) applying GrounDiff, (c) tiling the original DSM into overlapping patches, conditioning each patch with the corresponding region from the upsampled prior DTM, and (d) blending the processed tiles using weighted fusion to produce the final high-resolution DTM.
  • ...and 15 more figures
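
The stitching steps described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`low_res_prior`, `tile_starts`, `blend_weights`, `stitch`), the tile size and stride, block-average downsampling, nearest-neighbour upsampling, and the triangular weight map are all choices made here for illustration; the caption only says "weighted fusion". The callables `process` and `process_tile` are hypothetical stand-ins for a GrounDiff forward pass.

```python
import numpy as np


def low_res_prior(dsm, factor, process):
    """Steps (a)-(b): block-average the DSM, run the model, upsample the result.

    Assumes both DSM dimensions are divisible by `factor`.
    """
    h, w = dsm.shape
    coarse = dsm.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    coarse_dtm = process(coarse)  # hypothetical stand-in for the coarse model pass
    # Nearest-neighbour upsampling back to the original grid (an assumption).
    return np.repeat(np.repeat(coarse_dtm, factor, axis=0), factor, axis=1)


def tile_starts(size, tile, stride):
    """Start offsets so that tiles of length `tile` cover [0, size)."""
    if size < tile:
        raise ValueError("raster is smaller than one tile")
    starts = list(range(0, size - tile + 1, stride))
    if starts[-1] != size - tile:
        starts.append(size - tile)  # extra tile so the far border is covered
    return starts


def blend_weights(tile):
    """2D weight map that peaks at the tile centre and stays strictly positive."""
    ramp = 1.0 - np.abs(np.linspace(-1.0, 1.0, tile + 2)[1:-1])
    return np.outer(ramp, ramp)


def stitch(dsm, prior, process_tile, tile=64, stride=48):
    """Steps (c)-(d): process overlapping patches, then fuse by weighted averaging."""
    h, w = dsm.shape
    acc = np.zeros((h, w), dtype=np.float64)   # weighted sum of tile outputs
    wsum = np.zeros((h, w), dtype=np.float64)  # sum of weights per pixel
    weights = blend_weights(tile)
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            win = (slice(y, y + tile), slice(x, x + tile))
            # Each patch is conditioned on the matching region of the prior.
            acc[win] += weights * process_tile(dsm[win], prior[win])
            wsum[win] += weights
    return acc / wsum


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dsm = rng.uniform(100.0, 120.0, size=(96, 128))
    # Toy stand-ins, NOT GrounDiff: identity for the coarse pass,
    # element-wise minimum of DSM and prior for the tile pass.
    prior = low_res_prior(dsm, factor=4, process=lambda c: c)
    dtm = stitch(dsm, prior, lambda d, p: np.minimum(d, p))
    print(dtm.shape)
```

Weights that decay towards the tile border mean that, in overlap regions, each pixel is dominated by the tile in which it sits closest to the centre, which tends to reduce visible seams. Any other positive weight map would fit the same accumulate-and-normalise pattern.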