GrounDiff: Diffusion-Based Ground Surface Generation from Digital Surface Models
Oussema Dhaouadi, Johannes Meier, Jacques Kaiser, Daniel Cremers
TL;DR
GrounDiff is a diffusion-based DSM-to-DTM framework that iteratively removes non-ground structures while preserving the underlying topography. A gated denoiser with a confidence head guides selective filtering, and Prior-Guided Stitching (PrioStitch) enables scalable, tile-based generation by conditioning high-resolution tiles on a low-resolution global prior that GrounDiff itself generates. Across diverse benchmarks (ALS2DTM, USGS OpenTopography, GeRoD), GrounDiff reduces RMSE relative to deep learning-based state-of-the-art methods by up to 93% on ALS2DTM and up to 47% on USGS, and lowers distance error in road surface reconstruction by up to 81% on GeRoD; the GrounDiff+ variant produces even smoother road surfaces. The approach generalizes well across regions and shows practical potential for large-area terrain modeling. Identified limitations include abrupt alpine terrain and vegetation occlusion, and future work points to point-based diffusion and broader domain adaptation.
Abstract
Digital Terrain Models (DTMs) represent the bare-earth elevation and are important in numerous geospatial applications. Such data models cannot be directly measured by sensors and are typically generated from Digital Surface Models (DSMs) derived from LiDAR or photogrammetry. Traditional filtering approaches rely on manually tuned parameters, while learning-based methods require well-designed architectures, often combined with post-processing. To address these challenges, we introduce Ground Diffusion (GrounDiff), the first diffusion-based framework that iteratively removes non-ground structures by formulating the problem as a denoising task. We incorporate a gated design with confidence-guided generation that enables selective filtering. To increase scalability, we further propose Prior-Guided Stitching (PrioStitch), which employs a downsampled global prior automatically generated using GrounDiff to guide local high-resolution predictions. We evaluate our method on the DSM-to-DTM translation task across diverse datasets, showing that GrounDiff consistently outperforms deep learning-based state-of-the-art methods, reducing RMSE by up to 93% on ALS2DTM and up to 47% on USGS benchmarks. In the task of road reconstruction, which requires both high precision and smoothness, our method achieves up to 81% lower distance error compared to specialized techniques on the GeRoD benchmark, while maintaining competitive surface smoothness using only DSM inputs, without task-specific optimization. Our variant for road reconstruction, GrounDiff+, is specifically designed to produce even smoother surfaces, further surpassing state-of-the-art methods. The project page is available at https://deepscenario.github.io/GrounDiff/.
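To make the headline numbers concrete, the following is a minimal sketch of how an RMSE and a relative error reduction such as "up to 93%" are typically computed. It is not taken from the GrounDiff code base: the function names `rmse` and `relative_reduction` and the toy 2x2 elevation grids are hypothetical, and the paper's exact evaluation protocol (masking, units, per-tile averaging) may differ.

```python
import math


def rmse(pred, ref):
    """Root-mean-square error between two equally sized elevation grids.

    Hypothetical helper for illustration; grids are lists of rows of floats.
    """
    flat_pred = [v for row in pred for v in row]
    flat_ref = [v for row in ref for v in row]
    if len(flat_pred) != len(flat_ref):
        raise ValueError("grids must contain the same number of cells")
    squared = sum((p - r) ** 2 for p, r in zip(flat_pred, flat_ref))
    return math.sqrt(squared / len(flat_ref))


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Toy data (assumed, not from the paper): a flat reference DTM at 10 m,
# a baseline prediction that leaves a 2 m residual in one cell,
# and an improved prediction that leaves a 1 m residual in the same cell.
reference = [[10.0, 10.0], [10.0, 10.0]]
baseline = [[12.0, 10.0], [10.0, 10.0]]
improved = [[11.0, 10.0], [10.0, 10.0]]

baseline_rmse = rmse(baseline, reference)   # 1.0
improved_rmse = rmse(improved, reference)   # 0.5
print(relative_reduction(baseline_rmse, improved_rmse))  # 50.0
```

Read this way, a 93% reduction means the new method's RMSE is 7% of the baseline's RMSE on the same benchmark, not that the absolute error is 7% of the terrain height.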
