Multi-Class Segmentation from Aerial Views using Recursive Noise Diffusion
Benedikt Kolbeinsson, Krystian Mikolajczyk
TL;DR
This work addresses multi-class semantic segmentation of aerial imagery by introducing a recursive denoising diffusion framework with hierarchical multi-scale processing. The method defines a forward diffusion process on segmentation maps, conditioned on the RGB input, and learns a denoiser that predicts the segmentation from any time step. Training uses recursive denoising together with a multi-scale strategy. The paper reports promising results on UAVid and state-of-the-art performance on the Vaihingen building segmentation benchmark, illustrating the potential of diffusion-based multi-class aerial segmentation. The approach is flexible in its choice of noise function, diffusion model, and loss. The paper also highlights practical considerations such as inference-time trade-offs and data requirements, which point to directions for future improvement and broader application.
Abstract
Semantic segmentation from aerial views is a crucial task for autonomous drones, which rely on accurate segmentation to navigate safely and efficiently. However, aerial images present unique challenges such as diverse viewpoints, extreme scale variations, and high scene complexity. In this paper, we propose an end-to-end multi-class semantic segmentation diffusion model that addresses these challenges. We introduce recursive denoising, which allows information to propagate through the denoising process, as well as a hierarchical multi-scale approach that complements the diffusion process. Our method achieves promising results on the UAVid dataset and state-of-the-art performance on the Vaihingen Building segmentation benchmark. This is the first iteration of the method, and it shows great promise for future improvements.
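The recursive denoising idea described above can be illustrated with a small NumPy sketch. It is not the authors' implementation, only one possible reading of the text: each step's predicted segmentation is re-noised and passed to the next step, so earlier estimates propagate through the process. The linear noise schedule, the function names (`add_noise`, `make_toy_denoiser`, `recursive_denoise`), and the fixed random projection standing in for a trained network are all assumptions made for illustration. The hierarchical multi-scale component is omitted.

```python
import numpy as np


def add_noise(seg, t, num_steps, rng):
    """Blend a segmentation map with Gaussian noise.

    Assumed linear schedule: t = 0 returns the map unchanged,
    t = num_steps returns pure noise.
    """
    alpha = t / num_steps
    noise = rng.standard_normal(seg.shape)
    return (1.0 - alpha) * seg + alpha * noise


def make_toy_denoiser(num_classes, rng):
    """Hypothetical stand-in for a learned denoiser (no training involved)."""
    # Fixed random projection from RGB to class scores; an assumption,
    # used only so the prediction is conditioned on the image.
    weights = rng.standard_normal((3, num_classes))

    def denoiser(image, noisy_seg, t):
        # t is accepted for interface parity but ignored by this toy model.
        logits = noisy_seg + image @ weights
        logits = logits - logits.max(axis=-1, keepdims=True)
        probs = np.exp(logits)
        return probs / probs.sum(axis=-1, keepdims=True)

    return denoiser


def recursive_denoise(image, num_classes, num_steps, denoiser, rng):
    """Run the denoiser step by step, feeding each prediction forward."""
    height, width, _ = image.shape
    # Start from pure noise over the class channels.
    x = rng.standard_normal((height, width, num_classes))
    pred = None
    for t in range(num_steps, 0, -1):
        # Predict a full segmentation from the current noisy map.
        pred = denoiser(image, x, t)
        # Re-noise the prediction to the next, lower noise level so the
        # previous estimate carries over into the following step.
        x = add_noise(pred, t - 1, num_steps, rng)
    return pred


rng = np.random.default_rng(0)
image = rng.random((4, 5, 3))          # toy RGB image, values in [0, 1)
denoiser = make_toy_denoiser(3, rng)   # 3 classes, hypothetical
probs = recursive_denoise(image, 3, 4, denoiser, rng)
labels = probs.argmax(axis=-1)         # per-pixel class indices
```

In this sketch the final output is a per-pixel class distribution, and `labels` holds the predicted class index for each pixel. A real system would replace the toy denoiser with a trained network and would apply the same feed-forward recursion during training, as the summary describes.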
