Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation

Alessandro dos Santos Ferreira, Ana Paula Marques Ramos, José Marcato Junior, Wesley Nunes Gonçalves

TL;DR

This work tackles robust tree segmentation in urban environments under cross-domain, low-resolution conditions with limited labeled data. It introduces a data augmentation pipeline that combines image-to-image translation (pix2pix) and super-resolution (Real-ESRGAN, Latent Diffusion, Stable Diffusion) to harmonize scales across domains and generate semantically consistent training samples for SegFormer. Empirical results show IoU improvements exceeding 50% on low-resolution imagery, with pix2pix-based translation generally delivering stronger segmentation gains than purely super-resolved samples, and diffusion methods offering competitive performance under careful configuration. The approach provides a scalable, annotation-efficient solution for remote sensing tree mapping across diverse sensor platforms and acquisition conditions.

Abstract

Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Mapping and monitoring these green spaces are crucial for urban planning and conservation, yet accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. While deep learning architectures have shown promise in addressing these challenges, their effectiveness remains strongly dependent on the availability of large, manually labeled datasets, which are often expensive and difficult to obtain in sufficient quantity. In this work, we propose a novel pipeline that integrates domain adaptation with GANs and diffusion models to enhance the quality of low-resolution aerial images. The pipeline preserves semantic content while enhancing the imagery, enabling effective tree segmentation without requiring large volumes of manually annotated data. Leveraging models such as pix2pix, Real-ESRGAN, Latent Diffusion, and Stable Diffusion, we generate realistic and structurally consistent synthetic samples that expand the training dataset and unify scale across domains. This approach not only improves the robustness of segmentation models across different acquisition conditions but also provides a scalable and replicable solution for remote sensing scenarios with scarce annotation resources. Experimental results demonstrate an improvement of over 50% in IoU for low-resolution images, highlighting the effectiveness of our method compared to traditional pipelines.

Paper Structure

This paper contains 18 sections, 1 equation, 13 figures, 9 tables.

Figures (13)

  • Figure 1: At the top are sample images from dataset $P20$ with their respective pixel annotations. At the bottom are sample images from dataset $P50$ with their respective pixel annotations.
  • Figure 2: The images of the $P50$ dataset are resized to $640 \times 640$ using Lanczos resampling. For each resized image, we generated 9 patches of size $256 \times 256$ and translated them using pix2pix-trained models.
  • Figure 3: The images of the $P50$ dataset are resized from $256 \times 256$ to $640 \times 640$ using Lanczos resampling. The resized images are then augmented by generating 9 patches of size $256 \times 256$ from each.
  • Figure 4: The images of the $P50$ dataset are resized to $640 \times 640$ using Real-ESRGAN, Latent Diffusion, and Stable Diffusion. Each resized image is then augmented by generating 9 patches of size $256 \times 256$.
  • Figure 5: Sample images generated from datasets $P20$ and $P50$ using pix2pix ($P50-20p$ and $P50-50p$), Real-ESRGAN ($P20G$ and $P50G$), Latent Diffusion ($P20D$ and $P50D$), and Stable Diffusion ($P20S$ and $P50S$).
  • ...and 8 more figures
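
The resize-then-patch step described in the captions of Figures 2 to 4 can be sketched as follows. This is a minimal illustration, not the authors' code: the captions give the sizes ($640 \times 640$ images, 9 patches of $256 \times 256$) but not the patch layout, so the evenly spaced $3 \times 3$ grid with a 192-pixel stride is an assumption, and the names `patch_boxes` and `resize_and_patch` are hypothetical. The Lanczos resize uses Pillow (version 9.1 or later for `Image.Resampling`).

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), as used by Pillow's crop()


def patch_boxes(image_size: int = 640, patch_size: int = 256, grid: int = 3) -> List[Box]:
    """Return grid * grid evenly spaced square crop boxes covering the image.

    ASSUMPTION: the source only states "9 patches of size 256 x 256"; the
    overlapping 3 x 3 grid layout used here is a guess at how they are placed.
    """
    if grid < 1 or patch_size > image_size:
        raise ValueError("grid must be >= 1 and patch_size must fit in the image")
    # With the defaults: (640 - 256) // 2 = 192, so neighbours overlap by 64 pixels.
    stride = (image_size - patch_size) // (grid - 1) if grid > 1 else 0
    offsets = [i * stride for i in range(grid)]
    return [(x, y, x + patch_size, y + patch_size) for y in offsets for x in offsets]


def resize_and_patch(path: str) -> list:
    """Lanczos-resize an image file to 640 x 640 and crop the nine patches."""
    # Imported lazily so that patch_boxes() can be used without Pillow installed.
    from PIL import Image

    with Image.open(path) as img:
        resized = img.resize((640, 640), Image.Resampling.LANCZOS)
    return [resized.crop(box) for box in patch_boxes()]
```

Under these assumptions each $P50$ image yields nine training patches. If the same procedure is applied to the pixel annotations, nearest-neighbour resampling would be the safer choice for the masks, since Lanczos interpolation produces intermediate values that are not valid class labels.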