VDD: Varied Drone Dataset for Semantic Segmentation

Wenxiao Cai, Ke Jin, Jinyan Hou, Cong Guo, Letian Wu, Wankou Yang

TL;DR

The paper addresses the scarcity of diverse, high-resolution, low-altitude drone datasets for semantic segmentation by introducing the Varied Drone Dataset (VDD), a 400-image, 7-class collection with varied camera angles, scenes, and lighting. It also merges VDD with re-annotated versions of two existing datasets, UDD and UAVid, into the Integrated Drone Dataset (IDD), totaling 811 images across 7 classes, and reports cross-dataset gains with state-of-the-art models. The work provides baselines using Mask2Former, SegFormer, and UperNet across multiple backbones, and shows that training on IDD yields consistent improvements on the individual datasets. By releasing annotations and tools, the authors aim to catalyze progress in drone image segmentation and broader drone vision tasks. Overall, VDD and IDD offer a pixel-level labeled foundation for improving generalization and performance in aerial scene understanding.

Abstract

Semantic segmentation of drone images is critical for various aerial vision tasks as it provides essential semantic details to understand scenes on the ground. Ensuring high accuracy of semantic segmentation models for drones requires access to diverse, large-scale, and high-resolution datasets, which are often scarce in the field of aerial image processing. While existing datasets typically focus on urban scenes and are relatively small, our Varied Drone Dataset (VDD) addresses these limitations by offering a large-scale, densely labeled collection of 400 high-resolution images spanning 7 classes. This dataset features various scenes in urban, industrial, rural, and natural areas, captured from different camera angles and under diverse lighting conditions. We also make new annotations to UDD and UAVid, integrating them under VDD annotation standards, to create the Integrated Drone Dataset (IDD). We train seven state-of-the-art models on drone datasets as baselines. It's expected that our dataset will generate considerable interest in drone image segmentation and serve as a foundation for other drone vision tasks. Datasets are publicly available at our website: https://github.com/RussRobin/VDD.
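
Bringing several datasets under one annotation standard, as described for IDD, involves at minimum mapping each source dataset's class IDs onto a shared label set. The Python sketch below illustrates only that remapping step. The class IDs, the `IGNORE_ID` value, and the `remap_mask` helper are hypothetical placeholders, not the actual VDD, UDD, or UAVid label definitions, and the authors additionally made new annotations, which a lookup table cannot reproduce.

```python
# Hypothetical mapping from one source dataset's class IDs to unified IDs.
# These numbers are illustrative assumptions, not the real label definitions.
SOURCE_TO_UNIFIED = {0: 0, 1: 2, 2: 1, 3: 0}

# Assumed marker for pixels whose source class has no unified equivalent.
IGNORE_ID = 255


def remap_mask(mask, mapping, ignore_id=IGNORE_ID):
    """Return a new mask with each source class ID replaced by its unified ID."""
    return [[mapping.get(pixel, ignore_id) for pixel in row] for row in mask]


# A tiny 2x3 segmentation mask using the source dataset's IDs.
source_mask = [
    [0, 1, 1],
    [2, 3, 9],  # 9 is absent from the mapping, so it becomes IGNORE_ID
]

unified_mask = remap_mask(source_mask, SOURCE_TO_UNIFIED)
print(unified_mask)  # [[0, 2, 2], [1, 0, 255]]
```

Sending unmapped IDs to an ignore value, rather than raising an error, is one possible design choice. It keeps such pixels out of training and evaluation without discarding the whole image.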

Paper Structure (20 sections, 1 equation, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Sample images in VDD train/val/test set. The three images provide a glimpse into the variance in VDD: they are taken in urban, rural and industrial areas respectively, and the camera angles are 30, 60 and 90 degrees.
  • Figure 2: Various scenes in VDD. From left to right, top to bottom: urban residence, lake, highway, high school, university canteen, mountains, villa zones, rural villages, transformer substation, hospital, gym and factory.
  • Figure 3: Typical scenes in Aeroscapes, ICG Drone Dataset, UAVid and UDD.
  • Figure 4: These images were taken in spring and autumn, respectively. The lighting conditions and vegetation ratios differ, while the building looks the same.
  • Figure 5: The three images are taken at the same place with three camera angles: 30, 60 and 90 degrees (bird's-eye view).
  • ...and 4 more figures