SkyScenes: A Synthetic Dataset for Aerial Scene Understanding

Sahil Khose, Anisha Pal, Aayushi Agarwal, Deepanshi, Judy Hoffman, Prithvijit Chattopadhyay

TL;DR

SkyScenes is a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. It covers diverse layouts, weather conditions, times of day, pitch angles, and altitudes, with corresponding semantic, instance, and depth annotations.

Abstract

Real-world aerial scene understanding is limited by a lack of datasets that contain densely annotated images curated under a diverse set of conditions. Due to inherent challenges in obtaining such images in controlled real-world settings, we present SkyScenes, a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives. We carefully curate SkyScenes images from CARLA to comprehensively capture diversity across layouts (urban and rural maps), weather conditions, times of day, pitch angles and altitudes with corresponding semantic, instance and depth annotations. Through our experiments using SkyScenes, we show that (1) models trained on SkyScenes generalize well to different real-world scenarios, (2) augmenting training on real images with SkyScenes data can improve real-world performance, (3) controlled variations in SkyScenes can offer insights into how models respond to changes in viewpoint conditions (height and pitch), weather and time of day, and (4) incorporating additional sensor modalities (depth) can improve aerial scene understanding. Our dataset and associated generation code are publicly available at: https://hoffman-group.github.io/SkyScenes/

Paper Structure (35 sections, 21 figures, 18 tables, 2 algorithms)

Figures (21)

  • Figure 1: SkyScenes comprises $33.6$k aerial images curated from aerial oblique viewpoints with controlled variations facilitating reproducibility of viewpoints across different weather and daytime conditions (col $1$), different flying altitudes (col $2$) and different viewpoint pitch angles (col $3$), across different map layouts (rural and urban, col $4$) with dense pixel-level semantic, instance and depth annotations (col $5$).
  • Figure 2: Ground View $\rightarrow$ (Oblique) Aerial View. (a) The same scene viewed in Ground View vs. Aerial View exhibits a significant difference in pixel proportion, especially across the tail classes (vehicle, human). (b) For a subset of classes commonly annotated across CityScapes cordts2016cityscapes (red) and UAVid lyu2020uavid (dark blue), we show the percentage of pixels occupied by different classes. Aerial scenes (in UAVid) significantly under-represent the tail classes (vehicle, human).
  • Figure 3: SkyScenes w/ HumanSpawn() increases representation of humans and improves SkyScenes$\rightarrow$UAVid (S$\rightarrow$U) performance. (a) Incorporating HumanSpawn() in the image generation pipeline for SkyScenes increases the proportion of humans in snapshots ([Top]$\rightarrow$[Bottom]). (b) Increased representation of humans across all the layout variations in SkyScenes after HumanSpawn(), with the dotted line representing the proportion of humans in UAVid. (c) Training on HumanSpawn (HS) SkyScenes images improves the model's ability to recognize humans (improved mIoU). T = Town.
  • Figure 4: Class-distribution Diversity in SkyScenes. We show how the distribution of densely-annotated pixels varies across different SkyScenes conditions. [Left] Class distribution varies substantially within and across urban and rural map layouts. [Right] Similarly, for the same SkyScenes layouts (and viewpoints) class distribution varies substantially across different height and pitch values.
  • Figure 6: UAVid, SkyScenes + UAVid and SkyScenes$\rightarrow$UAVid semantic segmentation predictions. Predictions on randomly selected UAVid lyu2020uavid validation images by Rein wei2024stronger models trained on UAVid and SkyScenes. Columns 1 and 2 show the original image and its ground truth. Columns 3, 4, and 5 display predictions from models trained exclusively on UAVid, jointly on SkyScenes and UAVid, and exclusively on SkyScenes, respectively.
  • ...and 16 more figures
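
The class-distribution comparisons in the captions for Figures 2 and 4 come down to counting, for each class, the share of annotated pixels it occupies. A minimal sketch of that computation follows. It is not taken from the SkyScenes code; the function name, the class IDs, and the toy mask are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def class_pixel_proportions(label_map, class_names):
    """Return the fraction of pixels assigned to each class in one label map.

    label_map:   2D list of integer class IDs (a stand-in for a semantic mask).
    class_names: dict mapping class ID -> readable name (hypothetical IDs).
    """
    # Count how many pixels carry each class ID.
    counts = Counter(pixel for row in label_map for pixel in row)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("label_map contains no pixels")
    # Classes absent from the mask get a proportion of 0.
    return {name: counts.get(cid, 0) / total for cid, name in class_names.items()}


# Hypothetical class IDs, for illustration only (not the dataset's label set).
CLASS_NAMES = {0: "road", 1: "building", 2: "vehicle", 3: "human"}

# Toy 4x4 mask: head classes dominate, tail classes occupy one pixel each.
toy_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 0, 3, 1],
]

proportions = class_pixel_proportions(toy_mask, CLASS_NAMES)
for name, share in proportions.items():
    print(f"{name}: {share:.2%}")
```

In this toy mask the vehicle and human classes each occupy one pixel out of sixteen, which mirrors on a small scale the tail-class under-representation that the Figure 2 caption describes for aerial views.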