AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation
Yansheng Li, Kun Li, Yongjun Zhang, Linlin Wang, Dingwen Zhang
TL;DR
This work addresses the scarcity of overhead-view scene graphs for urban aerial imagery by introducing the AUG dataset (400 images with 25,594 objects, 16,970 relationships, and 27,175 attributes) and proposing two novel components: ABS-PRD, which prunes potential relationship pairs, and a locality-preserving graph convolutional network (LPG), which retains local information while leveraging global context. ABS-PRD uses category-pair statistics and adaptive bounding-box scaling to identify meaningful object pairs beyond simple IOU-based overlaps, while LPG aggregates multi-layer neighborhood information through a skip-preserving fusion of local and global features for robust relationship prediction. Experimental results show that LPG with Cascade RCNN achieves state-of-the-art performance on AUG across PredCls, SGCls, and SGDet, with ABS-PRD yielding consistent advantages over IOU-PRD and attribute embeddings delivering substantial gains. This work provides a new benchmark and practical methods that advance aerial scene graph generation, with potential benefits for public safety, disaster response, and large-scale urban understanding.
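The pruning idea summarized above can be illustrated with a minimal sketch. It assumes each bounding box is enlarged about its center by a scale factor looked up per category pair, and that a pair is kept only when the enlarged boxes overlap. The lookup table `pair_scale`, the category names, and the overlap test are hypothetical choices made for illustration; how the paper actually derives the scaling factor from category-pair statistics is not specified here.

```python
from itertools import permutations


def scale_box(box, factor):
    """Scale an (x1, y1, x2, y2) box about its center by `factor`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * factor / 2.0
    half_h = (y2 - y1) * factor / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def boxes_overlap(a, b):
    """Return True when two boxes share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def candidate_pairs(objects, pair_scale, default_scale=1.0):
    """Keep ordered (subject, object) index pairs whose scaled boxes overlap.

    objects: list of (category, box) tuples.
    pair_scale: hypothetical dict mapping (subject_category, object_category)
        to a scale factor; unseen pairs fall back to `default_scale`.
    """
    kept = []
    for i, j in permutations(range(len(objects)), 2):
        (cat_i, box_i), (cat_j, box_j) = objects[i], objects[j]
        s = pair_scale.get((cat_i, cat_j), default_scale)
        if boxes_overlap(scale_box(box_i, s), scale_box(box_j, s)):
            kept.append((i, j))
    return kept


# Illustrative objects: a car next to (but not touching) a parking space,
# and a distant building. Names and coordinates are made up.
objects = [
    ("car", (0, 0, 10, 10)),
    ("parking_space", (14, 0, 24, 10)),
    ("building", (100, 100, 140, 140)),
]
pair_scale = {("car", "parking_space"): 2.0}  # assumed value, not from the paper

pairs = candidate_pairs(objects, pair_scale)  # scaling applied per category pair
plain = candidate_pairs(objects, {})          # plain overlap test, no scaling
```

In this toy case the unscaled overlap test finds no candidate pairs because the car and the parking space do not touch, while the scaled test keeps the (car, parking_space) pair and still discards every pair involving the distant building.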
Abstract
Scene graph generation (SGG) aims to understand the visual objects and their semantic relationships in a given image. Many SGG datasets with an eye-level view have been released, but SGG datasets with an overhead view have scarcely been studied. In contrast to the eye-level view, where object occlusion impedes SGG, the overhead view offers a new perspective that benefits SGG by providing a clear perception of the spatial relationships among objects in the ground scene. To fill the gap in overhead-view datasets, this paper constructs and releases an aerial image urban scene graph generation (AUG) dataset. Images in the AUG dataset are captured from a low-altitude overhead view. In the AUG dataset, 25,594 objects, 16,970 relationships, and 27,175 attributes are manually annotated. To prevent the local context from being overwhelmed in complex aerial urban scenes, this paper proposes a new locality-preserving graph convolutional network (LPG). Unlike the traditional graph convolutional network, which has a natural advantage in capturing the global context for SGG, the convolutional layer in the LPG integrates the non-destructive initial features of the objects with dynamically updated neighborhood information, preserving the local context while still mining the global context. Because AUG contains an extremely large number of potential object relationship pairs, of which only a small fraction are meaningful, we propose the adaptive bounding box scaling factor for potential relationship detection (ABS-PRD) to prune the meaningless relationship pairs. Extensive experiments on the AUG dataset show that our LPG significantly outperforms state-of-the-art methods and demonstrate the effectiveness of the proposed locality-preserving strategy.
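The locality-preserving idea described in the abstract, combining each object's untouched initial features with neighborhood information that is updated layer by layer, can be sketched as follows. This is an illustrative approximation under stated assumptions, not the paper's formulation: the mixing weight `alpha`, the symmetric adjacency normalization, and the ReLU nonlinearity are all assumed here, and the function names are hypothetical.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetrically normalize A + I, as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]


def locality_preserving_layer(h, h0, a_norm, weight, alpha=0.5):
    """One layer that mixes aggregated neighbors with the initial features.

    h: current node features (num_nodes, dim).
    h0: initial node features, passed unchanged to every layer.
    alpha: assumed mixing weight between h0 and the aggregated features.
    """
    aggregated = a_norm @ h                        # neighborhood (global) context
    mixed = (1.0 - alpha) * aggregated + alpha * h0  # re-inject local features
    return np.maximum(mixed @ weight, 0.0)         # linear map followed by ReLU


rng = np.random.default_rng(0)

# Toy object graph: three objects connected in a chain.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
h0 = rng.normal(size=(3, 4))
weights = [rng.normal(scale=0.5, size=(4, 4)) for _ in range(2)]

a_norm = normalize_adjacency(adj)
h = h0
for w in weights:
    # Every layer sees the same h0, so stacking layers cannot wash it out.
    h = locality_preserving_layer(h, h0, a_norm, w)
```

Because `h0` enters every layer directly, deeper stacks can widen the receptive field over the object graph without letting each object's own features be diluted by repeated averaging. Setting `alpha` to 0 recovers a plain graph convolution in this sketch.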
