AUG: A New Dataset and An Efficient Model for Aerial Image Urban Scene Graph Generation

Yansheng Li, Kun Li, Yongjun Zhang, Linlin Wang, Dingwen Zhang

TL;DR

This work addresses the scarcity of overhead-view scene graphs for urban aerial imagery by introducing the AUG dataset (400 images with 25,594 objects, 16,970 relationships, and 27,175 attributes) and proposing two novel components: ABS-PRD for pruning potential relationships and a locality-preserving graph convolutional network (LPG) to retain local information while leveraging global context. ABS-PRD analyzes category-pair statistics and adaptive bounding-box scaling to identify meaningful object pairs beyond simple IOU-based overlaps, while LPG aggregates multi-layer neighborhood information through a skip-preserving fusion of local and global features for robust relationship prediction. Experimental results show that LPG with Cascade RCNN achieves state-of-the-art performance on AUG across PredCls, SGCls, and SGDet, with ABS-PRD yielding consistent advantages over IOU-PRD and attribute embeddings delivering substantial gains. This work provides a new benchmark and practical methods that advance aerial scene graph generation, with potential benefits for public safety, disaster response, and large-scale urban understanding.
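The pruning idea can be illustrated with a minimal sketch. This is not the authors' implementation: the summary only states that ABS-PRD combines category-pair statistics with adaptive bounding-box scaling. The sketch assumes boxes are `(x1, y1, x2, y2)` tuples, that a hypothetical `scale_table` maps a category pair to a scaling factor, and that a pair is kept when the scaled boxes overlap. The function names, the table values, and the keep rule are all illustrative assumptions.

```python
from itertools import permutations


def scale_box(box, factor):
    """Scale a box (x1, y1, x2, y2) about its center by `factor`."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * factor / 2.0
    half_h = (y2 - y1) * factor / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def boxes_overlap(a, b):
    """True if two axis-aligned boxes share a region of positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def candidate_pairs(objects, scale_table, default_scale=1.0):
    """Return ordered (subject, object) index pairs whose scaled boxes overlap.

    objects: list of (category, box)
    scale_table: hypothetical dict {(subject_category, object_category): factor}
    """
    pairs = []
    for i, j in permutations(range(len(objects)), 2):
        (cat_i, box_i), (cat_j, box_j) = objects[i], objects[j]
        factor = scale_table.get((cat_i, cat_j), default_scale)
        if boxes_overlap(scale_box(box_i, factor), scale_box(box_j, factor)):
            pairs.append((i, j))
    return pairs


# Made-up objects: a car next to (but not touching) a road, and a distant tree.
objects = [
    ("car", (0, 0, 10, 10)),
    ("road", (12, 0, 22, 10)),
    ("tree", (100, 100, 110, 110)),
]
# Factor 1.0 everywhere behaves like a plain overlap test.
plain = candidate_pairs(objects, scale_table={})
# A larger factor for car/road lets the nearby pair survive pruning.
scaled = candidate_pairs(
    objects, scale_table={("car", "road"): 1.5, ("road", "car"): 1.5}
)
```

With a factor of 1.0 the car and road boxes do not overlap, so a plain overlap test would discard the pair. A pair-specific factor keeps it while the distant tree is still pruned.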

Abstract

Scene graph generation (SGG) aims to understand the visual objects and their semantic relationships in a given image. Many SGG datasets with the eye-level view have been released, but SGG datasets with the overhead view have scarcely been studied. In contrast to the eye-level view, where object occlusion impedes SGG, the overhead view offers a new perspective that promotes SGG by providing a clear perception of the spatial relationships among objects in the ground scene. To fill the gap in overhead-view datasets, this paper constructs and releases an aerial image urban scene graph generation (AUG) dataset. Images in the AUG dataset are captured from a low-altitude overhead view. In the AUG dataset, 25,594 objects, 16,970 relationships, and 27,175 attributes are manually annotated. To prevent the local context from being overwhelmed in complex aerial urban scenes, this paper proposes a new locality-preserving graph convolutional network (LPG). Unlike the traditional graph convolutional network, which has a natural advantage in capturing the global context for SGG, the convolutional layer in the LPG integrates the non-destructive initial features of the objects with dynamically updated neighborhood information, preserving the local context while mining the global context. Because AUG contains an extremely large number of potential object relationship pairs, of which only a small fraction is meaningful, we propose the adaptive bounding box scaling factor for potential relationship detection (ABS-PRD) to intelligently prune the meaningless relationship pairs. Extensive experiments on the AUG dataset show that our LPG significantly outperforms state-of-the-art methods and demonstrate the effectiveness of the proposed locality-preserving strategy.
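The locality-preserving idea, mixing each object's untouched initial feature with neighborhood information that is updated layer by layer, can be sketched as follows. This is a simplified illustration under stated assumptions, not the LPG layer from the paper. The mean aggregation, the fixed mixing weight `alpha`, and the absence of learned weights are all assumptions, as are the function names.

```python
def aggregate_neighbors(features, adjacency):
    """Mean of neighbor features per node; isolated nodes keep their own feature."""
    out = []
    for i, neighbors in enumerate(adjacency):
        if not neighbors:
            out.append(list(features[i]))
            continue
        dim = len(features[i])
        out.append(
            [sum(features[j][d] for j in neighbors) / len(neighbors) for d in range(dim)]
        )
    return out


def locality_preserving_layers(initial, adjacency, num_layers=2, alpha=0.5):
    """Stack layers that re-inject the initial features at every step.

    alpha (assumed hyperparameter): weight on the initial, local feature;
    1 - alpha weights the aggregated neighborhood from the previous layer.
    """
    current = [list(f) for f in initial]
    for _ in range(num_layers):
        neighborhood = aggregate_neighbors(current, adjacency)
        current = [
            [alpha * x0 + (1.0 - alpha) * xn for x0, xn in zip(f0, fn)]
            for f0, fn in zip(initial, neighborhood)
        ]
    return current


# Made-up 3-node chain graph: node 0 - node 1 - node 2.
initial = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
adjacency = [[1], [0, 2], [1]]
updated = locality_preserving_layers(initial, adjacency, num_layers=2, alpha=0.5)
```

Because the initial features enter every layer directly, stacking more layers widens the receptive field without washing out each node's own signal. Setting `alpha=1.0` in this sketch returns the initial features unchanged, and `alpha=0.0` reduces it to repeated neighbor averaging.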

Paper Structure

This paper contains 16 sections, 6 equations, 8 figures, 9 tables, 2 algorithms.

Figures (8)

  • Figure 1: The opportunities and challenges of the AUG task. (a) shows an image with the eye-level view from the VG dataset. (b) shows an image with the overhead view from the AUG dataset. (c) shows why the local context is easily overwhelmed in the AUG task.
  • Figure 2: Quantity distribution of objects, relationships, and attributes on AUG. (a) is the quantity distribution of objects. (b) is the quantity distribution of relationships. (c) is the quantity distribution of attributes.
  • Figure 3: Annotation categories for objects, relationships, and attributes on the AUG dataset. (a) shows the categories of objects: objects are divided into area-wide classes and independent individual objects, and the individual objects are further divided according to their own properties. (b) shows the categories of relationship predicates, organized according to grammatical rules, which allows more regular relational predicates to be added by following the same rules. (c) shows the categories of attributes, which are divided into nine aspects: shape, size, color, texture, texture, space, plant description, state, and material.
  • Figure 4: Visualization of annotations of the AUG dataset. (a) and (c) show the original image and its corresponding scene graph. (b) and (d) show the details of the red areas in (a) and (c). The colored nodes and the gray nodes indicate objects and attributes, respectively, and the directed edges between the nodes indicate that a relationship exists between the two nodes. A pointing to B indicates that A is the subject and B is the object. Specifically, an attribute node pointing to an object node indicates that this object has this attribute.
  • Figure 5: Four kinds of bounding boxes related to ABS.
  • ...and 3 more figures
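
The graph convention described in the Figure 4 caption (object nodes, attribute nodes, directed subject-to-object edges, and attribute nodes pointing at the object they describe) can be represented with a small data structure. The sketch below is a hypothetical container for illustration only. The class name, method names, and the example categories are assumptions and are not taken from the AUG annotation format.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy container: object nodes, attribute nodes, and directed edges."""
    objects: dict = field(default_factory=dict)     # object node id -> category
    attributes: dict = field(default_factory=dict)  # attribute node id -> attribute name
    edges: list = field(default_factory=list)       # (source id, target id, predicate or None)

    def add_object(self, node_id, category):
        self.objects[node_id] = category

    def add_relationship(self, subject_id, predicate, object_id):
        # Edge direction: subject -> object.
        self.edges.append((subject_id, object_id, predicate))

    def add_attribute(self, attr_id, name, object_id):
        # An attribute node points to the object that has the attribute.
        self.attributes[attr_id] = name
        self.edges.append((attr_id, object_id, None))

    def triples(self):
        """(subject category, predicate, object category) for object-object edges."""
        return [
            (self.objects[s], p, self.objects[t])
            for s, t, p in self.edges
            if s in self.objects and t in self.objects
        ]

    def attributes_of(self, object_id):
        """Names of all attribute nodes pointing at the given object."""
        return [
            self.attributes[s]
            for s, t, _ in self.edges
            if s in self.attributes and t == object_id
        ]


# Made-up example values, not taken from the AUG annotations.
graph = SceneGraph()
graph.add_object("o1", "car")
graph.add_object("o2", "road")
graph.add_relationship("o1", "parked on", "o2")
graph.add_attribute("a1", "white", "o1")
```

Calling `graph.triples()` lists the subject-predicate-object relationships, and `graph.attributes_of("o1")` lists the attributes attached to one object. Keeping attributes as separate nodes mirrors the caption's drawing convention rather than storing them as fields on the object.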