STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

TL;DR

STAR addresses the lack of large-scale scene graph data in large-size very-high-resolution satellite imagery by introducing a dataset with over 210K objects and 400K triplets across 1,273 scenarios, annotated with oriented bounding boxes and fine-grained categories. The authors propose the context-aware cascade cognition (CAC) framework, comprising Holistic Object Detection (HOD-Net), Pair Proposal Generation (PPG) via adversarial reconstruction, and Relationship Prediction with Context-Aware Messaging (RPCM) to handle multi-scale objects, pair sparsity, and context-dependent relations. They also release an open-source toolkit with about 30 OBD and 10 SGG methods and establish a large-scale benchmark that demonstrates STAR’s value for cognitive understanding of geospatial scenes. This work enables long-range relational reasoning and fine-grained, orientation-aware analysis in SGG for geospatial applications such as traffic planning, energy monitoring, and remote sensing analytics.

Abstract

Scene graph generation (SGG) in satellite imagery (SAI) helps promote the understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scale and aspect ratio, and there are rich relationships between objects (even between spatially disjoint objects), which makes it attractive to conduct SGG holistically in large-size very-high-resolution (VHR) SAI. However, such SGG datasets are lacking. Due to the complexity of large-size SAI, mining triplets <subject, relationship, object> relies heavily on long-range contextual reasoning. Consequently, SGG models designed for small-size natural imagery are not directly applicable to large-size SAI. This paper constructs a large-scale dataset for SGG in large-size VHR SAI, named STAR (Scene graph generaTion in lArge-size satellite imageRy), with image sizes ranging from 512 x 768 to 27,860 x 31,096 pixels and encompassing over 210K objects and over 400K triplets. To realize SGG in large-size SAI, we propose a context-aware cascade cognition (CAC) framework to understand SAI in terms of object detection (OBD), pair pruning, and relationship prediction for SGG. We also release a SAI-oriented SGG toolkit with about 30 OBD and 10 SGG methods, which need further adaptation by our devised modules on our challenging STAR dataset. The dataset and toolkit are available at: https://linlin-dev.github.io/project/STAR.
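
To make the triplet notion concrete, the following is a minimal Python sketch of how <subject, relationship, object> triplets over objects with oriented bounding boxes might be represented, and of why pair pruning matters: the number of ordered candidate pairs grows as n(n-1) while only a few carry a relationship. All class names, field names, category labels, and the relationship label are hypothetical illustrations, not STAR's annotation format or the toolkit's API.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class OrientedBox:
    """Hypothetical oriented box: center, size, and rotation in degrees."""
    cx: float
    cy: float
    width: float
    height: float
    angle_deg: float


@dataclass(frozen=True)
class SceneObject:
    obj_id: int
    category: str  # illustrative label, not taken from STAR's category list
    box: OrientedBox


@dataclass(frozen=True)
class Triplet:
    subject_id: int
    relationship: str  # illustrative label
    object_id: int


def candidate_pairs(objects):
    """Return all ordered (subject, object) id pairs, before any pruning."""
    return [(s.obj_id, o.obj_id) for s, o in permutations(objects, 2)]


objects = [
    SceneObject(0, "airplane", OrientedBox(120.0, 80.0, 40.0, 36.0, 15.0)),
    SceneObject(1, "apron", OrientedBox(130.0, 90.0, 200.0, 150.0, 0.0)),
    SceneObject(2, "terminal", OrientedBox(400.0, 300.0, 300.0, 80.0, 30.0)),
]
triplets = [Triplet(0, "parked at", 1)]

pairs = candidate_pairs(objects)
annotated = {(t.subject_id, t.object_id) for t in triplets}
density = len(annotated) / len(pairs)  # fraction of pairs with a relationship
print(f"{len(pairs)} candidate pairs, {len(annotated)} annotated")
```

With only three objects there are already six ordered pairs for a single annotated relationship; in a scene with thousands of objects the candidate set is far larger, which is the sparsity that a pair-pruning stage is meant to address.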

Paper Structure (32 sections, 10 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Illustration of scene graph generation (SGG) in large-size VHR satellite imagery (SAI). (a) and (c) show OBD and SGG results in large-size VHR SAI, respectively. In (d), black arrows denote semantic relationships whose prediction depends only on isolated pairs, whereas red arrows denote semantic relationships that are inferred from context.
  • Figure 2: The geographical distribution map of the sampled images from the proposed STAR dataset.
  • Figure 3: Statistics and visualization of objects (a) and relationships (b) from the STAR dataset. The relationships are color-coded to show parking status, spatial topology, functional description, movement status, distance warning, circuit layout, construction status, and emission status. Some typical objects and interesting triplets are visualized.
  • Figure 4: Interaction mapping between objects and relationships. The eight colors in the figure represent the eight relationship types: parking status (in purple), spatial topology (in blue), functional description (in orange), movement status (in red), distance warning (in grey), circuit layout (in gold), construction status (in green), and emission status (in brown). The values on either side indicate the proportion of each object category and relationship category, respectively.
  • Figure 5: Examples of intra-class variations and inter-class similarities in relationships on the STAR dataset.
  • ...and 6 more figures