STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery
Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan
TL;DR
STAR addresses the lack of large-scale scene graph data in large-size very-high-resolution satellite imagery by introducing a dataset with over 210K objects and 400K triplets across 1,273 scenarios, annotated with oriented bounding boxes and fine-grained categories. The authors propose the context-aware cascade cognition (CAC) framework, comprising Holistic Object Detection (HOD-Net), Pair Proposal Generation (PPG) via adversarial reconstruction, and Relationship Prediction with Context-Aware Messaging (RPCM) to handle multi-scale objects, pair sparsity, and context-dependent relations. They also release an open-source toolkit with about 30 OBD and 10 SGG methods and establish a large-scale benchmark that demonstrates STAR’s value for cognitive understanding of geospatial scenes. This work enables long-range relational reasoning and fine-grained, orientation-aware analysis in SGG for geospatial applications such as traffic planning, energy monitoring, and remote sensing analytics.
Abstract
Scene graph generation (SGG) in satellite imagery (SAI) helps advance the understanding of geospatial scenarios from perception to cognition. In SAI, objects vary greatly in scale and aspect ratio, and rich relationships exist between objects (even between spatially disjoint objects), which makes it attractive to conduct SGG holistically in large-size very-high-resolution (VHR) SAI. However, such SGG datasets are lacking. Due to the complexity of large-size SAI, mining triplets <subject, relationship, object> relies heavily on long-range contextual reasoning. Consequently, SGG models designed for small-size natural imagery are not directly applicable to large-size SAI. This paper constructs a large-scale dataset for SGG in large-size VHR SAI, named STAR (Scene graph generaTion in lArge-size satellite imageRy), with image sizes ranging from 512 x 768 to 27,860 x 31,096 pixels and encompassing over 210K objects and over 400K triplets. To realize SGG in large-size SAI, we propose a context-aware cascade cognition (CAC) framework to understand SAI in terms of object detection (OBD), pair pruning, and relationship prediction for SGG. We also release a SAI-oriented SGG toolkit with about 30 OBD and 10 SGG methods, which require further adaptation through our devised modules to work on our challenging STAR dataset. The dataset and toolkit are available at: https://linlin-dev.github.io/project/STAR.
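To make the <subject, relationship, object> triplet and the pair-pruning step more concrete, the following is a minimal Python sketch and not the STAR annotation format or the CAC implementation. All class names, field names, category labels ("airplane", "apron", "ship"), and the "parked-on" relationship are hypothetical. The distance threshold is a deliberately crude stand-in for pair pruning; the paper's own pair proposal step is described as using adversarial reconstruction, which is not reproduced here.

```python
from dataclasses import dataclass
from itertools import permutations
import math


@dataclass(frozen=True)
class OrientedBox:
    """Oriented bounding box: center, size, and rotation (hypothetical layout)."""
    cx: float
    cy: float
    w: float
    h: float
    angle: float  # rotation in radians


@dataclass(frozen=True)
class SceneObject:
    obj_id: int
    category: str  # hypothetical label, not taken from the STAR category list
    box: OrientedBox


@dataclass(frozen=True)
class Triplet:
    """One scene-graph edge: <subject, relationship, object>."""
    subject: SceneObject
    relationship: str
    obj: SceneObject


def candidate_pairs(objects):
    """All ordered (subject, object) pairs: n * (n - 1) for n objects."""
    return list(permutations(objects, 2))


def prune_by_distance(pairs, max_dist):
    """Keep pairs whose box centers lie within max_dist pixels.

    Assumption: a simple stand-in heuristic for illustration only.
    """
    kept = []
    for s, o in pairs:
        d = math.hypot(s.box.cx - o.box.cx, s.box.cy - o.box.cy)
        if d <= max_dist:
            kept.append((s, o))
    return kept


objects = [
    SceneObject(0, "airplane", OrientedBox(100, 100, 40, 35, 0.3)),
    SceneObject(1, "apron", OrientedBox(120, 110, 200, 150, 0.0)),
    SceneObject(2, "ship", OrientedBox(20000, 18000, 60, 15, 1.2)),
]

pairs = candidate_pairs(objects)        # 3 objects give 6 ordered pairs
kept = prune_by_distance(pairs, 500.0)  # only the airplane/apron pairs remain
graph = [Triplet(objects[0], "parked-on", objects[1])]
```

The number of candidate pairs grows quadratically with the object count, which is why some form of pruning is needed in large images. A fixed distance cut-off like the one above would also discard valid relationships between spatially disjoint objects, which the abstract notes do occur in SAI, so it should be read only as an illustration of the problem.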
