ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery
Xian Sun, Qiwei Yan, Chubo Deng, Chenglong Liu, Yi Jiang, Zhongyan Hou, Wanxuan Lu, Fanglong Yao, Xiaoyu Liu, Lingxiang Hao, Hongfeng Yu
TL;DR
ReCon1M addresses the shortage of large-scale relation-annotated benchmarks for remote sensing scene graph generation (SGG) by introducing a million-scale dataset built on FAIR1M, with 21,392 high-resolution images, 859,751 object instances across 60 categories, and 1,149,342 relation triplets across 64 categories. The paper details the object and relation category design, an oriented bounding box annotation protocol, and the annotation workflow, and it evaluates object detection and SGG tasks using multiple baselines and a novel EGCA-Net that leverages RelPN for efficient relation proposals and transformer-based relation prediction. EGCA-Net achieves leading results on SGDET (mR@20=14.6%, mR@100=25.7%, mR@500=35.0%), highlighting the benefits of dense remote sensing relations and geometric-context features for relation reasoning. Overall, ReCon1M provides a resource and methodology to advance remote sensing cognition, with potential applications in urban planning, environmental monitoring, defense, and agriculture.
Abstract
Scene Graph Generation (SGG) is a high-level visual understanding and reasoning task that aims to extract entities (such as objects) and their interrelationships from images. SGG in natural images has progressed significantly in recent years, but its exploration in remote sensing images remains very limited. The complex characteristics of remote sensing images make annotation more costly in time and manual interpretation than for natural images. The lack of a large-scale public SGG benchmark is a major impediment to SGG-related research in aerial imagery. In this paper, we introduce ReCon1M, the first publicly available large-scale, million-level relation dataset for remote sensing images. Specifically, our dataset is built upon FAIR1M and comprises 21,392 images. It includes annotations for 859,751 object bounding boxes across 60 categories, and 1,149,342 relation triplets across 64 categories defined on these bounding boxes. We provide a detailed description of the dataset's characteristics and statistics. We conduct two object detection tasks and three SGG sub-tasks on this dataset and assess the performance of mainstream methods on them.
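The SGDET results in the TL;DR are reported as mean recall at K (mR@K), which averages recall over relation categories so that frequent predicates do not dominate the score. The following is a minimal sketch of that idea for a single image, not the paper's evaluation code. It assumes that predicted entities have already been matched to ground-truth entities (real SGDET evaluation also requires a box-overlap check), and the function name `mean_recall_at_k` and the predicate labels in the usage note are hypothetical.

```python
from collections import defaultdict


def mean_recall_at_k(gt_triplets, ranked_predictions, k):
    """Single-image mean recall at K (illustrative simplification).

    gt_triplets: set of (subject_id, predicate, object_id) tuples.
    ranked_predictions: list of such tuples, sorted by descending confidence.
    Assumption: entity ids in predictions are already matched to ground truth.
    """
    top_k = set(ranked_predictions[:k])

    hits = defaultdict(int)    # correctly recalled triplets per predicate
    totals = defaultdict(int)  # ground-truth triplets per predicate
    for triplet in gt_triplets:
        predicate = triplet[1]
        totals[predicate] += 1
        if triplet in top_k:
            hits[predicate] += 1

    if not totals:
        return 0.0
    # Recall is computed per predicate first, then averaged across predicates.
    per_predicate = [hits[p] / totals[p] for p in totals]
    return sum(per_predicate) / len(per_predicate)
```

For example, with ground truth `{(0, "parked-at", 1), (2, "parked-at", 1), (3, "adjacent-to", 4)}` and a ranked prediction list whose top two entries are `(0, "parked-at", 1)` and `(3, "adjacent-to", 4)`, the per-predicate recalls at K=2 are 0.5 and 1.0, giving mR@2 = 0.75. A benchmark-level score would additionally aggregate per-predicate recall over all images before averaging.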
