ReCon1M: A Large-scale Benchmark Dataset for Relation Comprehension in Remote Sensing Imagery

Xian Sun, Qiwei Yan, Chubo Deng, Chenglong Liu, Yi Jiang, Zhongyan Hou, Wanxuan Lu, Fanglong Yao, Xiaoyu Liu, Lingxiang Hao, Hongfeng Yu

TL;DR

ReCon1M addresses the shortage of large-scale relation-annotated benchmarks for remote sensing scene graph generation by introducing a million-scale dataset built on FAIR1M, with 21,392 high-resolution images, 859,751 object instances across 60 categories, and 1,149,342 relation triplets across 64 categories. It details object and relation category design, an oriented bounding box annotation protocol, and a rigorous annotation workflow, and evaluates object detection benchmarks as well as SGG tasks using multiple baselines and a novel EGCA-Net that leverages RelPN for efficient relation proposals and transformer-based relation prediction. EGCA-Net achieves leading results on SGDET (mR@20=14.6%, mR@100=25.7%, mR@500=35.0%), highlighting the benefits of dense RS relations and geometric-context features for robust relation reasoning. Overall, ReCon1M provides a crucial resource and methodology to advance remote sensing cognition, with practical impact on urban planning, environmental monitoring, defense, and agriculture.

Abstract

Scene Graph Generation (SGG) is a high-level visual understanding and reasoning task that extracts entities (such as objects) and their interrelationships from images. SGG on natural images has progressed significantly in recent years, but its exploration in remote sensing imagery remains very limited. The complex characteristics of remote sensing images make annotation more time-consuming and more costly in manual interpretation than for natural images. The lack of a large-scale public SGG benchmark is a major impediment to SGG-related research in aerial imagery. In this paper, we introduce ReCon1M, the first publicly available large-scale, million-level relation dataset for remote sensing images. Specifically, our dataset is built upon FAIR1M and comprises 21,392 images. It includes annotations for 859,751 object bounding boxes across 60 categories and 1,149,342 relation triplets across 64 categories defined on these bounding boxes. We provide a detailed description of the dataset's characteristics and statistics. We conduct two object detection tasks and three SGG sub-tasks on this dataset and assess the performance of mainstream methods on them.
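
To make the annotation structure concrete, the following is a minimal sketch of how oriented-box objects and relation triplets of the kind described above might be held in memory. It is an illustration under stated assumptions, not the dataset's actual file format or API: the class names, the four-corner box encoding, and the example labels ("ship", "dock", "parked-alongside") are all hypothetical placeholders and are not taken from the ReCon1M label set.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class OrientedBox:
    """Hypothetical OBB encoding: four (x, y) corners in pixel coordinates."""
    corners: Tuple[Point, Point, Point, Point]


@dataclass(frozen=True)
class ObjectInstance:
    object_id: int
    category: str          # placeholder label, not an official category name
    box: OrientedBox


@dataclass(frozen=True)
class RelationTriplet:
    """A <subject, relation, object> triplet that refers to objects by id."""
    subject_id: int
    relation: str          # placeholder label, not an official relation name
    object_id: int


@dataclass
class SceneGraph:
    objects: Dict[int, ObjectInstance]
    relations: List[RelationTriplet]

    def triplet_labels(self) -> List[Tuple[str, str, str]]:
        """Resolve ids to (subject category, relation, object category)."""
        return [
            (self.objects[r.subject_id].category,
             r.relation,
             self.objects[r.object_id].category)
            for r in self.relations
        ]

    def relation_histogram(self) -> Counter:
        """Count relation instances per relation label in this image."""
        return Counter(r.relation for r in self.relations)


def build_example() -> SceneGraph:
    """Build a tiny, made-up scene graph with two ships and one dock."""
    box = OrientedBox(((0.0, 0.0), (10.0, 0.0), (10.0, 4.0), (0.0, 4.0)))
    objects = {
        0: ObjectInstance(0, "ship", box),
        1: ObjectInstance(1, "ship", box),
        2: ObjectInstance(2, "dock", box),
    }
    relations = [
        RelationTriplet(0, "parked-alongside", 2),
        RelationTriplet(1, "parked-alongside", 2),
    ]
    return SceneGraph(objects, relations)
```

Storing triplets by object id rather than by label keeps each relation tied to a specific bounding box, which matters when an image contains many instances of the same category.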

Paper Structure (31 sections, 2 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Visualization example of the ReCon1M dataset. (a) shows the oriented bounding box (OBB) annotations. Because the scene is complex and contains a large number of objects and relation instances, we selected regions A and B for scene graph visualization, as shown in (b). (c) shows several downstream tasks that scene graphs can assist and gives a specific example of higher-order reasoning in visual question answering.
  • Figure 2: Example images of some object categories in ReCon1M.
  • Figure 3: Example instances of some relation categories in ReCon1M. A relation instance can be represented by a triplet $\langle subject, relation, object \rangle$. Each relation instance is represented by two images, where the left and right images respectively show the subject and the object, with the relation label above and the labels of the subject and object below the images.
  • Figure 4: (a) The distribution of the number of instances per object category. (b) The distribution of the number of instances per relation category.
  • Figure 5: (a) The distribution of the number of object instances per image. (b) The distribution of the number of object categories per image. (c) The distribution of the number of relation instances per image. (d) The distribution of the number of relation categories per image.
  • ...and 5 more figures