RRSIS: Referring Remote Sensing Image Segmentation
Zhenghang Yuan, Lichao Mou, Yuansheng Hua, Xiao Xiang Zhu
TL;DR
This work introduces the RefSegRS dataset to study referring remote sensing image segmentation (RRSIS), the task of generating pixel-level masks for SkyScapes imagery from language expressions. It analyzes the limitations of applying natural-image referring segmentation methods to remote sensing (RS) data and introduces a language-guided cross-scale enhancement (LGCE) module built on a LAVT-style Transformer framework with a Swin backbone and a BERT language encoder. The dataset provides 4,420 image-language-label triplets across 285 scenes, enabling systematic evaluation of cross-modal methods in RS contexts. LGCE fuses shallow and deep visual features under linguistic guidance to better detect small and dispersed objects, improving on LAVT and CNN-based baselines. The authors plan to publicly release the dataset and code to facilitate future research.
Abstract
Localizing desired objects in remote sensing images is of great use in practical applications. Referring image segmentation, which aims to segment the objects to which a given expression refers, has been extensively studied for natural images. However, the task has received almost no research attention for remote sensing imagery. Considering its potential for real-world applications, in this paper we introduce referring remote sensing image segmentation (RRSIS) to fill this gap and explore the task. Specifically, we create a new dataset, called RefSegRS, which enables us to evaluate different methods. We then benchmark referring image segmentation methods designed for natural images on the RefSegRS dataset and find that these models show limited efficacy in detecting small and scattered objects. To alleviate this issue, we propose a language-guided cross-scale enhancement (LGCE) module that uses linguistic features to adaptively enhance multi-scale visual features by integrating both deep and shallow features. The proposed dataset, the benchmarking results, and the LGCE module provide insights into the design of better RRSIS models. We will make our dataset and code publicly available.
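To make the idea of language-guided fusion of shallow and deep features more concrete, the following is a minimal NumPy sketch. It is not the authors' LGCE module: the function names (`upsample_nearest`, `language_guided_fusion`), the cross-attention followed by a sigmoid gate, the nearest-neighbour upsampling, and all tensor shapes are assumptions chosen only for illustration. The random arrays stand in for Swin feature maps and BERT word embeddings.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def upsample_nearest(feat, factor):
    # (H, W, C) -> (H * factor, W * factor, C) by repeating pixels.
    return feat.repeat(factor, axis=0).repeat(factor, axis=1)


def language_guided_fusion(shallow, deep, words):
    """Hypothetical fusion step (an assumption, not the paper's LGCE).

    shallow: (H, W, C)      high-resolution, low-level visual features
    deep:    (H/f, W/f, C)  low-resolution, high-level visual features
    words:   (T, C)         word-level language features
    """
    h, w, c = shallow.shape
    factor = h // deep.shape[0]

    # Bring the deep features to the shallow resolution and combine them.
    fused = shallow + upsample_nearest(deep, factor)
    pixels = fused.reshape(h * w, c)

    # Cross-attention: every pixel queries the words of the expression.
    attn = softmax(pixels @ words.T / np.sqrt(c), axis=-1)  # (H*W, T)
    lang_ctx = attn @ words                                  # (H*W, C)

    # Per-pixel sigmoid gate decides how much language context to inject.
    score = (pixels * lang_ctx).sum(-1, keepdims=True) / np.sqrt(c)
    gate = 1.0 / (1.0 + np.exp(-score))                      # (H*W, 1)

    enhanced = pixels + gate * lang_ctx
    return enhanced.reshape(h, w, c), attn.reshape(h, w, -1)


# Toy inputs: random stand-ins for backbone and language-encoder outputs.
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 8, 16))
deep = rng.standard_normal((4, 4, 16))
words = rng.standard_normal((5, 16))

enhanced, attn = language_guided_fusion(shallow, deep, words)
print(enhanced.shape, attn.shape)
```

In this sketch the enhanced map keeps the shallow resolution, which is the property that matters for small and scattered objects: fine spatial detail is preserved while each pixel is modulated by the words it attends to most strongly.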
