OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang, Yongchao Feng, Shuai Yang, Ziqi Liu, Qingjie Liu, Yunhong Wang
TL;DR
OpenRSD tackles the generalization gap in remote sensing object detection by introducing an open-prompt detector that accepts multimodal prompts and uses two specialized heads, one for fast alignment and one for deep fusion. A three-stage training pipeline and a large ORSD+ dataset enable strong cross-domain performance across seven public RS datasets, handling both oriented bounding boxes (OBB) and horizontal bounding boxes (HBB) with real-time inference (about $20.8$ FPS). The method leverages SkyCLIP and DINOv2 prompts, offline prompt dictionaries, and class embeddings to balance vocabulary scalability with precision. Empirical results show that OpenRSD outperforms state-of-the-art baselines on OBB tasks and remains competitive with high-precision methods on HBB tasks while offering substantial speed advantages, supporting its practical utility for large-scale RS image analysis.
Abstract
Remote sensing object detection has made significant progress, but most studies still focus on closed-set detection, limiting generalization across diverse datasets. Open-vocabulary object detection (OVD) provides a solution by leveraging multimodal associations between text prompts and visual features. However, existing OVD methods for remote sensing (RS) images are constrained by small-scale datasets and fail to address the unique challenges of remote sensing interpretation, including oriented object detection and the need for both high precision and real-time performance in diverse scenarios. To tackle these challenges, we propose OpenRSD, a universal open-prompt RS object detection framework. OpenRSD supports multimodal prompts and integrates multi-task detection heads to balance accuracy and real-time requirements. Additionally, we design a multi-stage training pipeline to enhance the generalization of the model. Evaluated on seven public datasets, OpenRSD demonstrates superior performance in oriented and horizontal bounding box detection, with real-time inference capabilities suitable for large-scale RS image analysis. Compared to YOLO-World, OpenRSD achieves 8.7\% higher average precision and an inference speed of 20.8 FPS. Code and models will be released.
