Breaking the Box: Enhancing Remote Sensing Image Segmentation with Freehand Sketches

Ying Zang, Yuncan Gao, Jiangi Zhang, Yuangi Hu, Runlong Cao, Lanyun Zhu, Qi Zhu, Deyi Ji, Renjun Xu, Tianrun Chen

TL;DR

The paper tackles the challenge of remote sensing image segmentation under extreme scale and viewpoint variability by introducing freehand sketch prompting as a more intuitive interaction than points or boxes. It introduces the LTL-Sensing dataset, which pairs human sketches with remote sensing images and ground-truth (GT) masks, and presents LTL-Net, a sketch-aware segmentation model that fuses sketch and image features and uses a masked attention mechanism and a multi-prompt transport module to robustly map multiple sketches to image regions. Empirical results show that sketch-guided prompting substantially improves segmentation accuracy and robustness over SAM and related sketch-based methods across object sizes and scenes, highlighting the potential for more effective human-AI collaboration in environmental monitoring, disaster response, and urban analysis. Collectively, the approach advances zero-shot interactive segmentation in remote sensing by combining intuitive user input, a dedicated annotated dataset, and a novel network design that handles sketch variability through augmentation and optimal-transport-based multi-prompt alignment.
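The masked attention mechanism is only named here, not specified. As a minimal sketch, assuming a Mask2Former-style masked cross-attention, prompt tokens (for example, sketch embeddings) attend only to image positions inside a binary mask, such as a region derived from the sketch. The function name, the tensor shapes, and the fallback to unmasked attention for an empty mask are all illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def masked_cross_attention(queries, keys, values, mask):
    """Scaled dot-product cross-attention restricted to masked image positions.

    queries: (Q, d) prompt tokens, e.g. sketch embeddings (hypothetical inputs)
    keys, values: (N, d) flattened image features
    mask: (Q, N) bool; True marks positions a query is allowed to attend to
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)               # (Q, N) raw logits
    logits = np.where(mask, scores, -np.inf)             # block positions outside the mask
    empty = ~mask.any(axis=1)                            # queries whose mask is empty
    logits[empty] = scores[empty]                        # assumed fallback: unmasked attention
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(logits)                             # exp(-inf) == 0 for blocked positions
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values, weights


rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(2, 8)), rng.normal(size=(5, 8)), rng.normal(size=(5, 8))
mask = np.array([[True, True, False, False, False],
                 [False, False, False, False, False]])
out, attn = masked_cross_attention(q, k, v, mask)
```

In this toy call the first query can only attend to the first two image positions, while the second query, whose mask is empty, falls back to ordinary attention over all five.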

Abstract

This work advances zero-shot interactive segmentation for remote sensing imagery through three key contributions. First, we propose a novel sketch-based prompting method, enabling users to intuitively outline objects, surpassing traditional point or box prompts. Second, we introduce LTL-Sensing, the first dataset pairing human sketches with remote sensing imagery, setting a benchmark for future research. Third, we present LTL-Net, a model featuring a multi-input prompting transport module tailored for freehand sketches. Extensive experiments show our approach significantly improves segmentation accuracy and robustness over state-of-the-art methods like SAM, fostering more intuitive human-AI collaboration in remote sensing analysis and enhancing its applications.
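The multi-input prompting transport module is described as aligning multiple sketch prompts with image regions via optimal transport. The exact cost, marginals, and solver are not given here. The sketch below assumes entropic optimal transport solved with Sinkhorn iterations between K prompt embeddings and M region embeddings, using cosine distance as the cost and uniform marginals. `sinkhorn` and `align_prompts` are hypothetical helpers, not the paper's code.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropic-regularized optimal transport plan between marginals a (K,) and b (M,)."""
    gibbs = np.exp(-cost / eps)             # Gibbs kernel, shape (K, M)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):                # alternate row and column scaling
        u = a / (gibbs @ v)
        v = b / (gibbs.T @ u)
    return u[:, None] * gibbs * v[None, :]  # plan P with row sums approx. a, column sums b

def align_prompts(prompt_emb, region_emb, eps=0.05):
    """Hypothetical helper: soft-assign K sketch-prompt embeddings to M region embeddings."""
    p = prompt_emb / np.linalg.norm(prompt_emb, axis=1, keepdims=True)
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    cost = 1.0 - p @ r.T                     # cosine distance in [0, 2]
    a = np.full(len(p), 1.0 / len(p))        # uniform mass per prompt (assumption)
    b = np.full(len(r), 1.0 / len(r))        # uniform mass per region (assumption)
    return sinkhorn(cost, a, b, eps)


prompts = np.array([[1.0, 0.0], [0.0, 1.0]])
regions = np.array([[0.0, 1.0], [1.0, 0.0]])  # same directions, listed in swapped order
plan = align_prompts(prompts, regions)
```

Because the transport plan must respect both marginals, each prompt is pushed toward a distinct region rather than all prompts collapsing onto one best match. A small `eps` makes the assignment sharper at the cost of slower convergence.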

Paper Structure

This paper contains 18 sections, 17 equations, 8 figures, 6 tables, 2 algorithms.

Figures (8)

  • Figure 1: In this paper, we propose using human freehand sketches (rough contours drawn around the object) to improve segmentation of remote sensing images. A) Comparison of SAM and our method with different inputs: SAM struggles with point and box prompts, whereas sketch input improves performance. Red: segmentation mask; green: prompt; zoom in for a better view. B) Our carefully designed LTL-Net produces more accurate segmentation masks across all mask size ranges with sketch input than vanilla SAM (Pre-Trained) and SAM fine-tuned on a remote sensing image dataset with the same input prompts, denoted SAM (Fine-Tuned). More details in the Supplementary Material.
  • Figure 1: Our carefully designed LTL-Net produces more accurate segmentation masks across all mask size ranges with sketch input than vanilla SAM (Pre-Trained) and SAM fine-tuned on a remote sensing image dataset with the same input prompts, denoted SAM (Fine-Tuned). More details in the Supplementary Material.
  • Figure 2: a) Overall structure of LTL-Net. Masked Attention, Sketch Augmentation, and Multi-Prompts Transport are introduced for this sketch-based task and used during training to improve performance. The GT mask is used for supervision. b) Multi-Prompts Transport (MPT). This cross-prompt coordination mechanism enhances segmentation precision while maintaining consistency across varying sketch inputs. c) Inference phase: the input image and a freehand sketch produce a segmentation aligned with the sketch guidance.
  • Figure 2: Sample images from the LTL-Sensing dataset (partial display).
  • Figure 3: Visualization of sketch augmentation with different perturbation amplitudes.
  • ...and 3 more figures
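Figure 3 shows sketch augmentation at several perturbation amplitudes, but the augmentation itself is not specified here. The following is a minimal sketch under stated assumptions. It treats a sketch as a closed polyline of (x, y) points and adds low-frequency random offsets whose size is set by an `amplitude` parameter, taken as a fraction of the sketch's bounding-box diagonal. `perturb_sketch`, `amplitude`, and `smooth` are hypothetical names.

```python
import numpy as np

def perturb_sketch(points, amplitude, rng, smooth=5):
    """Jitter a closed freehand sketch given as an (N, 2) array of x, y points.

    amplitude: noise scale as a fraction of the bounding-box diagonal (hypothetical knob)
    smooth: circular moving-average window, so neighbouring points move together
    """
    points = np.asarray(points, dtype=float)
    diag = np.linalg.norm(points.max(axis=0) - points.min(axis=0))
    noise = rng.normal(size=points.shape)
    # Wrap-pad so the smoothing treats the contour as closed, then average each axis
    padded = np.pad(noise, ((smooth, smooth), (0, 0)), mode="wrap")
    kernel = np.ones(smooth) / smooth
    smoothed = np.stack(
        [np.convolve(padded[:, i], kernel, mode="same")[smooth:-smooth] for i in range(2)],
        axis=1,
    )
    return points + amplitude * diag * smoothed


t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
circle = np.stack([10 * np.cos(t), 10 * np.sin(t)], axis=1)
mild = perturb_sketch(circle, 0.02, np.random.default_rng(1))
strong = perturb_sketch(circle, 0.10, np.random.default_rng(1))
```

With the same random seed, the displacement scales linearly with `amplitude`, so `strong` shows the same wobble as `mild` at five times the size. Smoothing the noise keeps the perturbed contour looking like a hand-drawn stroke rather than isolated pixel jitter.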