Dynamic Dictionary Learning for Remote Sensing Image Segmentation

Xuechao Zou, Yue Li, Shun Zhang, Kai Li, Shiying Wang, Pin Tao, Junliang Xing, Congyan Lang

TL;DR

This work addresses the difficulty of distinguishing morphologically similar categories in remote sensing image segmentation by introducing a dynamic dictionary learning framework that explicitly models class-aware embeddings. A static dictionary is converted into a dynamic one via input-driven attention and then refined through multi-stage alternating cross-attention between image features and dictionary embeddings. A dictionary-based contrastive loss guides this refinement, maximizing inter-class separability while minimizing intra-class variance. The architecture combines an encoder, a dictionary generator (producing static and dynamic dictionaries), and a decoder; the training objective jointly optimizes the static and dynamic branches and enforces discriminability. Results on six datasets, including LoveDA and UAVid, show state-of-the-art performance on both coarse- and fine-grained segmentation tasks. Code is publicly available.
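
The alternating refinement summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names (`cross_attend`, `refine`), the single-head attention without learned projections, the residual update, and the order in which the two sides query each other are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, keys_values):
    """Single-head cross-attention with a residual update.

    Assumption: no learned projections; keys and values are the same vectors.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys_values]
        weights = softmax(scores)
        context = [
            sum(w * kv[d] for w, kv in zip(weights, keys_values))
            for d in range(dim)
        ]
        updated.append([qi + ci for qi, ci in zip(q, context)])
    return updated


def refine(dictionary, features, num_stages=3):
    """Alternate the two query directions for a fixed number of stages."""
    for _ in range(num_stages):
        # Class embeddings query the image features (input-specific update).
        dictionary = cross_attend(dictionary, features)
        # Image features query the refreshed class embeddings.
        features = cross_attend(features, dictionary)
    return dictionary, features


# Toy input: 2 class embeddings and 3 pixel features, all 4-dimensional.
static_dictionary = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
pixel_features = [
    [0.9, 0.1, 0.0, 0.2],
    [0.1, 0.8, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.1],
]
dynamic_dictionary, refined_features = refine(static_dictionary, pixel_features)
```

In this sketch the dictionary keeps one row per class throughout, so the refined embeddings stay aligned with class IDs while absorbing information from the specific input image.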

Abstract

Remote sensing image segmentation faces persistent challenges in distinguishing morphologically similar categories and adapting to diverse scene variations. While existing methods rely on implicit representation learning paradigms, they often fail to dynamically adjust semantic embeddings according to contextual cues, leading to suboptimal performance in fine-grained scenarios such as cloud thickness differentiation. This work introduces a dynamic dictionary learning framework that explicitly models class ID embeddings through iterative refinement. The core contribution is a novel dictionary construction mechanism in which class-aware semantic embeddings are progressively updated via multi-stage alternating cross-attention querying between image features and dictionary embeddings. This process enables adaptive representation learning tailored to input-specific characteristics, helping to resolve ambiguities caused by intra-class heterogeneity and inter-class homogeneity. To further enhance discriminability, a contrastive constraint is applied to the dictionary space, ensuring compact intra-class distributions while maximizing inter-class separability. Extensive experiments across both coarse- and fine-grained datasets demonstrate consistent improvements over state-of-the-art methods, particularly on two online test benchmarks (LoveDA and UAVid). Code is available at https://anonymous.4open.science/r/D2LS-8267/.
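
To make the contrastive constraint on the dictionary space more concrete, the following sketch uses a generic InfoNCE-style loss. The abstract does not give the exact formulation, so this is a stand-in technique rather than the paper's loss: the function name `dictionary_contrastive_loss`, the cosine similarity, the temperature value, and the pairing of two dictionaries (for example, embeddings of the same classes obtained from two sources) are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors, with a small epsilon for safety."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def dictionary_contrastive_loss(dict_a, dict_b, temperature=0.1):
    """InfoNCE-style loss over two class dictionaries (hypothetical helper).

    dict_a[c] and dict_b[c] are treated as a positive pair (same class c);
    dict_a[c] and dict_b[k] with k != c are treated as negatives.
    """
    num_classes = len(dict_a)
    total = 0.0
    for c in range(num_classes):
        logits = [
            cosine(dict_a[c], dict_b[k]) / temperature
            for k in range(num_classes)
        ]
        # Log-sum-exp with max subtraction for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[c]
    return total / num_classes


# Toy dictionaries with 2 classes and 2-dimensional embeddings.
anchor = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = dictionary_contrastive_loss(anchor, aligned)
loss_swapped = dictionary_contrastive_loss(anchor, swapped)
```

Under these assumptions the loss is small when same-class embeddings are close and different-class embeddings are far apart, which is the behavior the contrastive constraint is described as encouraging.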

Paper Structure

This paper contains 48 sections, 21 equations, 15 figures, 10 tables.

Figures (15)

  • Figure 1: Overall pipeline of our dynamic dictionary learning. "$\mathcal{E}$" and "$\mathcal{D}$" denote the encoder and decoder, respectively.
  • Figure 2: Overview of the proposed network architecture. Notably, we show only the dynamic branch for brevity.
  • Figure 3: t-SNE visualization of dictionary distributions across different interaction stages.
  • Figure 4: Visualized results on four coarse-grained datasets. Ground truth labels are unavailable in the LoveDA and UAVid datasets.
  • Figure 5: Visualized results on two fine-grained datasets. More visualization of comparative methods can be found in Appendix E.
  • ...and 10 more figures