
SA-MixNet: Structure-aware Mixup and Invariance Learning for Scribble-supervised Road Extraction in Remote Sensing Images

Jie Feng, Hao Huang, Junpeng Zhang, Weisheng Dong, Dingwen Zhang, Licheng Jiao

TL;DR

SA-MixNet targets the robustness gap in scribble-based road extraction by introducing a fully data-driven, structure-aware approach. It combines Statistic and Content-based Label Expansion, Structure-aware Mixup, and invariance plus connectivity regularizations to enforce consistent, topology-preserving predictions across varied scenes. Empirical results on DeepGlobe, Wuhan, and Massachusetts-road show consistent IoU gains over state-of-the-art weakly supervised and Mixup baselines, and the framework demonstrates plug-and-play compatibility with different extractors. This work advances practical road extraction under limited annotations by improving generalization, connectivity, and resilience to scene complexity.

Abstract

Mainstream weakly supervised road extractors rely on highly confident pseudo-labels propagated from scribbles, and their performance often degrades as image scenes become more varied. We argue that this degradation stems from the model's poor invariance to scenes of differing complexity, whereas existing solutions to this problem are commonly based on handcrafted priors that cannot be derived from scribbles. To eliminate the reliance on such priors, we propose a novel Structure-aware Mixup and Invariance Learning framework (SA-MixNet) for weakly supervised road extraction that improves model invariance in a data-driven manner. Specifically, we design a structure-aware Mixup scheme that pastes road regions from one image onto another, creating an image scene with increased complexity while preserving the roads' structural integrity. An invariance regularization is then imposed on the predictions of the constructed and original images to minimize their conflicts, which forces the model to behave consistently across various scenes. Moreover, a discriminator-based regularization is designed to enhance connectivity while preserving the structure of roads. Combining these designs, our framework demonstrates superior performance on the DeepGlobe, Wuhan, and Massachusetts datasets, outperforming the state-of-the-art techniques by 1.47%, 2.12%, and 4.09% in IoU, respectively, and showing its plug-and-play potential. The code will be made publicly available.
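The paste-and-compare idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper names `paste_roads` and `invariance_loss` are hypothetical, the whole road mask of the source image is pasted (the paper's structure-aware selection is not detailed here), and a mean squared error is assumed as the form of the invariance term.

```python
import numpy as np

def paste_roads(src_img, src_mask, dst_img, dst_mask):
    """Paste the road pixels of src onto dst (hypothetical helper).

    Images are float arrays of shape (H, W, C); masks are binary (H, W) arrays.
    Assumption: the full source road mask is pasted as-is.
    """
    m = src_mask.astype(bool)
    # Take source pixels where the source has road, destination pixels elsewhere.
    mixed_img = np.where(m[..., None], src_img, dst_img)
    # The mixed label contains the roads of both images.
    mixed_mask = np.logical_or(m, dst_mask.astype(bool)).astype(np.uint8)
    return mixed_img, mixed_mask

def invariance_loss(pred_mixed, pred_src, pred_dst, src_mask):
    """Assumed MSE between the prediction on the mixed image and the same
    paste applied to the predictions on the two original images."""
    m = src_mask.astype(bool)
    # In training, this target would be treated as a constant (stop-gradient).
    target = np.where(m, pred_src, pred_dst)
    return float(np.mean((pred_mixed - target) ** 2))

# Toy example: a horizontal road pasted onto a scene with a vertical road.
src_img = np.ones((4, 4, 3), dtype=np.float32)
dst_img = np.zeros((4, 4, 3), dtype=np.float32)
src_mask = np.zeros((4, 4), dtype=np.uint8)
src_mask[1, :] = 1
dst_mask = np.zeros((4, 4), dtype=np.uint8)
dst_mask[:, 2] = 1

mixed_img, mixed_mask = paste_roads(src_img, src_mask, dst_img, dst_mask)

# Stand-in predictions: pretend the model reproduces each mask exactly.
pred_src = src_mask.astype(np.float32)
pred_dst = dst_mask.astype(np.float32)
consistent = np.where(src_mask.astype(bool), pred_src, pred_dst)
loss_consistent = invariance_loss(consistent, pred_src, pred_dst, src_mask)
loss_conflict = invariance_loss(np.zeros((4, 4), dtype=np.float32),
                                pred_src, pred_dst, src_mask)
```

In this toy case the loss is zero when the prediction on the mixed image agrees with the pasted predictions and positive when it does not, which is the behavior the regularization is meant to reward.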

Paper Structure

This paper contains 28 sections, 13 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: Overview of weakly supervised road extraction architectures: (I) weakly supervised baseline with pseudo segmentation loss only, (II) prior-driven method with an additional manual prior, (III) our proposed data-driven framework without any additional prior. Here, $t_c$ denotes the sample construction, '$\rightarrow$' the forward operation, and '$\dashrightarrow$' backpropagation; '/' on '$\rightarrow$' means stop-gradient, and $\mathcal{L}_{inv}$ is the invariance regularization.
  • Figure 2: The pipeline of the proposed SA-MixNet, consisting of three parts: Statistic and Content-based (SC) Label Expansion, Structure-aware Mixup (SA-Mix) based sample construction, and Invariance-based Regularization, which includes the base segmentation losses $(\mathcal{L}_{seg} \ \& \ \mathcal{L}_{seg_\text{m}})$, the invariance regularization $(\mathcal{L}_{inv})$, and the connectivity regularization $(\mathcal{L}_{C-D})$. '/' on '$\rightarrow$' means stop-gradient.
  • Figure 3: The flow chart of Statistic and Content-based Label Propagation, including statistic-based expansion (annotated in blue), content-based clustering (annotated in green), and the merging of the statistic-based pseudo label $\mathbf{y}_s$ and the content-based pseudo label $\mathbf{y}_c$ (annotated in orange).
  • Figure 4: Visualization of different mixup methods. Methods that cause image overlay are shown in green, with indistinguishable regions marked by orange boxes; non-overlay methods are shown in red, with damaged structures marked by yellow boxes. The proposed SA-Mix preserves road structure integrity better than the other methods, generating samples with proper scenes.
  • Figure 5: The pipeline of the topological connectivity discriminator. The pseudo label $\mathbf{y}$ and the prediction $\mathbf{p}$ are filtered by the topology filter $\mathbf{T}_{matrix}$ generated from the pseudo label $\mathbf{y}$, then each is concatenated with the image $\mathbf{x}$ and fed into the discriminator.
  • ...and 2 more figures
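
The data flow described in the Figure 5 caption (filter $\mathbf{y}$ and $\mathbf{p}$ with a filter derived from $\mathbf{y}$, then concatenate each with $\mathbf{x}$) can be sketched as follows. This is only an illustration under stated assumptions: the caption does not say how $\mathbf{T}_{matrix}$ is computed, so the hypothetical `topology_filter` below stands in with a simple square binary dilation of the pseudo label, and the function names and the `radius` parameter are invented for this example.

```python
import numpy as np

def topology_filter(y, radius=1):
    """Hypothetical stand-in for T_matrix: binary dilation of the pseudo
    label y with a (2*radius+1) x (2*radius+1) square neighborhood."""
    y = y.astype(bool)
    h, w = y.shape
    padded = np.pad(y, radius, mode="constant", constant_values=False)
    out = np.zeros_like(y)
    # OR together all shifted copies of the label inside the window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out.astype(np.float32)

def discriminator_inputs(x, y, p, radius=1):
    """Build the two discriminator inputs: image + filtered pseudo label,
    and image + filtered prediction, concatenated along the channel axis."""
    t = topology_filter(y, radius)
    label_branch = np.concatenate([x, (y * t)[..., None]], axis=-1)
    pred_branch = np.concatenate([x, (p * t)[..., None]], axis=-1)
    return label_branch, pred_branch

# Toy example: a single labeled pixel and a uniform 0.5 prediction.
x = np.zeros((5, 5, 3), dtype=np.float32)
y = np.zeros((5, 5), dtype=np.float32)
y[2, 2] = 1.0
p = np.full((5, 5), 0.5, dtype=np.float32)

t = topology_filter(y)
label_branch, pred_branch = discriminator_inputs(x, y, p)
```

With this stand-in filter, the prediction is only examined in a neighborhood of the labeled road, so confident predictions far from any scribble-derived label do not reach the discriminator. Whether the paper's filter behaves this way is an assumption of the sketch.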