DB SwinT: A Dual-Branch Swin Transformer Network for Road Extraction in Optical Remote Sensing Imagery

Zongyang He; Xiangli Yang; Xian Gao; Zhiguo Wang

Abstract

With the continuous improvement in the spatial resolution of optical remote sensing imagery, accurate road extraction has become increasingly important for applications such as urban planning, traffic monitoring, and disaster management. However, road extraction in complex urban and rural environments remains challenging, as roads are often occluded by trees, buildings, and other objects, leading to fragmented structures and reduced extraction accuracy. To address this problem, this paper proposes a Dual-Branch Swin Transformer network (DB SwinT) for road extraction. The proposed framework combines the long-range dependency modeling capability of the Swin Transformer with the multi-scale feature fusion strategy of U-Net, and employs a dual-branch encoder to learn complementary local and global representations. Specifically, the local branch focuses on recovering fine structural details in occluded areas, while the global branch captures broader semantic context to preserve the overall continuity of road networks. In addition, an Attentional Feature Fusion (AFF) module is introduced to adaptively fuse features from the two branches, further enhancing the representation of occluded road segments. Experimental results on the Massachusetts and DeepGlobe datasets show that DB SwinT achieves Intersection over Union (IoU) scores of 79.35% and 74.84%, respectively, demonstrating its effectiveness for road extraction from optical remote sensing imagery.
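
For readers unfamiliar with the reported metric, IoU for a binary road mask is the number of pixels labeled road in both prediction and ground truth, divided by the number labeled road in either. The following is a minimal sketch of that definition only; the function name `iou`, the toy masks, and the convention for two empty masks are illustrative assumptions, and the paper's exact evaluation protocol (thresholding, per-image versus dataset-level averaging) is not stated in this abstract.

```python
def iou(pred, truth):
    """IoU between two binary masks given as equally sized 2D lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0   # road in both masks
            union += 1 if (p or t) else 0    # road in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


# Toy 3x4 masks (hypothetical data): 3 shared road pixels, 5 in the union.
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0]]
truth = [[0, 1, 0, 0],
         [0, 1, 0, 0],
         [0, 1, 1, 0]]
score = iou(pred, truth)
```

On these toy masks the intersection has 3 pixels and the union has 5, so the score is 3/5.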

Paper Structure (16 sections, 11 equations, 9 figures, 3 tables)

This paper contains 16 sections, 11 equations, 9 figures, 3 tables.

Introduction
Methodology
Swin Transformer Block
Encoder
Decoder
Dual-Branch Representation
Attentional Feature Fusion
Skip Connections
Experiments
Dataset Description
Network Configuration
Evaluation Metrics
Comparative Experiments
Discussion
Conclusion
...and 1 more section

Figures (9)

Figure 1: Overall architecture of DB SwinT, including dual-branch encoder, decoder, AFF module, and skip connections.
Figure 2: Illustration of (a) standard Transformer and (b) Swin Transformer blocks.
Figure 3: Attentional Feature Fusion with multi-scale channel attention mechanism.
Figure 4: DeepGlobe dataset. (a) Image; (b) Ground truth.
Figure 5: Massachusetts dataset. (a) Image; (b) Ground truth.
...and 4 more figures
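
The Figure 3 caption describes Attentional Feature Fusion with multi-scale channel attention. As a rough illustration of that general idea, the sketch below blends two feature maps with a sigmoid gate computed from a local (per-position) and a global (position-averaged) channel context. It is a simplified sketch under stated assumptions, not the paper's module: the names `aff_fuse`, `pointwise`, `w_local`, and `w_global` are hypothetical, and each context path is reduced to a single channel-mixing matrix with no bottleneck, normalization, or nonlinearity.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def pointwise(weights, vec):
    """Mix channels with a matrix, i.e. a 1x1 convolution at one position."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def aff_fuse(x, y, w_local, w_global):
    """Fuse feature maps x and y, each a list of C channels with N positions.

    Simplified assumption: one linear layer per context path.
    """
    channels, positions = len(x), len(x[0])
    # Initial integration of the two inputs by element-wise sum.
    s = [[x[c][n] + y[c][n] for n in range(positions)] for c in range(channels)]
    # Global context: average each channel over all positions, then mix channels.
    global_ctx = pointwise(w_global, [sum(ch) / positions for ch in s])
    fused = [[0.0] * positions for _ in range(channels)]
    for n in range(positions):
        # Local context: mix channels using this position only.
        local_ctx = pointwise(w_local, [s[c][n] for c in range(channels)])
        for c in range(channels):
            m = sigmoid(local_ctx[c] + global_ctx[c])  # gate in (0, 1)
            fused[c][n] = m * x[c][n] + (1.0 - m) * y[c][n]
    return fused


# Hypothetical toy inputs: 2 channels, 2 positions, all-zero weights.
x = [[1.0, 3.0], [2.0, 4.0]]
y = [[3.0, 1.0], [0.0, 0.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
out = aff_fuse(x, y, zeros, zeros)
```

With all-zero weights the gate is 0.5 everywhere, so the output is the plain average of the two inputs; learned weights would shift the gate toward one branch or the other per channel and position.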
