URoadNet: Dual Sparse Attentive U-Net for Multiscale Road Network Extraction
Jie Song, Yue Sun, Ziyun Cai, Liang Xiao, Yawen Huang, Yefeng Zheng
TL;DR
URoadNet targets road networks, whose structures are highly sparse and irregular, with a dual sparse attention framework that decouples local connectivity from global integrality within a U-Net backbone. The two attention streams, connectivity self-attention and integrality self-attention, are interleaved in a token-update scheme that reduces self-attention complexity from $O(N^2)$ to near $O(N)$ while preserving multiscale road information. Results on the Massachusetts, DeepGlobe, SpaceNet, and Large-Scale remote sensing datasets show state-of-the-art Road IoU and F1 scores, robustness to label scarcity, and strong generalization to large imagery, making the method a computationally feasible option for navigation, urban planning, and remote sensing pipelines.
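For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Road IoU and F1 are commonly computed from binary masks. It assumes pixel-wise scoring of the road class only; the function name `road_iou_f1` and the toy masks are illustrative and not taken from the paper, whose exact evaluation protocol (for example, any relaxed matching tolerance) is not stated here.

```python
import numpy as np

def road_iou_f1(pred, target, eps=1e-9):
    """Return (IoU, F1) for the road class of two binary masks.

    Assumption: 1 marks road pixels, 0 marks background, and scoring is
    strictly pixel-wise (no buffer or relaxed matching).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    tp = np.logical_and(pred, target).sum()    # road predicted, road in label
    fp = np.logical_and(pred, ~target).sum()   # road predicted, background in label
    fn = np.logical_and(~pred, target).sum()   # background predicted, road in label

    iou = tp / (tp + fp + fn + eps)            # intersection over union
    f1 = 2 * tp / (2 * tp + fp + fn + eps)     # harmonic mean of precision and recall
    return float(iou), float(f1)

# Toy 3x3 masks (hypothetical data): 2 true positives, 1 false positive, 1 false negative.
pred = np.array([[1, 1, 0],
                 [0, 1, 0],
                 [0, 0, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [0, 0, 0]])
iou, f1 = road_iou_f1(pred, target)
print(f"IoU={iou:.3f}, F1={f1:.3f}")
```

Because roads occupy a small fraction of each image, both metrics ignore true-negative background pixels, which is why they are preferred over plain pixel accuracy for this task.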
Abstract
Road network segmentation demands an algorithm that can adapt to sparse and irregular shapes as well as diverse context, conditions under which traditional encoding-decoding methods and simple Transformer embeddings often fail. We introduce a computationally efficient and powerful framework for road-aware segmentation. Our method, called URoadNet, effectively encodes fine-grained local road connectivity and holistic global topological semantics while decoding multiscale road network information. URoadNet offers a novel alternative to the U-Net architecture by integrating connectivity attention, which exploits intra-road interactions across multi-level sampling features with reduced computational complexity. This local interaction serves as valuable prior information for learning global interactions between road networks and the background through a second mechanism, integrality attention. The two forms of sparse attention are arranged alternately, complement each other, and are trained jointly, resulting in performance improvements without significant increases in computational complexity. Extensive experiments on datasets with different resolutions, including Massachusetts, DeepGlobe, SpaceNet, and Large-Scale remote sensing images, demonstrate that URoadNet outperforms state-of-the-art techniques. Our approach represents a significant advancement in road network extraction, providing a computationally feasible solution that achieves high-quality segmentation results.
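To give a feel for why sparse attention lowers cost, here is a generic sketch that compares full self-attention with attention restricted to fixed, non-overlapping windows of tokens. This is not the connectivity or integrality attention of URoadNet, whose token-selection rules are not described in this section; the window scheme, the function names, and the sizes below are all assumptions chosen for illustration. With a fixed window size $w$, the number of query-key pairs drops from $N^2$ to $N \cdot w$, which grows linearly in $N$.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(q, k, v):
    """Dense self-attention: every token attends to every token (N x N scores)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def windowed_attention(q, k, v, window):
    """Sparse attention sketch: tokens attend only within their own window.

    Assumption: the N tokens are split into N // window contiguous groups.
    """
    n, d = q.shape
    assert n % window == 0, "token count must be divisible by the window size"
    g = n // window
    qw = q.reshape(g, window, d)
    kw = k.reshape(g, window, d)
    vw = v.reshape(g, window, d)
    scores = qw @ kw.transpose(0, 2, 1) / np.sqrt(d)   # shape (g, window, window)
    out = softmax(scores) @ vw                         # shape (g, window, d)
    return out.reshape(n, d)

rng = np.random.default_rng(0)
n, d, window = 64, 8, 8                                # hypothetical sizes
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))

out = windowed_attention(q, k, v, window)
full_pairs = n * n                                     # query-key pairs, dense
sparse_pairs = n * window                              # query-key pairs, windowed
print(out.shape, full_pairs, sparse_pairs)
```

A purely windowed scheme loses long-range context, which is one reason a design might pair a local sparse pattern with a second, global one, as the abstract describes for connectivity and integrality attention.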
