URoadNet: Dual Sparse Attentive U-Net for Multiscale Road Network Extraction

Jie Song, Yue Sun, Ziyun Cai, Liang Xiao, Yawen Huang, Yefeng Zheng

TL;DR

URoadNet addresses the challenge of extracting road networks with highly sparse and irregular structures by introducing a dual sparse attention framework that decouples local connectivity from global integrality within a U-Net backbone. The two attention streams, connectivity self-attention and integrality self-attention, operate via an interleaved token-update scheme to reduce self-attention complexity from $O(N^2)$ to near $O(N)$ while preserving multiscale road information. Empirical results on Massachusetts, DeepGlobe, SpaceNet, and Large-Scale remote sensing datasets demonstrate state-of-the-art Road IoU and F1 scores, robustness to label scarcity, and strong generalization to large imagery, underscoring the method's practical impact for navigation, urban planning, and remote sensing pipelines. This approach provides a computationally feasible, high-quality solution for road-network extraction with clear benefits for real-world remote sensing applications.
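To make the complexity claim concrete, the sketch below counts how many query-key pairs are scored by dense self-attention versus a simple sparse scheme in which tokens attend only within fixed-size groups. The non-overlapping window partition and the function names `dense_pairs` and `windowed_pairs` are illustrative assumptions; this summary does not specify how URoadNet's connectivity and integrality attentions actually select tokens.

```python
def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full self-attention: N * N."""
    return n_tokens * n_tokens


def windowed_pairs(n_tokens: int, window: int) -> int:
    """Pairs scored when each token attends only inside its own window.

    Assumption for illustration: tokens are split into consecutive,
    non-overlapping windows of `window` tokens (the last one may be shorter).
    """
    full_windows, remainder = divmod(n_tokens, window)
    return full_windows * window * window + remainder * remainder


if __name__ == "__main__":
    # Feature maps of side 32, 64, 128 give N = 1024, 4096, 16384 tokens.
    for side in (32, 64, 128):
        n = side * side
        print(f"N={n:6d}  dense={dense_pairs(n):12d}  "
              f"windowed={windowed_pairs(n, window=64):9d}")
```

With a fixed window size the sparse count grows as N times the window size, so doubling the number of tokens doubles the cost instead of quadrupling it, which is the sense in which such schemes approach $O(N)$.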

Abstract

Road network segmentation demands an algorithm that can adapt to sparse and irregular shapes as well as diverse context, conditions under which traditional encoding-decoding methods and simple Transformer embeddings often fail. We introduce a computationally efficient and powerful framework for road-aware segmentation. Our method, called URoadNet, encodes fine-grained local road connectivity and holistic global topological semantics while decoding multiscale road network information. URoadNet offers a novel alternative to the U-Net architecture by integrating connectivity attention, which exploits intra-road interactions across multi-level sampling features with reduced computational complexity. This local interaction serves as valuable prior information for learning global interactions between road networks and the background through a second mechanism, integrality attention. The two forms of sparse attention are arranged alternately and complementarily and are trained jointly, resulting in performance improvements without significant increases in computational complexity. Extensive experiments on various datasets with different resolutions, including Massachusetts, DeepGlobe, SpaceNet, and Large-Scale remote sensing images, demonstrate that URoadNet outperforms state-of-the-art techniques. Our approach represents a significant advancement in the field of road network extraction, providing a computationally feasible solution that achieves high-quality segmentation results.

Paper Structure

This paper contains 30 sections, 8 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Speed versus accuracy for road segmentation. Each circle depicts the performance of a model in terms of frames per second and Road IoU accuracy on the Massachusetts dataset, measured on an NVIDIA GeForce RTX 2080 Ti GPU. The circle size is proportional to the number of parameters of the model. We plot the performance of our method in red and show the comparative segmentations at the bottom.
  • Figure 2: U-Net extensions for road structure segmentation. (a) The variant of c18, based on strip convolution, focuses only on finer local characteristics, while the variants in (b-d) capture global characteristics by integrating recursions on network layers c10, combining self-attention with downsampling c13, and replacing skip connections with Transformer embedding c14, respectively. By contrast, we propose to decompose the learning into connectivity and integrality attentions and integrate them to recover both finer details and holistic semantics.
  • Figure 3: Proposed URoadNet with Dual-SA embedding. We use a convolution layer with sigmoid to compute the final segmentation map. See Section III for details.
  • Figure 4: Illustration of our connectivity self-attention mechanism.
  • Figure 5: Comparative interpretation of the proposed connectivity self-attention (a) and integrality self-attention (b).
  • ...and 7 more figures
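
Figure 1 and the TL;DR report accuracy as Road IoU and F1. As a minimal sketch of how these two scores are commonly computed for a binary road mask, the function below counts true positives, false positives, and false negatives over the road class. The function name `road_iou_f1`, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the paper's exact evaluation protocol (for example, thresholds or relaxed matching) is not stated in this summary.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # rows of 0/1 labels, 1 = road pixel


def road_iou_f1(pred: Mask, target: Mask) -> Tuple[float, float]:
    """Return (IoU, F1) for the road class of two equally sized binary masks."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1          # road predicted, road in label
            elif p and not t:
                fp += 1          # road predicted, background in label
            elif t and not p:
                fn += 1          # road missed by the prediction
    union = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    iou = tp / union if union else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if union else 1.0
    return iou, f1


if __name__ == "__main__":
    pred = [[1, 1, 0],
            [0, 1, 0]]
    target = [[1, 0, 0],
              [0, 1, 1]]
    iou, f1 = road_iou_f1(pred, target)
    print(f"Road IoU = {iou:.3f}, F1 = {f1:.3f}")
```

In this toy case there are two true positives, one false positive, and one false negative, so the IoU is 2/4 and the F1 score is 4/6. Because roads occupy only a small fraction of pixels, both scores ignore true negatives and are therefore more informative than plain pixel accuracy.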