
Segment Anything Model for Road Network Graph Extraction

Congrui Hetang, Haoru Xue, Cindy Le, Tianwei Yue, Wenping Wang, Yihui He

TL;DR

SAM-Road splits road network graph extraction into two parts: a pre-trained Segment Anything Model (SAM) predicts geometry through dense segmentation, and a transformer-based topology decoder infers edges. Vertices are extracted from the SAM road and intersection masks with non-maximum suppression, and a GNN-style transformer reasons over the local subgraph around each vertex to predict edge existence. On City-scale and SpaceNet, SAM-Road reaches accuracy comparable to the state of the art on the TOPO and APLS metrics while running far faster (about 40 times faster than RNGDet++ on City-scale), thanks to parallelizable sliding-window inference and the absence of heavy post-processing. The work shows that a foundational vision model can serve as a graph learner in remote sensing, enabling rapid, large-area road graph construction for navigation, planning, and mapping.

Abstract

We propose SAM-Road, an adaptation of the Segment Anything Model (SAM) for extracting large-scale, vectorized road network graphs from satellite imagery. To predict graph geometry, we formulate it as a dense semantic segmentation task, leveraging the inherent strengths of SAM. The image encoder of SAM is fine-tuned to produce probability masks for roads and intersections, from which the graph vertices are extracted via simple non-maximum suppression. To predict graph topology, we designed a lightweight transformer-based graph neural network, which leverages the SAM image embeddings to estimate the edge existence probabilities between vertices. Our approach directly predicts the graph vertices and edges for large regions without expensive and complex post-processing heuristics, and is capable of building complete road network graphs spanning multiple square kilometers in a matter of seconds. With its simple, straightforward, and minimalist design, SAM-Road achieves comparable accuracy with the state-of-the-art method RNGDet++, while being 40 times faster on the City-scale dataset. We thus demonstrate the power of a foundational vision model when applied to a graph learning task. The code is available at https://github.com/htcr/sam_road.
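
The vertex extraction step described above (thresholding a probability mask, then applying non-maximum suppression) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `extract_vertices`, the score threshold, and the suppression radius are assumptions chosen for illustration, and the greedy distance-based NMS shown here is only one common way to realize "simple non-maximum suppression".

```python
def extract_vertices(prob_mask, score_thresh=0.5, nms_radius=2.0):
    """Greedy NMS over a 2D probability mask given as a list of rows.

    Hypothetical helper: threshold and radius values are illustrative
    assumptions, not settings taken from the paper.
    """
    # Collect every pixel whose probability clears the threshold.
    candidates = [
        (p, r, c)
        for r, row in enumerate(prob_mask)
        for c, p in enumerate(row)
        if p >= score_thresh
    ]
    # Visit the most confident pixels first (stable sort keeps scan order on ties).
    candidates.sort(key=lambda t: -t[0])

    kept = []
    radius_sq = nms_radius ** 2
    for _, r, c in candidates:
        # Keep a pixel only if no already-kept vertex lies within the radius.
        if all((r - kr) ** 2 + (c - kc) ** 2 > radius_sq for kr, kc in kept):
            kept.append((r, c))
    return kept


mask = [
    [0.1, 0.2, 0.1, 0.0, 0.0],
    [0.2, 0.9, 0.6, 0.0, 0.0],
    [0.1, 0.6, 0.3, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0, 0.7],
]
vertices = extract_vertices(mask)  # two blobs collapse to two (row, col) vertices
```

Each blob of high-probability pixels collapses to its single most confident pixel, which is why no separate skeletonization or graph-simplification pass is needed in this sketch.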

Paper Structure

This paper contains 21 sections, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: SAM-Road effectively predicts accurate road network graphs for dense urban regions, including roads with complex and irregular shapes, bridges, and multi-lane freeways. The corresponding segmentation masks are sharp and clear.
  • Figure 2: The architecture of our approach, SAM-Road. It contains an image encoder taken from the pre-trained SAM (Kirillov et al., 2023), a geometry decoder, and a topology decoder. It directly predicts vectorized graph vertices (yellow) and edges (orange) from an input RGB satellite image. Best viewed zoomed in and in color.
  • Figure 3: Illustration of the definition of topology labels. In (a), the white dashed circle represents $R_\text{nbr}$; the large dot is the source node, and the smaller yellow dots are the target nodes. Orange lines are connected pairs. In (b), a few real topology samples used for training are shown. The query for one source node is shown in the same color. White lines are positive labels and pairs without lines are negative.
  • Figure 4: SAM-Road can predict the entire road network graph for arbitrarily large regions by operating in a sliding-window manner. The labels 0-3 mark four overlapping windows. It first extracts the global nodes, caches the per-window embeddings, and then aggregates the per-window edge predictions.
  • Figure 5: The visualized road network graph predictions of SAM-Road and two baseline methods. Best viewed zoomed in and in color. Overall, SAM-Road generates highly accurate predictions. The circles highlight especially challenging spots: in the first area, SAM-Road correctly predicts the overpass structure. In the second one, SAM-Road gives superior results for the parallel freeways. The third spot shows an irregular intersection where the two baselines fail.
  • ...and 2 more figures
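
The sliding-window procedure in the Figure 4 caption (global nodes first, then per-window edge predictions that are aggregated) can be sketched as follows. This is a rough illustration under stated assumptions rather than the paper's code: `window_origins`, `aggregate_edges`, and the `predict_edges` callable are hypothetical names, and averaging the per-window scores before thresholding is an assumed aggregation rule, since the caption only says the predictions are aggregated.

```python
from collections import defaultdict


def window_origins(size, window, stride):
    """Offsets along one axis so that windows cover [0, size)."""
    if size <= window:
        return [0]
    origins = list(range(0, size - window + 1, stride))
    if origins[-1] != size - window:
        origins.append(size - window)  # last window sits flush with the border
    return origins


def aggregate_edges(nodes, image_hw, window, stride, predict_edges, thresh=0.5):
    """Average per-window edge scores over all windows that scored a pair.

    nodes: global (row, col) vertices. predict_edges is a hypothetical
    stand-in for the topology decoder: it takes the global indices of the
    nodes inside a window plus the window origin, and returns a dict
    {(i, j): probability}.
    """
    height, width = image_hw
    sums = defaultdict(float)
    counts = defaultdict(int)
    for y0 in window_origins(height, window, stride):
        for x0 in window_origins(width, window, stride):
            inside = [
                i for i, (y, x) in enumerate(nodes)
                if y0 <= y < y0 + window and x0 <= x < x0 + window
            ]
            for (i, j), p in predict_edges(inside, (y0, x0)).items():
                key = (min(i, j), max(i, j))  # treat edges as undirected
                sums[key] += p
                counts[key] += 1
    # Assumed rule: keep an edge if its mean score reaches the threshold.
    return sorted(k for k in sums if sums[k] / counts[k] >= thresh)
```

Because every window is processed independently, the loop body could be batched or run in parallel; only the final averaging step needs results from all windows.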