PathMamba: A Hybrid Mamba-Transformer for Topologically Coherent Road Segmentation in Satellite Imagery

Jules Decaestecker, Nicolas Vigne

TL;DR

PathMamba tackles the challenge of producing topologically coherent road segmentations from satellite imagery by marrying the linear-time, continuity-focused capabilities of Mamba State Space Models with the global reasoning of Vision Transformers. The four-stage backbone uses Mamba blocks in stages 1, 2, and 4, while stage 3 combines Mamba blocks with Transformer attention to fuse continuity with global context. The approach delivers state-of-the-art results on DeepGlobe and Massachusetts Roads, notably achieving the highest APLS topology scores while remaining computationally competitive, and is supported by extensive ablations validating the architectural choices. This hybrid design demonstrates that embedding topology-aware continuity directly into the backbone can yield more coherent road networks suitable for urban planning, navigation, and disaster response, with strong generalization properties evidenced by ImageNet and ADE20K experiments.
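The stage layout described above can be written down as a small configuration sketch. Only the pattern comes from the text: Mamba (VSS) blocks in stages 1, 2, and 4, and a hybrid stage 3 in which VSS blocks are followed by attention blocks. The names `StageSpec`, `STAGES`, and `describe`, and all block counts, are hypothetical choices made for illustration and are not the authors' configuration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StageSpec:
    """Block counts for one backbone stage (depths are illustrative assumptions)."""
    vss_blocks: int             # Mamba-style VSS blocks
    attention_blocks: int = 0   # Transformer attention blocks (hybrid stage only)

    def block_sequence(self) -> List[str]:
        # VSS blocks first, then attention blocks, as in the hybrid-stage description.
        return ["vss"] * self.vss_blocks + ["attn"] * self.attention_blocks


# Hypothetical depths; only the stage pattern is taken from the text.
STAGES = [
    StageSpec(vss_blocks=2),                      # stage 1: Mamba only
    StageSpec(vss_blocks=2),                      # stage 2: Mamba only
    StageSpec(vss_blocks=4, attention_blocks=4),  # stage 3: hybrid
    StageSpec(vss_blocks=2),                      # stage 4: Mamba only
]


def describe(stages: List[StageSpec]) -> Dict[int, List[str]]:
    """Map 1-based stage index to its ordered list of block types."""
    return {i + 1: spec.block_sequence() for i, spec in enumerate(stages)}


layout = describe(STAGES)
hybrid_stages = [idx for idx, seq in layout.items() if "attn" in seq]
print(layout)
print("hybrid stages:", hybrid_stages)
```

Keeping the layout as data makes it easy to try alternative placements of the attention blocks, which is the kind of variation an architectural ablation would explore.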

Abstract

Achieving both high accuracy and topological continuity in road segmentation from satellite imagery is a critical goal for applications ranging from urban planning to disaster response. State-of-the-art methods often rely on Vision Transformers, which excel at capturing global context, yet their quadratic complexity is a significant barrier to efficient deployment, particularly for on-board processing in resource-constrained platforms. In contrast, emerging State Space Models like Mamba offer linear-time efficiency and are inherently suited to modeling long, continuous structures. We posit that these architectures have complementary strengths. To this end, we introduce PathMamba, a novel hybrid architecture that integrates Mamba's sequential modeling with the Transformer's global reasoning. Our design strategically uses Mamba blocks to trace the continuous nature of road networks, preserving topological structure, while integrating Transformer blocks to refine features with global context. This approach yields topologically superior segmentation maps without the prohibitive scaling costs of pure attention-based models. Our experiments on the DeepGlobe Road Extraction and Massachusetts Roads datasets demonstrate that PathMamba sets a new state-of-the-art. Notably, it significantly improves topological continuity, as measured by the APLS metric, setting a new benchmark while remaining computationally competitive.
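To make the scaling argument concrete, the following back-of-the-envelope sketch counts token interactions for a square image tile. It is a simplification under stated assumptions: a patch size of 16 is assumed, and constant factors, hidden dimension, and SSM state size are ignored. The function names are hypothetical and the numbers are not measurements from the paper.

```python
def num_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of tokens after non-overlapping patching (assumed patch size)."""
    return (height // patch) * (width // patch)


def attention_pair_count(n_tokens: int) -> int:
    """Self-attention compares every token with every token: N^2 interactions."""
    return n_tokens * n_tokens


def scan_step_count(n_tokens: int, n_directions: int = 1) -> int:
    """A sequential state-space scan visits each token once per direction: linear in N."""
    return n_tokens * n_directions


for side in (512, 1024, 2048):
    n = num_tokens(side, side)
    print(f"{side}x{side}: tokens={n}, "
          f"attention pairs={attention_pair_count(n)}, "
          f"scan steps={scan_step_count(n)}")
```

Doubling the tile side quadruples the token count, so the attention term grows 16-fold while the scan term grows 4-fold. This gap is the motivation for confining attention to a single stage.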

Paper Structure

This paper contains 39 sections, 4 equations, 5 figures, and 8 tables.

Figures (5)

  • Figure 1: An example from the Massachusetts Roads dataset illustrating the challenges of road segmentation. The input satellite image (left) contains significant occlusions from vegetation and complex shadows, while the goal is to produce a precise, topologically coherent ground-truth mask (right).
  • Figure 2: Architecture of a Visual State Space (VSS) block, which forms the basis of our Mamba stages. It uses an SSM for token-mixing and an FFN for channel-mixing.
  • Figure 3: The architecture of our proposed hybrid Mamba-Transformer backbone. Stages 1, 2, and 4 consist of VSS (Mamba) blocks. Stage 3 is a hybrid stage containing a sequence of VSS blocks followed by standard Transformer (Attention) blocks to integrate continuity modeling with global context aggregation. All stages are connected to a UPerNet decoder head.
  • Figure 4: Qualitative comparison of our model against SegFormer and VMamba. In the second row, both baselines produce disconnected masks, whereas our model correctly identifies the continuous road network. In the third row, our model accurately segments fine-grained details under tree occlusion, which are missed by the other methods.
  • Figure 5: A challenging sample featuring a rural road with inconsistent texture. Our model (c) maintains connectivity where baselines (a, b) fail due to surface variability, resulting in a significantly higher APLS score (white = ground truth, orange = prediction).
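
Since several captions above judge connectivity by APLS (Average Path Length Similarity), a minimal sketch of the idea behind the metric may help. It uses the commonly cited form, one minus the mean of min(1, |L_gt − L_pred| / L_gt) over node pairs, where a path missing from the prediction incurs the maximum penalty of 1. This is a deliberately simplified version and not the evaluation code used in the paper: it assumes predicted and ground-truth nodes are already matched by name, that edge lengths are positive, and it omits node snapping and the symmetric ground-truth/prediction pass of the full metric. The names `shortest_lengths` and `simplified_apls` are hypothetical.

```python
import heapq
from itertools import combinations


def shortest_lengths(graph, source):
    """Dijkstra over an adjacency dict {node: {neighbor: edge_length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def simplified_apls(gt, pred):
    """One-directional APLS over node pairs that are connected in the ground truth."""
    gt_dist = {n: shortest_lengths(gt, n) for n in gt}
    pred_dist = {n: shortest_lengths(pred, n) for n in pred}
    penalties = []
    for a, b in combinations(sorted(gt), 2):
        if b not in gt_dist[a]:
            continue  # pair is not connected in the ground truth: not scored
        l_gt = gt_dist[a][b]
        l_pred = pred_dist.get(a, {}).get(b)
        if l_pred is None:
            penalties.append(1.0)  # missing path: maximum penalty
        else:
            penalties.append(min(1.0, abs(l_gt - l_pred) / l_gt))
    if not penalties:
        return 0.0
    return 1.0 - sum(penalties) / len(penalties)


# Toy road: A - B - C, each segment of length 1 (undirected, so edges appear twice).
gt = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
# Prediction with a gap between B and C, e.g. under tree occlusion.
broken = {"A": {"B": 1.0}, "B": {"A": 1.0}, "C": {}}

print(simplified_apls(gt, gt))
print(simplified_apls(gt, broken))
```

In the toy example a single missing segment breaks two of the three node pairs, so the score falls to one third even though most road pixels would still be labeled correctly. This is why a pixel-overlap score alone can hide topological errors.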