Bridging the Geometry Mismatch: Frequency-Aware Anisotropic Serialization for Thin-Structure SSMs

Jin Bai, Huiyao Zhang, Qi Wen, Ningyang Li, Shengyang Li, Atta ur Rahman, Xiaolin Tian

Abstract

The segmentation of thin linear structures is inherently topology-critical, where minor local errors can sever long-range connectivity. While recent State-Space Models (SSMs) offer efficient long-range modeling, their isotropic serialization (e.g., raster scanning) creates a geometry mismatch for anisotropic targets, causing state propagation across rather than along the structure trajectories. To address this, we propose FGOS-Net, a framework based on frequency-geometric disentanglement. We first decompose features into a stable topology carrier and directional high-frequency bands, leveraging the latter to explicitly correct spatial misalignments induced by downsampling. Building on this calibrated topology, we introduce frequency-aligned scanning that elevates serialization to a geometry-conditioned decision, preserving direction-consistent traces. Coupled with an active probing strategy to selectively inject high-frequency details and suppress texture ambiguity, FGOS-Net consistently outperforms strong baselines across four challenging benchmarks. Notably, it achieves 91.3% mIoU and 97.1% clDice on DeepCrack while running at 80 FPS with only 7.87 GFLOPs.
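
To make the geometry mismatch concrete, the sketch below measures how far apart consecutive pixels of a thin structure land in the serialized sequence. It is a toy illustration, not FGOS-Net's scanning module: the grid size, the synthetic vertical line, and the helper names `scan_positions` and `max_gap_along_structure` are assumptions introduced here.

```python
import numpy as np

def scan_positions(height, width, order):
    """Return an (H, W) array giving each pixel's index in the 1-D scan sequence."""
    idx = np.arange(height * width)
    if order == "raster":   # row-major: left to right, then top to bottom
        return idx.reshape(height, width)
    if order == "column":   # column-major: top to bottom, then left to right
        return idx.reshape(width, height).T
    raise ValueError(f"unknown scan order: {order}")

def max_gap_along_structure(pixels, positions):
    """Largest sequence distance between consecutive pixels of a structure."""
    seq = np.array([positions[r, c] for r, c in pixels])
    return int(np.abs(np.diff(seq)).max())

H, W = 8, 8
vertical_line = [(r, 3) for r in range(H)]  # hypothetical thin vertical structure

raster_gap = max_gap_along_structure(vertical_line, scan_positions(H, W, "raster"))
column_gap = max_gap_along_structure(vertical_line, scan_positions(H, W, "column"))
print(raster_gap, column_gap)
```

Under the raster order, neighbouring pixels of the vertical line sit a full image row apart in the sequence, so the recurrent state has to carry information across every unrelated token in between. A scan aligned with the structure keeps them adjacent.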

Paper Structure

This paper contains 16 sections, 9 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of FGOS-Net. Each FGOS block performs: (1) DWT-based frequency disentanglement into topology carrier $X_{LL}$ and directional details $X_{HF}$; (2) detail-driven alignment of $X_{LL}$; (3) geometry-aligned FA-Scan for topology modeling and ASGP for topology-conditioned detail gating; (4) IDWT-based reconstruction. A parallel GFA decoder and BRM head fuse multi-scale features and refine boundaries.
  • Figure 2: FA-Block components. Left: FA-Scan assigns deterministic, sub-band-aligned serialization trajectories (Horizontal/Vertical for directional bands, Hilbert for isotropic bands). Right: The LightGate Bottleneck (LGB) employs a symmetric bottleneck ($C{\to}C/4{\to}C$) modulated by stage-adaptive gating (ECA / GSE), reducing parameters by $\approx 94\%$ vs. standard FFNs. ECA and GSE denote Efficient Channel Attention and Grouped Squeeze-and-Excitation, respectively.
  • Figure 3: Active Spectral-Geometric Probing (ASGP). The module resolves high-frequency ambiguity via: (i) Coarse Perception: Initializing probes on the topology carrier $\tilde{X}_{LL}$; (ii) Gradient-Guided Evolution: Iteratively refining probe positions ($t=0{\to}T$) via differentiable gradient ascent on the potential field $M_0$; (iii) Gating & Injection: Generating a topology-validated mask $M$ to selectively gate high-frequency details ($X_{HF} \odot M$).
  • Figure 4: Feature Response Analysis: Raster vs. FA-Scan. Feature maps are extracted from the Stage-2 encoder output (before IDWT), and profiles are sampled along the crack structure (highlighted in pink). The centerline is derived from the ground-truth skeleton for visualization purposes. Raster Scan: Features exhibit sharp signal drops when the scan path cuts across the crack, leading to fragmentation. FA-Scan: By aligning the scan trajectory with sub-band orientation, our method maintains a continuous, high-amplitude response (measured as the $\ell_2$-normalized mean channel activation), verifying the preservation of connectivity.
  • Figure 5: Qualitative comparison on challenging scenarios. We visualize predictions from representative CNN/SSM baselines against FGOS-Net (Ours). All methods are visualized on the same test samples. Red Boxes (Topology): In low-contrast regions, baselines suffer from connectivity breaks, creating fragmented masks. Ours maintains continuous traces. Green Boxes (Texture): Under heavy clutter (e.g., water stains, oil spots), baselines yield false positives (texture leakage). Ours successfully suppresses these artifacts via topology-gated injection.
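
Steps (1) and (4) of the Figure 1 pipeline can be sketched with a single-level 2-D Haar transform. The wavelet family, the NumPy implementation, and the function names `haar_dwt2` and `haar_idwt2` are assumptions made here; the captions only state that a DWT/IDWT pair splits features into $X_{LL}$ and $X_{HF}$ and reconstructs them.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar DWT of an (H, W) array with even H and W.

    Returns the low-frequency band and three detail bands. Band naming varies
    between libraries; here each detail band is named after the orientation
    of the edges it responds to.
    """
    a = x[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0
    horiz = (a + b - c - d) / 2.0  # row difference: responds to horizontal edges
    vert = (a - b + c - d) / 2.0   # column difference: responds to vertical edges
    diag = (a - b - c + d) / 2.0
    return ll, (horiz, vert, diag)

def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2."""
    horiz, vert, diag = details
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + horiz + vert + diag) / 2.0
    x[0::2, 1::2] = (ll + horiz - vert - diag) / 2.0
    x[1::2, 0::2] = (ll - horiz + vert - diag) / 2.0
    x[1::2, 1::2] = (ll - horiz - vert + diag) / 2.0
    return x

# Hypothetical feature map containing a one-pixel-wide horizontal line.
feat = np.zeros((8, 8))
feat[3, :] = 1.0

x_ll, (x_h, x_v, x_d) = haar_dwt2(feat)
recon = haar_idwt2(x_ll, (x_h, x_v, x_d))
```

In this toy case the line's energy appears only in the low-frequency band and the horizontal-edge band, while the other two detail bands stay zero. That orientation selectivity is the property a sub-band-aligned scan can exploit.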
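
The $\approx 94\%$ parameter reduction quoted for the LGB in Figure 2 is consistent with a simple weight count. The count below assumes (the caption does not say so) that the "standard FFN" uses the common $C{\to}4C{\to}C$ expansion, and it ignores biases and gating parameters:

$$
\underbrace{C \cdot \tfrac{C}{4} + \tfrac{C}{4} \cdot C}_{\text{LGB}} = \tfrac{C^2}{2},
\qquad
\underbrace{C \cdot 4C + 4C \cdot C}_{\text{FFN}} = 8C^2,
\qquad
1 - \frac{C^2/2}{8C^2} = 1 - \tfrac{1}{16} = 93.75\% \approx 94\%.
$$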