SAS-Net: Scene-Appearance Separation Network for Robust Spatiotemporal Registration in Bidirectional Photoacoustic Microscopy

Jiahao Qin

TL;DR

SAS-Net targets the artifacts of bidirectional OR-PAM scanning by decoupling scene content from acquisition-dependent appearance within a forward-modeling framework. It encodes domain-invariant structure and domain-specific appearance separately, then re-renders images across domains so that registration follows from the shared structure; scene, cycle, and domain alignment losses enforce this. The method reports the best intra-frame registration among the compared methods (SSIM $0.894$, NCC $0.961$) and strong inter-frame stability (NCC $0.964$), with real-time inference at about $11.2$ ms per frame (~89 fps). Because alignment is implicit in the shared scene representation, SAS-Net improves vascular continuity metrics and supports reliable functional imaging, and it may apply to other modalities that exhibit coupled domain shifts and geometric distortions.

Abstract

High-speed optical-resolution photoacoustic microscopy (OR-PAM) with bidirectional scanning enables rapid functional brain imaging but introduces severe spatiotemporal misalignment from coupled scan-direction-dependent domain shift and geometric distortion. Conventional registration methods rely on brightness constancy, an assumption violated under bidirectional scanning, leading to unreliable alignment. A unified scene-appearance separation framework is proposed to jointly address domain shift and spatial misalignment. The proposed architecture separates domain-invariant scene content from domain-specific appearance characteristics, enabling cross-domain reconstruction with geometric preservation. A scene consistency loss promotes geometric correspondence in the latent space, linking domain shift correction with spatial registration within a single framework. For in vivo mouse brain vasculature imaging, the proposed method achieves normalized cross-correlation (NCC) of 0.961 and structural similarity index (SSIM) of 0.894, substantially outperforming conventional methods. Ablation studies demonstrate that domain alignment loss is critical, with its removal causing 82% NCC reduction (0.961 to 0.175), while scene consistency and cycle consistency losses provide complementary regularization for optimal performance. The method achieves 11.2 ms inference time per frame (86 fps), substantially exceeding typical OR-PAM acquisition rates and enabling real-time processing. These results suggest that the proposed framework enables robust high-speed bidirectional OR-PAM for reliable quantitative and longitudinal functional imaging. The code will be publicly available at https://github.com/D-ST-Sword/SAS-Net
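The abstract reports alignment quality as normalized cross-correlation (NCC). As a minimal sketch, assuming the common zero-mean form of NCC (the paper's exact definition is not given in this section), the metric can be computed as follows. The function name `ncc` and the toy intensity profiles are hypothetical and are not data from the paper.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        # A constant input has no variance; return 0 by convention.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


# Toy 1-D intensity profile across a vessel (hypothetical values).
odd_line = [0.1, 0.2, 0.9, 0.8, 0.2, 0.1, 0.1, 0.1]
# Same structure, different global gain and offset.
even_aligned = [1.5 * v + 0.3 for v in odd_line]
# Same structure, circularly shifted by two pixels (spatial misalignment).
even_shifted = odd_line[-2:] + odd_line[:-2]

print("aligned:", round(ncc(odd_line, even_aligned), 3))
print("shifted:", round(ncc(odd_line, even_shifted), 3))
```

In this toy case the score stays near 1 under a global gain and offset change but falls sharply under a two-pixel shift. This form of NCC therefore tolerates only the simplest kind of intensity change; the scan-direction-dependent domain shift described in the abstract need not be a global gain and offset.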

Paper Structure (38 sections, 21 equations, 6 figures, 3 tables)

Introduction
Intra-frame artifacts.
Inter-frame artifacts.
Related Work
Photoacoustic Microscopy and Bidirectional Scanning
Deformable Image Registration
Unpaired Image-to-Image Translation
Joint Registration and Translation
Method
Problem Formulation: A Forward Modeling Perspective
Network Architecture: Implementing the Forward Model
Scene Encoder $E_S$.
Appearance Encoder $E_A$.
Forward Model $G$.
Cross-Domain Reconstruction as Re-Rendering.
...and 23 more sections

Figures (6)

Figure 1: Overview of our scene-appearance separation framework. (a) Model architecture and registration results: Scene Encoder $E_S$ extracts domain-invariant anatomical structure using instance normalization; Appearance Encoder $E_A$ captures domain-specific acquisition parameters via global average pooling; Forward Model $G$ synthesizes images using feature modulation layers. The scene consistency loss $\mathcal{L}_{\text{scene}}$ ensures geometric alignment between $S_{\text{odd}}$ and $S_{\text{even}}$. Left and right panels show before/after registration comparison with odd-even overlay visualization demonstrating effective column alignment. (b) Implicit inter-frame alignment through shared scene space: Our unified approach maps all frames to a shared scene space where $S_1 \approx S_2 \approx \cdots \approx S_n$ (domain-invariant). The shared Scene Encoder with InstanceNorm removes domain statistics, and the Forward Model re-renders structures with target acquisition parameters, producing domain-aligned outputs ready for registration without explicit inter-frame registration.
Figure 2: Qualitative comparison of registration results. Visual comparison on a representative frame across all eight methods. For each method: interleaved images with HOT colormap, odd-even overlay (odd in magenta, even in green, alignment as white), and grayscale visualization. Cross-frame overlay (rightmost) shows inter-frame alignment quality. Our method achieves continuous vascular structures with minimal color separation in overlays.
Figure 3: Quantitative comparison on OR-PAM-Reg-4K test set. SSIM (top) and NCC (bottom) box plots showing metric distributions across 432 test samples. SAS-Net consistently outperforms all baseline methods.
Figure 4: Ablation study results on OR-PAM-Reg-4K test set. (a) NCC and (b) SSIM performance across different configurations. Removing alignment loss causes catastrophic 82% NCC drop. All other components contribute positively, with the full model achieving the best overall performance. Error bars show standard deviation across 432 test samples.
Figure 5: ROI selection for qualitative analysis. Three representative regions (A, B, C) are selected from the interleaved image to evaluate registration quality across different vascular structures.
...and 1 more figure
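The Figure 1 caption describes a Scene Encoder that uses instance normalization to remove domain statistics, an Appearance Encoder based on global pooling, and a Forward Model that re-renders structure through feature modulation. The sketch below is a toy illustration of that idea on raw 1-D intensities, not the authors' network. The functions `instance_norm`, `appearance_code`, and `render` are hypothetical stand-ins, and it is an assumption here that the modulation is a simple scale and shift and that the domain shift is a global gain and offset.

```python
import math


def instance_norm(feature, eps=1e-5):
    """Normalize one feature map (flat list) to zero mean and unit variance."""
    n = len(feature)
    mean = sum(feature) / n
    var = sum((v - mean) ** 2 for v in feature) / n
    return [(v - mean) / math.sqrt(var + eps) for v in feature]


def appearance_code(image):
    """Toy stand-in for E_A: globally pooled statistics (mean, std)."""
    n = len(image)
    mean = sum(image) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in image) / n)
    return mean, std


def render(scene, appearance):
    """Toy stand-in for G: modulate the scene with a shift and a scale."""
    shift, scale = appearance
    return [scale * s + shift for s in scene]


# One underlying structure seen under two acquisition conditions
# (hypothetical gains and offsets for the odd and even scan directions).
structure = [0.0, 0.2, 1.0, 0.7, 0.1, 0.0]
odd = [1.0 * v + 0.05 for v in structure]
even = [0.6 * v + 0.30 for v in structure]

# Instance normalization strips the per-image gain and offset, so both
# inputs map to nearly the same "scene" representation.
s_odd = instance_norm(odd)
s_even = instance_norm(even)
max_scene_gap = max(abs(a - b) for a, b in zip(s_odd, s_even))

# Re-render the even-line scene with the odd-line appearance code.
even_as_odd = render(s_even, appearance_code(odd))
max_render_gap = max(abs(a - b) for a, b in zip(even_as_odd, odd))

print("scene gap:", max_scene_gap)
print("render gap:", max_render_gap)
```

Both gaps come out small in this toy case, which illustrates the $S_{\text{odd}} \approx S_{\text{even}}$ relation in the caption for the intensity component only. The sketch does not model geometric distortion, which the caption attributes to the scene consistency loss $\mathcal{L}_{\text{scene}}$.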
