SAS-Net: Scene-Appearance Separation Network for Robust Spatiotemporal Registration in Bidirectional Photoacoustic Microscopy
Jiahao Qin
TL;DR
SAS-Net targets the artifacts of bidirectional OR-PAM by decoupling scene content from acquisition-dependent appearance within a forward-modeling framework. It encodes domain-invariant structure and domain-specific appearance separately, then re-renders images across domains to register them, with training enforced by scene consistency, cycle consistency, and domain alignment losses. The method reports state-of-the-art intra-frame registration (SSIM $0.894$, NCC $0.961$) and strong inter-frame stability (NCC $0.964$), with inference at about $11.2$ ms per frame (~89 fps). Because alignment is implicit in the shared scene representation, SAS-Net improves vascular continuity metrics and supports reliable functional imaging, and it may apply to other modalities that exhibit coupled domain shifts and geometric distortions.
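The three losses named above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the linear "encoders" and "decoder", the dimensions, the function names (`encode_scene`, `encode_appearance`, `render`, `sas_losses`), and the mean-squared-error form of each loss are hypothetical stand-ins, not the architecture or loss definitions used by SAS-Net.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: pixels per scan line, scene code, appearance code.
D, S, A = 16, 8, 2

# Random linear maps standing in for learned networks (illustration only).
W_scene = rng.normal(size=(S, D)) / np.sqrt(D)
W_app = rng.normal(size=(A, D)) / np.sqrt(D)
W_dec = rng.normal(size=(D, S + A)) / np.sqrt(S + A)


def encode_scene(x):
    """Map a scan line to a (nominally domain-invariant) scene code."""
    return W_scene @ x


def encode_appearance(x):
    """Map a scan line to a (nominally domain-specific) appearance code."""
    return W_app @ x


def render(scene, appearance):
    """Re-render a scan line from a scene code and an appearance code."""
    return W_dec @ np.concatenate([scene, appearance])


def mse(a, b):
    return float(np.mean((a - b) ** 2))


def sas_losses(x_fwd, x_bwd):
    """Assumed MSE forms of the scene, cycle, and domain alignment losses."""
    s_f, s_b = encode_scene(x_fwd), encode_scene(x_bwd)
    a_f, a_b = encode_appearance(x_fwd), encode_appearance(x_bwd)

    # Scene consistency: both scan directions should share one scene code.
    l_scene = mse(s_f, s_b)

    # Cross-domain rendering: backward scene shown with forward appearance.
    x_b2f = render(s_b, a_f)

    # Domain alignment (assumed form): the re-rendered line matches the forward line.
    l_domain = mse(x_b2f, x_fwd)

    # Cycle consistency: re-encode the rendered line and return to the backward domain.
    x_cycle = render(encode_scene(x_b2f), a_b)
    l_cycle = mse(x_cycle, x_bwd)

    return {"scene": l_scene, "cycle": l_cycle, "domain": l_domain}


# Usage: a forward line and a mirrored "backward" line as a crude misalignment.
x_forward = np.linspace(0.0, 1.0, D)
x_backward = x_forward[::-1].copy()
losses = sas_losses(x_forward, x_backward)
```

In a trained model the three terms would be weighted and minimized jointly; the weights and the actual network design are not given in this summary, so none are assumed here.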
Abstract
High-speed optical-resolution photoacoustic microscopy (OR-PAM) with bidirectional scanning enables rapid functional brain imaging but introduces severe spatiotemporal misalignment, caused by coupled scan-direction-dependent domain shift and geometric distortion. Conventional registration methods rely on brightness constancy, an assumption that bidirectional scanning violates, so their alignment is unreliable. A unified scene-appearance separation framework is proposed to address domain shift and spatial misalignment jointly. The proposed architecture separates domain-invariant scene content from domain-specific appearance characteristics, enabling cross-domain reconstruction that preserves geometry. A scene consistency loss promotes geometric correspondence in the latent space, linking domain shift correction with spatial registration within a single framework. For in vivo mouse brain vasculature imaging, the proposed method achieves a normalized cross-correlation (NCC) of 0.961 and a structural similarity index (SSIM) of 0.894, substantially outperforming conventional methods. Ablation studies demonstrate that the domain alignment loss is critical: removing it causes an 82% NCC reduction (0.961 to 0.175), while the scene consistency and cycle consistency losses provide complementary regularization for optimal performance. The method achieves an inference time of 11.2 ms per frame (approximately 89 fps), substantially exceeding typical OR-PAM acquisition rates and enabling real-time processing. These results suggest that the proposed framework enables robust high-speed bidirectional OR-PAM for reliable quantitative and longitudinal functional imaging. The code will be publicly available at https://github.com/D-ST-Sword/SAS-Net.
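As a minimal sketch of how the quoted figures relate, the snippet below computes a zero-mean NCC between two images and reproduces the arithmetic behind the frame rate and the ablation drop. The zero-mean (Pearson-style) definition is an assumption, since the abstract does not state which NCC variant is used, and the helper names `ncc`, `frames_per_second`, and `relative_drop` are hypothetical.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized images."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # At least one image is constant; correlation is undefined, return 0.
        return 0.0
    return float(a @ b / denom)


def frames_per_second(latency_ms):
    """Throughput implied by a per-frame latency given in milliseconds."""
    return 1000.0 / latency_ms


def relative_drop(before, after):
    """Fractional reduction when a metric falls from `before` to `after`."""
    return (before - after) / before


# Usage with the figures quoted in the abstract.
fps = frames_per_second(11.2)           # about 89.3 frames per second
drop = relative_drop(0.961, 0.175)      # about 0.82, i.e. an 82% reduction
```

Under this definition NCC is unchanged by a positive gain and an offset applied to either image, so a high score alone does not show that the appearance difference between scan directions has been removed. This is one reason to report SSIM alongside it.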
