Position-Aware Scene-Appearance Disentanglement for Bidirectional Photoacoustic Microscopy Registration
Yiwen Wang, Jiahao Qin
TL;DR
This paper addresses bidirectional OR-PAM registration, where forward and backward scans exhibit domain shifts that undermine traditional brightness-constancy and deformation-based methods. It introduces GPEReg-Net, a deformation-free approach that disentangles a scene representation from a global appearance code and reconstructs registered outputs via Adaptive Instance Normalization, while leveraging temporal context through a Global Position Encoding module that fuses learnable and sinusoidal frame embeddings with cross-frame attention. On OR-PAM-Reg-4K, it achieves state-of-the-art SSIM and PSNR with competitive NCC, and runs at real-time speeds, albeit with some limitations on longer temporal sequences. Temporal-consistency evaluation on a 26K dataset shows strong cross-frame coherence, but reveals constraints of fixed-capacity position embeddings, motivating future work on adaptive encoding and spatially-aware appearance modeling for better robustness in longitudinal imaging.
Abstract
High-speed optical-resolution photoacoustic microscopy (OR-PAM) with bidirectional raster scanning doubles imaging speed but introduces coupled domain shift and geometric misalignment between forward and backward scan lines. Existing registration methods, constrained by brightness constancy assumptions, achieve limited alignment quality, while recent generative approaches address domain shift through complex architectures that lack temporal awareness across frames. We propose GPEReg-Net, a scene-appearance disentanglement framework that separates domain-invariant scene features from domain-specific appearance codes via Adaptive Instance Normalization (AdaIN), enabling direct image-to-image registration without explicit deformation field estimation. To exploit temporal structure in sequential acquisitions, we introduce a Global Position Encoding (GPE) module that combines learnable position embeddings with sinusoidal encoding and cross-frame attention, allowing the network to leverage context from neighboring frames for improved temporal coherence. On the OR-PAM-Reg-4K benchmark (432 test samples), GPEReg-Net achieves an NCC of 0.953, an SSIM of 0.932, and a PSNR of 34.49 dB, surpassing the previous state of the art by 3.8% in SSIM and 1.99 dB in PSNR while maintaining competitive NCC. Code is available at https://github.com/JiahaoQin/GPEReg-Net.
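The two building blocks named in the abstract, AdaIN and sinusoidal position encoding, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It uses the standard AdaIN formulation and the standard Transformer-style sinusoidal encoding. The function names `adain` and `sinusoidal_encoding`, the `(C, H, W)` feature layout, the per-channel `gamma` and `beta` parameters standing in for an appearance code, and the even `dim` requirement are all assumptions made for illustration. The learnable embeddings and cross-frame attention of the GPE module are not modeled here.

```python
import numpy as np


def adain(scene, gamma, beta, eps=1e-5):
    """Standard AdaIN on a (C, H, W) feature map.

    scene: scene features, assumed layout (channels, height, width).
    gamma, beta: per-channel scale and shift, shape (C,). Here they stand in
    for an appearance code (hypothetical parameterization).
    """
    # Per-channel statistics over the spatial dimensions.
    mean = scene.mean(axis=(1, 2), keepdims=True)
    var = scene.var(axis=(1, 2), keepdims=True)
    # Remove the input's own appearance statistics.
    normalized = (scene - mean) / np.sqrt(var + eps)
    # Re-inject the target appearance as an affine transform.
    return gamma[:, None, None] * normalized + beta[:, None, None]


def sinusoidal_encoding(num_frames, dim):
    """Transformer-style sinusoidal encoding of frame indices.

    Returns an array of shape (num_frames, dim); dim is assumed to be even.
    """
    positions = np.arange(num_frames)[:, None]
    # Geometric progression of frequencies: 1 / 10000^(2i / dim).
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    angles = positions * freqs
    encoding = np.zeros((num_frames, dim))
    encoding[:, 0::2] = np.sin(angles)  # even columns
    encoding[:, 1::2] = np.cos(angles)  # odd columns
    return encoding
```

In this sketch, each output channel of `adain` takes mean `beta` and (up to the `eps` term) standard deviation `gamma`, which is how an appearance code can restyle scene features without a deformation field. A fixed sinusoidal table like the one returned by `sinusoidal_encoding` is defined for any frame index, whereas a learnable embedding table has a fixed number of entries.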
