FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation
Jingyi Tang, Gu Wang, Zeyu Chen, Shengquan Li, Xiu Li, Xiangyang Ji
TL;DR
This work tackles underwater 6D pose estimation from RGB data only, addressing annotation scarcity and the sim2real domain gap. It introduces FAFA, a two-stage framework: frequency-aware augmentation first injects target-domain style into synthetic data, and flow-aided self-supervision then performs end-to-end domain adaptation with a teacher–student architecture and pseudo-flow labels. Key contributions include amplitude mix/dropout in the Fourier domain and a multi-level alignment strategy that couples image-level constraints with feature-level similarity and a shape-constrained optical flow for pose refinement, expressed through $f^{s \rightarrow t}$, $f^{tea}$, and $f^{stu}$. On the ROV6D and DeepURL benchmarks, FAFA achieves state-of-the-art performance without real pose annotations, demonstrating strong practical potential for underwater robotic perception; limitations are discussed in the Appendix.
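To make the amplitude mix/dropout idea concrete, here is a minimal NumPy sketch of a Fourier-domain augmentation. It is not the authors' implementation: the function name `amplitude_mix_dropout`, the parameters `mix` and `drop_prob`, and the choice to always keep the DC component are assumptions made for illustration. The sketch blends the amplitude spectrum of a synthetic image with that of a real underwater image, randomly zeroes some amplitude entries, and keeps the synthetic phase, which carries most of the object structure.

```python
import numpy as np

def amplitude_mix_dropout(src, tgt, mix=0.5, drop_prob=0.1, rng=None):
    """Hypothetical helper: src and tgt are float HxWxC images in [0, 1]."""
    rng = np.random.default_rng() if rng is None else rng

    # Per-channel 2D FFT over the spatial axes.
    src_fft = np.fft.fft2(src, axes=(0, 1))
    tgt_fft = np.fft.fft2(tgt, axes=(0, 1))

    src_amp, src_phase = np.abs(src_fft), np.angle(src_fft)
    tgt_amp = np.abs(tgt_fft)

    # Amplitude mix: linear blend of source and target amplitude spectra.
    mixed_amp = (1.0 - mix) * src_amp + mix * tgt_amp

    # Amplitude dropout: zero random frequency entries.
    keep = rng.random(mixed_amp.shape) >= drop_prob
    keep[0, 0, :] = True  # assumption: preserve the DC term (mean brightness)
    mixed_amp = mixed_amp * keep

    # Recombine with the source phase and return to image space.
    out = np.fft.ifft2(mixed_amp * np.exp(1j * src_phase), axes=(0, 1))
    return np.clip(out.real, 0.0, 1.0)
```

With `mix=0.0` and `drop_prob=0.0` the function returns the synthetic image unchanged, which is a convenient sanity check before tuning either parameter.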
Abstract
Although methods for estimating the pose of objects in indoor scenes have achieved great success, pose estimation of underwater objects remains challenging because of the complex underwater environment, with its degraded illumination and blurring, and the substantial cost of obtaining real annotations. In response, we introduce FAFA, a Frequency-Aware Flow-Aided self-supervised framework for 6D pose estimation of unmanned underwater vehicles (UUVs). Specifically, we first train a frequency-aware flow-based pose estimator on synthetic data, where an FFT-based augmentation approach is proposed to help the network capture domain-invariant features and target-domain styles from a frequency perspective. We then perform self-supervised training by enforcing flow-aided multi-level consistencies to adapt the estimator to the real-world underwater environment. Our framework relies solely on the 3D model and RGB images, alleviating the need for any real pose annotations or other modalities such as depth. We evaluate the effectiveness of FAFA on common underwater object pose benchmarks and show significant performance improvements over state-of-the-art methods. Code is available at github.com/tjy0703/FAFA.
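The abstract does not spell out how optical flow relates to pose. One common geometric link, assumed here purely for illustration and not taken from the paper, is that a known 3D model lets any pair of poses induce a 2D flow: project the model points under both poses and take the pixel displacement. The helper names `project` and `pose_induced_flow`, and the pinhole intrinsics `K`, are hypothetical.

```python
import numpy as np

def project(points, R, t, K):
    """Project Nx3 model points to pixels with rotation R, translation t, intrinsics K."""
    cam = points @ R.T + t          # Nx3 points in the camera frame
    uv = cam @ K.T                  # homogeneous pixel coordinates
    return uv[:, :2] / uv[:, 2:3]   # perspective division

def pose_induced_flow(points, K, pose_src, pose_tgt):
    """Per-point 2D displacement between the projections under two poses."""
    R_s, t_s = pose_src
    R_t, t_t = pose_tgt
    return project(points, R_t, t_t, K) - project(points, R_s, t_s, K)
```

Under this assumption, identical poses give zero flow everywhere, and a predicted flow can be compared against a pose-induced one using only the 3D model and RGB-derived predictions, with no depth input.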
