MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation
Fei Pan, Xu Yin, Seokju Lee, Axi Niu, Sungeui Yoon, In So Kweon
TL;DR
MoDA tackles unsupervised domain adaptation for semantic segmentation when the target domain provides unlabeled video frames. It learns self-supervised object motion cues from the target videos and applies them through two modules, an Object Discovery Module and a Semantic Mining Module, to refine pseudo labels and improve self-training. The approach disentangles object motion from ego-motion via geometric constraints and outperforms optical-flow baselines in both domain-adaptive video and image segmentation, while remaining compatible with other UDA methods. This motion-guided framework offers a practical way to exploit unlabeled video data for domain adaptation in segmentation.
Abstract
Unsupervised domain adaptation (UDA) has been a potent technique for handling the lack of annotations in the target domain, particularly in semantic segmentation. This study introduces a different UDA scenario in which the target domain contains unlabeled video frames. Drawing upon recent advances in self-supervised learning of object motion from unlabeled videos under geometric constraints, we design a Motion-guided Domain Adaptive semantic segmentation framework (MoDA). MoDA harnesses self-supervised object motion cues to facilitate cross-domain alignment for the segmentation task. First, we present an object discovery module that localizes and segments moving objects in the target domain using object motion information. Then, we propose a semantic mining module that uses the resulting object masks to refine the pseudo labels in the target domain. Subsequently, these high-quality pseudo labels are used in the self-training loop to bridge the cross-domain gap. In domain adaptive video and image segmentation experiments, MoDA shows the effectiveness of using object motion as guidance for domain alignment compared with optical flow information. Moreover, MoDA is versatile, as it can complement existing state-of-the-art UDA approaches. Code is available at https://github.com/feipanir/MoDA.
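To make the idea of refining pseudo labels with object masks concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the refinement rule, so the majority-vote fill used here is an assumption, as are the function names `confident_pseudo_labels` and `refine_with_object_masks`, the confidence threshold of 0.9, and the ignore index of 255. It assumes per-pixel class probabilities from a segmentation network and boolean masks of moving objects are already available.

```python
import numpy as np

IGNORE = 255  # assumed label for pixels excluded from self-training


def confident_pseudo_labels(probs, threshold=0.9):
    """Turn (C, H, W) class probabilities into (H, W) pseudo labels.

    Pixels whose top probability is below `threshold` are set to IGNORE.
    """
    labels = probs.argmax(axis=0).astype(np.int64)
    confidence = probs.max(axis=0)
    labels[confidence < threshold] = IGNORE
    return labels


def refine_with_object_masks(labels, object_masks):
    """Hypothetical refinement rule (an assumption, not from the paper):
    fill each object mask with the majority confident label inside it.
    """
    refined = labels.copy()
    for mask in object_masks:
        votes = labels[mask]
        votes = votes[votes != IGNORE]
        if votes.size == 0:
            # No confident pixel inside this mask; leave it untouched.
            continue
        refined[mask] = np.bincount(votes).argmax()
    return refined
```

In this sketch, low-confidence pixels that fall inside a moving-object mask inherit the label that dominates the confident pixels of the same mask, while masks with no confident pixel are left unchanged. The refined map could then serve as the target in a standard self-training loss that skips pixels marked `IGNORE`.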
