UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan, Lei Luo, Zhuo Hui, Can Pu, Xiaoyu Xiang, Rakesh Ranjan, Denis Demandolx
TL;DR
UnSAMFlow tackles occlusion and motion-boundary failures in unsupervised optical flow by integrating object-level cues from the Segment Anything Model (SAM). It introduces three SAM-based adaptations (semantic augmentation, a region-wise homography smoothness loss, and a mask feature module) to enforce object-consistent motion and robust feature aggregation, with SAM inputs optional at inference. The approach achieves state-of-the-art unsupervised results on KITTI and Sintel, generalizes well across domains, and keeps inference efficient. The work highlights SAM's potential as a zero-shot, open-world semantic prior for guiding low-level vision tasks such as optical flow without ground-truth labels.
Abstract
Traditional unsupervised optical flow methods are vulnerable to occlusions and motion boundaries because they lack object-level information. We therefore propose UnSAMFlow, an unsupervised flow network that also leverages object information from the latest foundation model, the Segment Anything Model (SAM). We first include a self-supervised semantic augmentation module tailored to SAM masks. We also analyze the poor gradient landscapes of traditional smoothness losses and propose a new smoothness definition based on homography instead. Finally, we add a simple yet effective mask feature module to further aggregate features at the object level. With all these adaptations, our method produces clear optical flow estimates with sharp boundaries around objects and outperforms state-of-the-art methods on both the KITTI and Sintel datasets. Our method also generalizes well across domains and runs very efficiently.
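
To make the homography-based smoothness idea more concrete, the following is a minimal NumPy sketch, not the authors' implementation. The abstract does not give the exact formulation, so everything here is an assumption for illustration: the function names `fit_homography` and `region_homography_residual`, the `(H, W, 2)` flow layout with `(dx, dy)` channels, the plain least-squares (DLT) fit without robust estimation or coordinate normalization, and the L1 residual. The sketch treats the flow inside one object mask as point correspondences, fits a single homography to them, and measures how far the flow deviates from that fit.

```python
import numpy as np


def fit_homography(src, dst):
    """Least-squares homography (DLT) mapping src -> dst.

    src, dst: (N, 2) arrays of (x, y) points, N >= 4, not all collinear.
    Illustrative only: no RANSAC and no coordinate normalization.
    """
    x, y = src[:, 0], src[:, 1]
    u, v = dst[:, 0], dst[:, 1]
    zeros, ones = np.zeros_like(x), np.ones_like(x)
    # Each correspondence contributes two linear constraints on the 9 entries of H.
    rows_u = np.stack([x, y, ones, zeros, zeros, zeros, -u * x, -u * y, -u], axis=1)
    rows_v = np.stack([zeros, zeros, zeros, x, y, ones, -v * x, -v * y, -v], axis=1)
    A = np.concatenate([rows_u, rows_v], axis=0)
    # The right singular vector with the smallest singular value minimizes ||A h||.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def region_homography_residual(flow, mask):
    """Mean L1 gap between the flow and its homography refit inside one mask.

    flow: (H, W, 2) array with (dx, dy) per pixel (assumed layout).
    mask: (H, W) boolean array marking one object region.
    """
    ys, xs = np.nonzero(mask)
    src = np.stack([xs, ys], axis=1).astype(np.float64)
    dst = src + flow[ys, xs]
    H = fit_homography(src, dst)
    # Apply H in homogeneous coordinates, then divide by the third component.
    proj = np.concatenate([src, np.ones((len(src), 1))], axis=1) @ H.T
    refit = proj[:, :2] / proj[:, 2:3]
    return np.abs(refit - dst).sum(axis=1).mean()


# Usage: a rigid translation inside the mask is explained by one homography,
# while a flow with a corrupted pixel is not.
mask = np.zeros((16, 16), dtype=bool)
mask[3:13, 3:13] = True
flow = np.zeros((16, 16, 2))
flow[..., 0], flow[..., 1] = 2.0, -1.0
clean = region_homography_residual(flow, mask)

corrupted = flow.copy()
corrupted[8, 8] = [7.0, 4.0]
noisy = region_homography_residual(corrupted, mask)
```

In this toy setting, `clean` is numerically zero because a translation is a special case of a homography, while `noisy` is strictly larger. A training loss built on this idea would need a differentiable or detached fitting step and some handling of outliers, neither of which is shown here.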
