Rethink Predicting the Optical Flow with the Kinetics Perspective
Yuhao Cheng, Siru Zhang, Yiqiang Yan
TL;DR
This work reframes optical flow estimation from a kinetics perspective to address the high cost of dense correlation volumes and occlusion-induced artifacts. It directly predicts flow from high-level features using a Transformer-based Motion Decoder, paired with a differentiable WarpNet that jointly handles warping and occlusion. A kinetics-guided self-supervised learning strategy leverages unlabeled data through a teacher–student framework based on constant-velocity priors, enabling robust motion understanding without extensive labeling. The approach achieves strong results on Sintel and KITTI benchmarks, especially under occlusion and fast motion, while offering improved efficiency and a public code release to foster adoption. Overall, the paper demonstrates that integrating kinetics insights with self-supervision and feature-centric flow prediction yields competitive performance and practical benefits for real-world optical flow tasks.
Abstract
Optical flow estimation, which describes the pixel-wise displacement between frames, is a fundamental task in low-level computer vision and supports many downstream tasks. From the apparent view, optical flow can be treated as the correlation between pixels in consecutive frames, so iteratively refining a correlation volume yields strong performance. However, this comes at a very high computational cost. In addition, errors in the occluded regions of successive frames are amplified by an inaccurate warp operation. These challenges cannot be solved from the apparent view alone, so this paper rethinks optical flow estimation from the kinetics viewpoint. Motivated by this, we propose a method that combines apparent and kinetics information. The proposed method predicts the optical flow directly from features extracted from the images instead of building a correlation volume, which improves the efficiency of the whole network. It also introduces a new differentiable warp operation that handles warping and occlusion simultaneously. Moreover, it blends the kinetics feature with the apparent feature through a novel self-supervised loss function. Comprehensive experiments and ablation studies show that this approach to predicting optical flow achieves performance competitive with state-of-the-art methods and, on some metrics, outperforms correlation-based methods, especially in scenes containing occlusion and fast motion. The code will be made public.
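To make the efficiency argument concrete, the sketch below compares the number of entries in an all-pairs correlation volume with the number of entries in the feature map it is built from. The feature stride of 8, the 256 channels, and the image sizes are illustrative assumptions, not values taken from this paper, and the function names are hypothetical.

```python
def feature_map_entries(height: int, width: int,
                        stride: int = 8, channels: int = 256) -> int:
    """Entries in one feature map at 1/stride resolution.

    stride and channels are assumed values for illustration only.
    """
    h, w = height // stride, width // stride
    return h * w * channels


def correlation_volume_entries(height: int, width: int, stride: int = 8) -> int:
    """Entries in an all-pairs correlation volume.

    Every feature location in frame 1 is compared with every feature
    location in frame 2, so the volume has (h * w) ** 2 entries.
    """
    h, w = height // stride, width // stride
    return (h * w) ** 2


if __name__ == "__main__":
    # Illustrative image sizes (assumed); the second doubles each side.
    for height, width in [(440, 1024), (880, 2048)]:
        feats = feature_map_entries(height, width)
        corr = correlation_volume_entries(height, width)
        print(f"{height}x{width}: features={feats:,} correlation={corr:,}")
```

Under these assumptions, doubling the image side length multiplies the feature map size by 4 but the all-pairs correlation volume by 16, which is the scaling that direct prediction from features avoids.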
