CodingHomo: Bootstrapping Deep Homography With Video Coding
Yike Liu, Haipeng Li, Shuaicheng Liu, Bing Zeng
TL;DR
CodingHomo addresses unsupervised deep homography estimation under challenging motions by bootstrapping with motion vectors (MVs) derived from video coding. It introduces Mask-Guided Fusion (MGF) and Mask-Guided Homography Estimation (MGHE) to fuse MV priors into a coarse-to-fine warping framework, guided by an Enhanced Motion Mask $M_e$ computed from MVs and features. An unsupervised loss combining $\ell_{align}$, $\ell_{FIL}$, and $\ell_{plane}$ focuses learning on the dominant plane and suppresses outliers via a probabilistic MV-homography model. Empirically, CodingHomo achieves state-of-the-art performance on CA-unsup and strong generalization to GHOF, demonstrating robust, transferable homography estimation in real-world, dynamic scenes. The work highlights the practical value of compressed-domain cues for geometric estimation and provides detailed ablations and a public codebase to facilitate reproducibility.
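To make concrete why codec motion vectors are a useful prior, the sketch below fits a homography to a sparse MV field using a weighted direct linear transform (DLT). This is a classical illustration, not the MGF or MGHE modules from the paper: the function names, the block-center layout, and the `weights` argument (a hypothetical stand-in for a motion mask that down-weights off-plane blocks) are all assumptions made for this example.

```python
import numpy as np


def apply_homography(H, points):
    """Project Nx2 points through a 3x3 homography H."""
    pts = np.asarray(points, dtype=np.float64)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]


def homography_from_motion_vectors(points, mvs, weights=None):
    """Fit H so that H maps each (x, y) to (x + dx, y + dy).

    points  : Nx2 block centers (assumed layout, N >= 4)
    mvs     : Nx2 motion vectors (dx, dy), one per block
    weights : optional N values in [0, 1]; a hypothetical mask that
              suppresses blocks not on the dominant plane
    """
    src = np.asarray(points, dtype=np.float64)
    dst = src + np.asarray(mvs, dtype=np.float64)
    w = np.ones(len(src)) if weights is None else np.asarray(weights, dtype=np.float64)

    rows = []
    for (x, y), (u, v), wi in zip(src, dst, w):
        # Two linear constraints per correspondence (standard DLT).
        rows.append(wi * np.array([x, y, 1, 0, 0, 0, -u * x, -u * y, -u]))
        rows.append(wi * np.array([0, 0, 0, x, y, 1, -v * x, -v * y, -v]))
    A = np.stack(rows)

    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


# Usage: synthesize MVs from a known homography on 16x16 block centers.
H_true = np.array([[1.02, 0.01, 5.0],
                   [-0.01, 0.98, -3.0],
                   [1e-5, 2e-5, 1.0]])
centers = np.array([[x, y] for y in range(8, 64, 16) for x in range(8, 64, 16)], dtype=np.float64)
mvs = apply_homography(H_true, centers) - centers
H_est = homography_from_motion_vectors(centers, mvs)
```

With noise-free MVs the fit recovers `H_true` up to numerical precision. Real codec MVs are noisy and include moving objects, which is the motivation for mask-guided weighting rather than a plain least-squares fit.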
Abstract
Homography estimation is a fundamental task in computer vision with applications in diverse fields. Recent advances in deep learning have improved homography estimation, particularly with unsupervised learning approaches, offering increased robustness and generalizability. However, accurately predicting homography, especially in complex motions, remains a challenge. In response, this work introduces a novel method leveraging video coding, particularly by harnessing inherent motion vectors (MVs) present in videos. We present CodingHomo, an unsupervised framework for homography estimation. Our framework features a Mask-Guided Fusion (MGF) module that identifies and utilizes beneficial features among the MVs, thereby enhancing the accuracy of homography prediction. Additionally, the Mask-Guided Homography Estimation (MGHE) module is presented for eliminating undesired features in the coarse-to-fine homography refinement process. CodingHomo outperforms existing state-of-the-art unsupervised methods, delivering good robustness and generalizability. The code and dataset are available at: https://github.com/liuyike422/CodingHomo
