Trajectory Densification and Depth from Perspective-based Blur
Tianchen Qiu, Qirun Zhang, Jiajian He, Zhengyue Zhuge, Jiahui Xu, Yueting Chen
TL;DR
The paper tackles depth estimation and dense camera-trajectory reconstruction from perspective-based blur in monocular video captured without stabilizers. It introduces a joint optical-depth pipeline that uses DINOv2 features and CoTracker point tracks to extract video information, a Transformer-based depth estimator with window embedding, and a vision-language dense trajectory decoder. Two-stage training (depth, then trajectory) and extensive evaluations on indoor, outdoor, and synthetic datasets show state-of-the-art depth accuracy and substantially denser trajectory reconstruction than traditional SfM approaches. The approach advances monocular video understanding by extracting metric depth and dense motion cues from long-exposure blur, with potential impact on stabilization, AR, and robotics.
Abstract
In the absence of a mechanical stabilizer, the camera inevitably undergoes rotational motion during capture, which induces perspective-based blur, especially in long-exposure scenarios. From an optical standpoint, perspective-based blur is depth- and position-dependent: objects at distinct spatial locations incur different blur levels even under identical imaging settings. Inspired by this, we propose a novel method that estimates metric depth by examining the blur pattern of a video stream, and recovers a dense camera trajectory via a joint optical design algorithm. Specifically, we employ an off-the-shelf vision encoder and point tracker to extract video information. We then estimate the depth map via windowed embedding and multi-window aggregation, and densify the sparse trajectory from the optical algorithm using a vision-language model. Evaluations on multiple depth datasets demonstrate that our method attains strong performance over a large depth range while maintaining favorable generalization. Compared with the real camera trajectory in handheld shooting settings, our optical algorithm achieves superior precision, and the dense reconstruction maintains strong accuracy.
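To make the depth dependence of rotational blur concrete, here is a minimal sketch under a simplified model assumed purely for illustration. It is not the authors' optical algorithm. A 2D pinhole camera with focal length `f` (pixels) yaws by `theta_max` radians during one exposure about a hypothetical pivot located `r` metres behind its optical center, roughly as a wrist or elbow would act for a handheld camera. The function names `image_track`, `blur_length`, and `depth_from_blur_on_axis` are hypothetical. For small angles the streak of an on-axis point is approximately f·θ·(1 + r/Z), so nearer points blur more. With `r = 0` (pure rotation about the optical center) the blur no longer depends on depth at all, so in this toy model depth information enters only through the pivot offset.

```python
import numpy as np


def image_track(X, Z, f, theta_max, r, n=200):
    """Image x-coordinate (pixels) of a static point (X, Z) during one exposure.

    Hypothetical 2D model for illustration only: the camera yaws from 0 to
    theta_max radians about a pivot r metres behind its optical center
    (r = 0 means pure rotation about the optical center). (X, Z) is given in
    the initial camera frame.
    """
    t = np.linspace(0.0, theta_max, n)
    c, s = np.cos(t), np.sin(t)
    # Optical center after rotating the rig about the pivot at (0, -r).
    cx, cz = r * s, r * c - r
    # Express the point in the rotated camera frame (apply R^T to p - center).
    dx, dz = X - cx, Z - cz
    x_cam = c * dx - s * dz
    z_cam = s * dx + c * dz
    # Pinhole projection onto the image x-axis.
    return f * x_cam / z_cam


def blur_length(X, Z, f, theta_max, r):
    """Extent (pixels) of the streak the point traces during the exposure."""
    u = image_track(X, Z, f, theta_max, r)
    return u.max() - u.min()


def depth_from_blur_on_axis(b, f, theta_max, r):
    """Invert blur_length for an on-axis point (X = 0); requires r > 0.

    Solves b * (cos(t) * Z + r * (cos(t) - 1)) = f * sin(t) * (Z + r) for Z,
    with t = theta_max.
    """
    c, s = np.cos(theta_max), np.sin(theta_max)
    return r * (f * s - b * (c - 1.0)) / (b * c - f * s)


if __name__ == "__main__":
    # Assumed values: 1000 px focal length, 0.01 rad (~0.57 deg) yaw, 30 cm pivot offset.
    f, theta, r = 1000.0, 0.01, 0.3
    for Z in (1.0, 10.0):
        b = blur_length(0.0, Z, f, theta, r)
        Z_hat = depth_from_blur_on_axis(b, f, theta, r)
        print(f"Z={Z:4.1f} m  blur={b:.1f} px  recovered Z={Z_hat:.3f} m")
```

Under these assumptions the script reports a blur of about 13.0 px for a point at 1 m versus 10.3 px at 10 m, and inverting the blur of a single on-axis point recovers its metric depth. A real pipeline would additionally have to estimate the camera motion itself and cope with off-axis points, noise, and moving objects, all of which this sketch ignores.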
