InFlux: A Benchmark for Self-Calibration of Dynamic Intrinsics of Video Cameras
Erich Liang, Roma Bhattacharjee, Sreemanti Dey, Rafael Moschopoulos, Caitlin Wang, Michel Liao, Grace Tan, Andrew Wang, Karhan Kayan, Stamatis Alexandropoulos, Jia Deng
TL;DR
InFlux addresses the challenge of dynamic camera intrinsics in videos with a real-world benchmark that provides per-frame ground-truth intrinsics. These are obtained by mapping lens metadata through lens-specific lookup tables (LUTs) built from calibration experiments. The authors extend Kalibr with targeted modifications that improve its stability and accuracy. They also implement a robust LUT interpolation scheme that yields per-frame intrinsics without interrupting video capture. The dataset comprises 143K+ frames across 386 diverse videos. It is used to evaluate intrinsics-prediction baselines, most of which struggle with dynamic intrinsics. The benchmark and its calibration pipeline lay the groundwork for 3D methods that are robust to in-the-wild intrinsic variations, and they offer a scalable path to ground-truth evaluation. The authors release the data and code to accelerate research on dynamic intrinsics and on downstream tasks such as dense 3D reconstruction and monocular depth estimation.
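To make the LUT idea concrete, here is a minimal sketch of mapping a per-frame lens reading to pinhole intrinsics. It assumes each LUT is keyed on a single scalar metadata value (for example, the reported focal length in mm). It also assumes that each entry stores calibrated (fx, fy, cx, cy) in pixels and that values between calibrated entries are linearly interpolated. None of these details come from the paper. The function names and LUT layout are hypothetical, lens distortion is omitted, and the actual InFlux interpolation scheme may differ.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Hypothetical calibrated intrinsics in pixels: (fx, fy, cx, cy).
Intrinsics = Tuple[float, float, float, float]
# Hypothetical LUT entry: (scalar lens metadata value, calibrated intrinsics).
LutEntry = Tuple[float, Intrinsics]


def interpolate_intrinsics(lut: Sequence[LutEntry], key: float) -> Intrinsics:
    """Piecewise-linear interpolation of intrinsics over a 1-D LUT (assumed scheme).

    Queries outside the calibrated range raise ValueError instead of extrapolating.
    """
    entries = sorted(lut, key=lambda e: e[0])  # order entries by metadata value
    keys = [k for k, _ in entries]
    if not keys or key < keys[0] or key > keys[-1]:
        raise ValueError(f"metadata value {key} is outside the calibrated range")
    i = bisect_left(keys, key)  # first index with keys[i] >= key
    if keys[i] == key:
        return entries[i][1]  # exact calibrated entry, no interpolation needed
    (k0, p0), (k1, p1) = entries[i - 1], entries[i]  # bracketing calibrated entries
    t = (key - k0) / (k1 - k0)  # fractional position between them
    fx, fy, cx, cy = (a + t * (b - a) for a, b in zip(p0, p1))
    return (fx, fy, cx, cy)


def intrinsics_matrix(p: Intrinsics) -> List[List[float]]:
    """Build the 3x3 pinhole intrinsics matrix K from (fx, fy, cx, cy)."""
    fx, fy, cx, cy = p
    return [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]


def per_frame_intrinsics(lut: Sequence[LutEntry], frame_metadata: Sequence[float]) -> List[List[List[float]]]:
    """Return one K matrix per video frame from that frame's lens metadata value."""
    return [intrinsics_matrix(interpolate_intrinsics(lut, m)) for m in frame_metadata]


if __name__ == "__main__":
    # Toy values for illustration only, not InFlux calibration data.
    toy_lut = [(24.0, (1000.0, 1000.0, 960.0, 540.0)), (50.0, (2080.0, 2080.0, 962.0, 541.0))]
    for K in per_frame_intrinsics(toy_lut, [24.0, 37.0, 50.0]):
        print(K)
```

With the toy LUT, a frame whose metadata reads 37.0 falls halfway between the two calibrated entries and gets fx = fy = 1540.0, cx = 961.0, cy = 540.5. Rejecting out-of-range readings, rather than extrapolating, is one possible design choice. A denser LUT reduces interpolation error wherever the intrinsics vary nonlinearly with the lens reading.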
Abstract
Accurately tracking camera intrinsics is crucial for achieving 3D understanding from 2D video. However, most 3D algorithms assume that camera intrinsics stay constant throughout a video, an assumption that does not hold for many real-world, in-the-wild videos. A major obstacle in this field is the lack of dynamic camera intrinsics benchmarks. Existing benchmarks typically offer limited diversity in scene content and intrinsics variation, and none provide per-frame intrinsic changes across consecutive video frames. In this paper, we present Intrinsics in Flux (InFlux), a real-world benchmark that provides per-frame ground truth intrinsics annotations for videos with dynamic intrinsics. Compared to prior benchmarks, InFlux captures a wider range of intrinsic variations and scene diversity, featuring 143K+ annotated frames from 386 high-resolution indoor and outdoor videos with dynamic camera intrinsics. To ensure accurate per-frame intrinsics, we build a comprehensive lookup table of calibration experiments and extend the Kalibr toolbox to improve its accuracy and robustness. Using our benchmark, we evaluate existing baseline methods for predicting camera intrinsics and find that most struggle to achieve accurate predictions on videos with dynamic intrinsics. For the dataset, code, videos, and submission, please visit https://influx.cs.princeton.edu/.
