InFlux: A Benchmark for Self-Calibration of Dynamic Intrinsics of Video Cameras

Erich Liang, Roma Bhattacharjee, Sreemanti Dey, Rafael Moschopoulos, Caitlin Wang, Michel Liao, Grace Tan, Andrew Wang, Karhan Kayan, Stamatis Alexandropoulos, Jia Deng

TL;DR

InFlux addresses the challenge of dynamic camera intrinsics in videos by providing a real-world benchmark with per-frame ground-truth intrinsics, obtained by mapping lens metadata through lens-specific lookup tables (LUTs) constructed from calibration experiments. The authors extend Kalibr with targeted modifications for improved stability and accuracy, and they implement a robust LUT interpolation scheme to yield per-frame intrinsics without interrupting video capture. The dataset comprises 143K+ frames across 386 diverse videos, enabling evaluation of intrinsics-prediction baselines, most of which struggle with dynamic intrinsics. This benchmark and its calibration pipeline lay the groundwork for developing 3D methods that are robust to in-the-wild intrinsic variations and offer a scalable path to ground-truth evaluation. The work also provides open resources, including data and code, to accelerate progress in dynamic-intrinsics research and downstream tasks like dense 3D reconstruction and monocular depth estimation.

Abstract

Accurately tracking camera intrinsics is crucial for achieving 3D understanding from 2D video. However, most 3D algorithms assume that camera intrinsics stay constant throughout a video, which is often not true for real-world in-the-wild videos. A major obstacle in this field is a lack of dynamic camera intrinsics benchmarks: existing benchmarks typically offer limited diversity in scene content and intrinsics variation, and none provide per-frame intrinsic changes for consecutive video frames. In this paper, we present Intrinsics in Flux (InFlux), a real-world benchmark that provides per-frame ground truth intrinsics annotations for videos with dynamic intrinsics. Compared to prior benchmarks, InFlux captures a wider range of intrinsic variations and scene diversity, featuring 143K+ annotated frames from 386 high-resolution indoor and outdoor videos with dynamic camera intrinsics. To ensure accurate per-frame intrinsics, we build a comprehensive lookup table of calibration experiments and extend the Kalibr toolbox to improve its accuracy and robustness. Using our benchmark, we evaluate existing baseline methods for predicting camera intrinsics and find that most struggle to achieve accurate predictions on videos with dynamic intrinsics. For the dataset, code, videos, and submission, please visit https://influx.cs.princeton.edu/.

Paper Structure

This paper contains 45 sections, 22 equations, 22 figures, 1 table, 1 algorithm.

Figures (22)

  • Figure 1: A gallery of InFlux, our real-world dynamic intrinsic video benchmark with per-frame ground truth intrinsics. It consists of 143K+ frames across 386 videos and features highly diverse scenes, camera motion, and changes in intrinsics. We perform calibration experiments to construct per-lens lookup tables (LUTs) that map lens metadata to camera intrinsics. The LUTs are applied to our benchmark videos to generate ground truth per-frame camera intrinsics.
  • Figure 2: A visual representation of our data collection process for our real-world benchmark of videos with dynamic camera intrinsics. For each video frame, we record its LFL and FD values. These are used to query the LUT for the lens used to obtain corresponding ground truth intrinsics. To construct each lens's LUT, we perform a set of board-based and drone-based calibration experiments and apply an interpolation strategy to allow for querying of novel LFL-FD pairs.
  • Figure 3: Examples of how FOV spatial footprint (FSF) varies in size depending on LFL-FD settings. For full FOV coverage and accurate 2D detections, calibration targets must scale with FSF size. We use targets of different sizes to match the range of FSF size across different LFL-FD settings.
  • Figure 4: A visualization of the different calibration experiments performed to fill in the LUT for canon17 and premista80. Because different LFL and FD combinations yield different FSF sizes, we use a variety of calibration targets that scale with FSF size.
  • Figure 5: A visual representation of our LUT interpolation scheme, where the color of a point illustrates the relative effect of each vertex of the region it is in. In quadrilateral regions, bilinear interpolation is used, while in triangular regions, barycentric interpolation is used.
  • ...and 17 more figures
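
The interpolation scheme described in the Figure 5 caption (bilinear in quadrilateral regions, barycentric in triangular regions) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of an axis-aligned rectangle for the quadrilateral case, and the idea that each LUT vertex stores a single scalar (for example, one focal length value) at an (LFL, FD) coordinate are all assumptions made for illustration.

```python
import numpy as np


def barycentric_interp(p, tri, values):
    """Interpolate per-vertex values to point p inside a triangle.

    p      : (x, y) query point, e.g. a hypothetical (LFL, FD) pair
    tri    : three (x, y) vertices of the triangular region
    values : the value stored at each of the three vertices
    """
    a, b, c = (np.asarray(v, dtype=float) for v in tri)
    p = np.asarray(p, dtype=float)
    # Solve p - a = w_b * (b - a) + w_c * (c - a) for the two free weights.
    T = np.column_stack((b - a, c - a))
    w_b, w_c = np.linalg.solve(T, p - a)
    w_a = 1.0 - w_b - w_c  # barycentric weights sum to one
    return w_a * values[0] + w_b * values[1] + w_c * values[2]


def bilinear_interp(p, x0, x1, y0, y1, v00, v10, v01, v11):
    """Bilinear interpolation on the axis-aligned rectangle [x0, x1] x [y0, y1].

    vij is the value stored at corner (xi, yj). A general quadrilateral
    would need an extra mapping step, which is omitted here (assumption).
    """
    tx = (p[0] - x0) / (x1 - x0)
    ty = (p[1] - y0) / (y1 - y0)
    bottom = (1.0 - tx) * v00 + tx * v10  # interpolate along y = y0
    top = (1.0 - tx) * v01 + tx * v11     # interpolate along y = y1
    return (1.0 - ty) * bottom + ty * top
```

In both cases the interpolated value is a weighted average of the vertex values, with weights that depend only on where the query point sits inside its region. That matches the caption's description of each vertex having a relative effect on the result.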