BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events

Yijin Li; Yichen Shen; Zhaoyang Huang; Shuo Chen; Weikang Bian; Xiaoyu Shi; Fu-Yun Wang; Keqiang Sun; Hujun Bao; Zhaopeng Cui; Guofeng Zhang; Hongsheng Li

TL;DR

BlinkVision addresses the lack of a unified benchmark for pixel-wise correspondence that combines RGB frames and event data across optical flow, scene flow, and point tracking. It introduces a photorealistic, multi-modality dataset with dense per-pixel ground truth across 410 categories, rendered with Blender and accompanied by a public leaderboard. The study shows that current image- and event-based methods struggle under large frame gaps and extreme lighting, though fine-tuning on BlinkVision improves generalization and highlights the dataset’s value for cross-modal research. By enabling category-aware analysis and cross-dataset transfer, BlinkVision promises to accelerate the development of robust, multi-modal vision systems.

Abstract

Recent advances in event-based vision suggest that these systems complement traditional cameras by providing a high dynamic range and continuous observation without frame rate limitations, making them well-suited for correspondence tasks such as optical flow and point tracking. However, there is still a lack of comprehensive benchmarks for correspondence tasks that include both event data and images. To address this gap, we propose BlinkVision, a large-scale and diverse benchmark with multiple modalities and dense correspondence annotations. BlinkVision offers several valuable features: 1) Rich modalities: It includes both event data and RGB images. 2) Extensive annotations: It provides dense per-pixel annotations covering optical flow, scene flow, and point tracking. 3) Large vocabulary: It contains 410 everyday categories, sharing common classes with popular 2D and 3D datasets like LVIS and ShapeNet. 4) Naturalistic: It delivers photorealistic data and covers various naturalistic factors, such as camera shake and deformation. BlinkVision enables extensive benchmarks on three types of correspondence tasks (optical flow, point tracking, and scene flow estimation) for both image-based and event-based methods, offering new observations, practices, and insights for future research. The benchmark website is https://www.blinkvision.net/.
