
LAVIB: A Large-scale Video Interpolation Benchmark

Alexandros Stergiou

TL;DR

LAVIB, a LArge-scale Video Interpolation Benchmark, is introduced for the low-level video task of Video Frame Interpolation (VFI). It comprises a large collection of high-resolution videos sourced from the web through an automated pipeline that requires minimal human verification.

Abstract

This paper introduces a LArge-scale Video Interpolation Benchmark (LAVIB) for the low-level video task of Video Frame Interpolation (VFI). LAVIB comprises a large collection of high-resolution videos sourced from the web through an automated pipeline with minimal requirements for human verification. Metrics are computed for each video's motion magnitudes, luminance conditions, frame sharpness, and contrast. Metric-driven video collection and quantitative challenges built on such metrics are under-explored in current low-level video datasets. In total, LAVIB includes 283K clips from 17K ultra-HD videos, covering 77.6 hours. The benchmark train, val, and test sets maintain similar video metric distributions. Additional splits are created for out-of-distribution (OOD) challenges, in which the train and test splits contain videos with dissimilar attributes.
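The per-video sharpness, contrast, and luminance statistics can be illustrated with a short Python sketch. It is not the LAVIB implementation, and the exact definitions are assumptions chosen for illustration: sharpness is the variance of a 4-neighbour Laplacian, contrast is RMS contrast (the standard deviation of intensities), and luminance is the mean relative luminance with Rec. 709 weights, each averaged over a video's frames. Frames are assumed to be small nested lists of (r, g, b) floats in [0, 1], and all helper names are hypothetical. Motion is omitted here because it requires an optical-flow model.

```python
# Hypothetical sketch of per-video frame statistics (not LAVIB's code).
# Assumption: a frame is a list of rows of (r, g, b) floats in [0, 1].
from statistics import fmean, pvariance


def luminance(pixel):
    """Relative luminance with Rec. 709 weights (no gamma decoding, for simplicity)."""
    r, g, b = pixel
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def to_gray(frame):
    """Convert an RGB frame to a 2-D list of luminance values."""
    return [[luminance(p) for p in row] for row in frame]


def laplacian_variance(gray):
    """Sharpness: variance of a 4-neighbour Laplacian over interior pixels (needs >= 3x3)."""
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


def rms_contrast(gray):
    """Contrast: standard deviation of all intensities in the frame."""
    return pvariance([v for row in gray for v in row]) ** 0.5


def mean_luminance(gray):
    """Brightness: mean luminance of the frame."""
    return fmean(v for row in gray for v in row)


def video_metrics(frames):
    """Average each per-frame statistic over all frames of a video."""
    grays = [to_gray(f) for f in frames]
    return {
        "ALV": fmean(laplacian_variance(g) for g in grays),
        "ARMS": fmean(rms_contrast(g) for g in grays),
        "ARL": fmean(mean_luminance(g) for g in grays),
    }
```

For example, a uniformly gray video yields ALV and ARMS of zero, while a black-and-white checkerboard yields a high ALV and an ARMS near 0.5.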

Paper Structure

This paper contains 22 sections, 2 equations, 13 figures, 24 tables, and 1 algorithm.

Figures (13)

  • Figure 1: LAVIB videos distributed across metrics. Four metrics are computed per video. The Average Flow Magnitude (AFM) quantifies motion. The Average Laplacian Variance (ALV) describes frame sharpness. The Average Root Mean Square (ARMS) measures contrast. The Average Relative Luminance (ARL) relates to video brightness. The four metrics are used for Out-Of-Distribution (OOD) challenges: fast → slow and slow → fast motion, low → high and high → low sharpness, low → high and high → low contrast, and bright → dark and dark → bright luminance.
  • Figure 2: LAVIB segment selection and challenges pipeline. Candidate 10-second clips are sampled from a long video based on their embedding similarity. Dense optical flow is computed with FlowFormer (Huang et al., 2022) and spatially averaged to obtain the AFM metric. The 1-second clips with the top-20% AFM are selected for the next step. Clips are further partitioned, based on their ARL, ALV, ARMS, and AFM, into four tubelets that are used in the final dataset. The metrics are also used to select videos for the OOD challenges on (a) motion, (b) sharpness, (c) contrast, and (d) luminance.
  • Figure 3: Examples from the LAVIB benchmark and OOD test sets (best viewed digitally). Zoomed regions to the right of each frame show interpolations by RIFE, EMA-VFI, and FLAVR. The top row shows results for videos from the benchmark test split; the bottom two rows show frames from the test splits of the OOD challenges. The challenge is denoted at the top right of each ground-truth frame. The ground truth is shown as a reference at the top left of the zoomed-in region grid.
  • Figure A1: ALV distributions for all LAVIB videos and for videos from the activities, misc, and camera queries.
  • Figure A2: ARL distributions for all LAVIB videos and for videos from the activities, misc, and camera queries.
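The AFM-based clip selection described for Figure 2 and the metric-based OOD splits described for Figure 1 can be sketched as follows. This is a hypothetical illustration, not the benchmark's code: flow fields are assumed to be precomputed by an optical-flow model (FlowFormer in the LAVIB pipeline) and given as nested lists of (u, v) displacements. Averaging per-pair magnitudes over a clip, the `train_fraction` threshold, and all function names are assumptions; only the top-20% AFM selection comes from the caption.

```python
# Hypothetical sketch of AFM-based clip selection and an OOD split (not LAVIB's code).
# Assumption: each clip carries precomputed flow fields, one per consecutive
# frame pair, as lists of rows of (u, v) displacements in pixels.
import math
from statistics import fmean


def flow_magnitude(flow):
    """Spatially averaged per-pixel flow magnitude for one frame pair."""
    return fmean(math.hypot(u, v) for row in flow for (u, v) in row)


def average_flow_magnitude(flows):
    """AFM of a clip, taken here as the mean over its frame pairs (assumed)."""
    return fmean(flow_magnitude(f) for f in flows)


def top_fraction(clips, score, fraction=0.2):
    """Keep the clips whose score falls in the top `fraction` (e.g. top-20% AFM)."""
    ranked = sorted(clips, key=score, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


def ood_split(clips, score, train_fraction=0.5):
    """Rank clips by one metric and cut once, so train and test cover different ranges.

    Low scores go to train and high scores to test (e.g. slow -> fast motion);
    swap the two lists for the reverse challenge. `train_fraction` is hypothetical.
    """
    ranked = sorted(clips, key=score)
    cut = int(len(ranked) * train_fraction)
    return ranked[:cut], ranked[cut:]


def afm(clip):
    """Score a clip (a dict with a hypothetical "flows" key) by its AFM."""
    return average_flow_magnitude(clip["flows"])
```

Ranking by a single metric and cutting once is the simplest way to give train and test largely non-overlapping ranges of that attribute; the same `ood_split` call works with any of the four metrics as the score.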