OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

Jisoo Jeong, Hong Cai, Risheek Garrepalli, Jamie Menjay Lin, Munawar Hayat, Fatih Porikli

TL;DR

This work proposes OCAI, a method that supports robust frame interpolation by generating intermediate video frames together with the optical flows between them, and introduces a teacher-student style semi-supervised learning method built on the interpolated frames.

Abstract

The scarcity of ground-truth labels poses one major challenge in developing optical flow estimation models that are both generalizable and robust. While current methods rely on data augmentation, they have yet to fully exploit the rich information available in labeled video sequences. We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongside optical flows in between. Utilizing a forward warping approach, OCAI employs occlusion awareness to resolve ambiguities in pixel values and fills in missing values by leveraging the forward-backward consistency of optical flows. Additionally, we introduce a teacher-student style semi-supervised learning method on top of the interpolated frames. Using a pair of unlabeled frames and the teacher model's predicted optical flow, we generate interpolated frames and flows to train a student model. The teacher's weights are maintained using Exponential Moving Averaging of the student. Our evaluations demonstrate perceptually superior interpolation quality and enhanced optical flow accuracy on established benchmarks such as Sintel and KITTI.
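
The teacher update described in the abstract can be illustrated with a minimal sketch of exponential moving averaging. The function name `ema_update`, the dictionary-of-lists weight layout, and the `decay` value are assumptions made for illustration; the abstract does not specify the decay rate or how the weights are stored.

```python
def ema_update(teacher, student, decay=0.999):
    """Return new teacher weights as an exponential moving average.

    Each weight becomes decay * teacher + (1 - decay) * student.
    `teacher` and `student` map parameter names to lists of floats
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")
    return {
        name: [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher[name], student[name])
        ]
        for name in teacher
    }


# Toy example: with decay=0.5 the teacher moves halfway toward the student.
teacher = {"w": [1.0, 2.0], "b": [0.0]}
student = {"w": [3.0, 4.0], "b": [1.0]}
teacher = ema_update(teacher, student, decay=0.5)
```

In practice the decay is typically close to 1, so the teacher changes slowly and provides more stable targets than the student it tracks.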

Paper Structure (19 sections, 14 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Top: Many existing data augmentation approaches focus on modifying the existing frames (liu2019selflow, teed2020raft, jeong2023distractflow). Middle: While RealFlow (han2022realflow) employs an optical flow model to generate new frames and iteratively updates the model with the generated frames, it requires computationally expensive depth estimation and EM optimization steps. Bottom: Our proposed OCAI allows flexible, robust video interpolation at any intermediate time step, and leverages interpolated frames and flows to efficiently train the model in a semi-supervised setting, significantly improving optical flow estimation.
  • Figure 2: Our proposed video interpolation algorithm using occlusion and consistency-aware forward warping.
  • Figure 3: Visual illustration of deriving the occlusion-aware weighting mask $M_0$. The top part shows backward warping with a moving object and a static background. A ghosting effect occurs because the background is occluded in $I_1$. The bottom part illustrates how we derive the foreground mask on $I_0$ by performing backward warping and removing the ghosting effect. The foreground mask is used in the forward warping when synthesizing $\widehat{V}_{t\rightarrow1}$.
  • Figure 4: Self-supervision using interpolated video frames and flows in a teacher-student learning setting. Note that the student model is trained with both the self-supervision shown in the figure and the supervision from available ground-truth data.
  • Figure 5: Video Frame Interpolation (VFI) results on KITTI. The first row is the ground truth. The second to fifth rows are outputs of SOTA VFI models (kong2022ifrnet, lu2022video, zhang2023extracting, li2023amt). The sixth row is the output of using RealFlow (han2022realflow) for VFI. The bottom row shows our OCAI results. For each interpolated frame, we show the PSNR, SSIM, and LPIPS (using AlexNet and VGG) scores. Best scores are shown in red.
  • ...and 5 more figures
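
The forward-backward consistency that the abstract and Figure 2 refer to can be illustrated with a commonly used check: a pixel is treated as consistent when its forward flow and the backward flow sampled at its landing position roughly cancel. The sketch below is a generic version of that idea, not the paper's exact formulation. The function name `fb_consistency_mask`, the nearest-neighbour sampling, and the `threshold` value are all assumptions.

```python
def fb_consistency_mask(flow_fw, flow_bw, threshold=1.0):
    """Mark pixels whose forward and backward flows agree.

    flow_fw[y][x] is the (dx, dy) displacement from frame 0 to frame 1,
    and flow_bw[y][x] is the displacement from frame 1 to frame 0.
    Returns mask[y][x], True where the round trip error is small.
    """
    h, w = len(flow_fw), len(flow_fw[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow_fw[y][x]
            # Nearest-neighbour landing position in frame 1 (an assumption;
            # bilinear sampling is the more usual choice).
            x1, y1 = int(round(x + dx)), int(round(y + dy))
            if not (0 <= x1 < w and 0 <= y1 < h):
                continue  # lands outside the frame: left as inconsistent
            bx, by = flow_bw[y1][x1]
            # For a consistent pixel, forward + backward flow is near zero.
            err = ((dx + bx) ** 2 + (dy + by) ** 2) ** 0.5
            mask[y][x] = err <= threshold
    return mask


# Toy 1x4 frame: everything moves one pixel right, and back again.
flow_fw = [[(1.0, 0.0)] * 4]
flow_bw = [[(-1.0, 0.0)] * 4]
mask = fb_consistency_mask(flow_fw, flow_bw)
```

Pixels that fail such a check are typically occluded or poorly estimated, which is why this kind of mask is useful for deciding which warped values to trust and which to fill in.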