FG-DFPN: Flow Guided Deformable Frame Prediction Network

M. Akın Yılmaz, Ahmet Bilican, A. Murat Tekalp

TL;DR

FG-DFPN tackles video frame prediction under complex motion by integrating an optical flow estimator with flow-guided deformable convolutions in a multi-scale architecture. The method jointly estimates coarse motion, warps features in feature space, and refines deformable sampling through flow-guided offsets and masks to align spatio-temporal information before reconstructing the next frame. Key contributions include the first flow-guided deformable framework for next-frame prediction, a multi-scale fusion scheme with dedicated Flow Estimator, Offset Diversity, and flow-refinement modules, and strong experimental results on eight MPEG sequences with state-of-the-art PSNR/SSIM and competitive runtimes. This approach promises high-fidelity temporal predictions for applications in autonomous systems and video processing, balancing accuracy with inference speed.

Abstract

Video frame prediction remains a fundamental challenge in computer vision with direct implications for autonomous systems, video compression, and media synthesis. We present FG-DFPN, a novel architecture that harnesses the synergy between optical flow estimation and deformable convolutions to model complex spatio-temporal dynamics. By guiding deformable sampling with motion cues, our approach addresses the limitations of fixed-kernel networks when handling diverse motion patterns. The multi-scale design enables FG-DFPN to simultaneously capture global scene transformations and local object movements. Our experiments demonstrate that FG-DFPN achieves state-of-the-art performance on eight diverse MPEG test sequences, outperforming existing methods by 1 dB PSNR while maintaining competitive inference speeds. The integration of motion cues with adaptive geometric transformations makes FG-DFPN a promising solution for next-generation video processing systems that require high-fidelity temporal predictions. The model and instructions to reproduce our results will be released at: https://github.com/KUIS-AI-Tekalp-Research-Group/frame-prediction
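To make "guiding deformable sampling with motion cues" concrete, here is a minimal pure-Python sketch of one output pixel of a 3x3 modulated deformable convolution whose sampling offsets are seeded by optical flow. It is not the authors' implementation. It assumes that each tap's offset is the coarse flow vector plus a learned per-tap residual, and that the residuals, masks, and kernel weights are already given. The names `bilinear` and `flow_guided_deform_sample` are hypothetical.

```python
import math

def bilinear(img, y, x):
    """Sample a 2-D list `img` at real-valued (y, x); outside the image counts as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    wy, wx = y - y0, x - x0
    corners = (
        (0, 0, (1 - wy) * (1 - wx)), (0, 1, (1 - wy) * wx),
        (1, 0, wy * (1 - wx)),       (1, 1, wy * wx),
    )
    total = 0.0
    for dy, dx, wgt in corners:
        yy, xx = y0 + dy, x0 + dx
        if 0 <= yy < h and 0 <= xx < w:
            total += wgt * img[yy][xx]
    return total

def flow_guided_deform_sample(feat, y, x, flow, residuals, masks, weights):
    """One output value of a 3x3 modulated deformable convolution at (y, x).

    flow:      (dy, dx) coarse motion at this pixel
    residuals: nine (dy, dx) per-tap corrections (assumed learned elsewhere)
    masks:     nine modulation scalars in [0, 1]
    weights:   nine kernel weights
    """
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = 0.0
    for (ky, kx), (ry, rx), m, wgt in zip(taps, residuals, masks, weights):
        # Assumed offset model: shared coarse flow plus a per-tap refinement.
        sy = y + ky + flow[0] + ry
        sx = x + kx + flow[1] + rx
        out += wgt * m * bilinear(feat, sy, sx)
    return out

# Toy 5x5 ramp feature map: feat[y][x] = 10*y + x.
feat = [[10 * r + c for c in range(5)] for r in range(5)]
zero_residuals = [(0.0, 0.0)] * 9
ones = [1.0] * 9
avg_kernel = [1.0 / 9] * 9

no_motion = flow_guided_deform_sample(feat, 2, 2, (0.0, 0.0), zero_residuals, ones, avg_kernel)
shift_right = flow_guided_deform_sample(feat, 2, 2, (0.0, 1.0), zero_residuals, ones, avg_kernel)
half_down = flow_guided_deform_sample(feat, 2, 2, (0.5, 0.0), zero_residuals, ones, avg_kernel)
```

On this linear ramp an averaging kernel returns the value at the window centre, so the flow vector simply moves the whole sampling window. The residuals then let individual taps deviate from that shared motion, and the masks down-weight unreliable taps.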

Paper Structure

This paper contains 19 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Our proposed FG-DFPN framework
  • Figure 2: Visual comparison of predicted frames for the Garden sequence. Top row shows the prediction results from: (a) DiffCNN (PSNR: 26.48 dB, SSIM: 0.922), (b) DFPN (PSNR: 26.76 dB, SSIM: 0.936), and (c) our proposed FG-DFPN (PSNR: 28.78 dB, SSIM: 0.964). Bottom row (d-f) displays the corresponding error maps between the ground truth and the prediction from each method; the FG-DFPN error map shows visibly smaller errors than those of the other two methods.
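
As a reading aid for the PSNR figures quoted above, the sketch below converts a PSNR gain in dB into the implied ratio of mean squared errors, using the standard definition PSNR = 10 log10(MAX^2 / MSE). The function names are hypothetical and the peak value of 255 is an assumption (8-bit frames); the source does not state how PSNR was computed.

```python
import math

def psnr(mse, max_val=255.0):
    """PSNR in dB for a given mean squared error (max_val=255 assumes 8-bit frames)."""
    return 10.0 * math.log10(max_val ** 2 / mse)

def mse_ratio_from_gain(gain_db):
    """Factor by which MSE shrinks for a PSNR gain of `gain_db` (independent of max_val)."""
    return 10.0 ** (gain_db / 10.0)

# Garden sequence, values from the Figure 2 caption: DFPN 26.76 dB, FG-DFPN 28.78 dB.
garden_gain = 28.78 - 26.76                        # about 2.02 dB
garden_ratio = mse_ratio_from_gain(garden_gain)    # about 1.59x lower MSE
one_db_ratio = mse_ratio_from_gain(1.0)            # about 1.26x lower MSE
```

Under this definition, the roughly 2 dB gain on Garden corresponds to about 37% less squared error than DFPN, and the 1 dB average gain quoted in the abstract corresponds to about 21% less.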