Multi-Field De-interlacing using Deformable Convolution Residual Blocks and Self-Attention

Ronglei Ji, A. Murat Tekalp

TL;DR

The paper tackles learned deinterlacing by proposing a multi-field, full-frame-rate network that aligns adjacent fields to a reference field using deformable convolution residual blocks together with a parallel self-attention pathway. The two-stage design first aligns and fuses the fields, then uses two separate reconstruction branches (one for even fields and one for odd fields) to produce progressive frames, with an indicator bit guiding how the fields are interleaved into each frame. Key contributions include (i) a self-attention module running in parallel with the deformable convolutions, (ii) separate reconstruction modules for even and odd fields, and (iii) extensive ablation and generalization analyses showing state-of-the-art performance on multiple benchmarks. The approach achieves superior numerical and perceptual results, ranks first on the Full FrameRate Leaderboard, and offers a practical approach to high-quality video deinterlacing.
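The final interleaving step described above can be illustrated with a short NumPy sketch. It assumes (this is not stated in the summary) that each field has shape (H/2, W), that the "odd" field holds the 0-based rows 0, 2, 4, ... of the progressive frame, and that the indicator bit is simply a boolean saying whether the reference field is odd. The names `weave_fields`, `reconstruct_frame`, `even_branch` and `odd_branch` are hypothetical stand-ins for the network's two reconstruction branches, not the authors' code.

```python
import numpy as np
from typing import Callable

def weave_fields(ref_field: np.ndarray, est_field: np.ndarray, ref_is_odd: bool) -> np.ndarray:
    """Interleave the known reference field with the estimated missing field.

    Assumption: fields have shape (H/2, W) and the odd field holds the
    0-based rows 0, 2, 4, ... of the progressive frame.
    """
    if ref_field.shape != est_field.shape:
        raise ValueError("reference and estimated fields must have the same shape")
    half_h, w = ref_field.shape
    frame = np.empty((2 * half_h, w), dtype=ref_field.dtype)
    ref_start = 0 if ref_is_odd else 1      # indicator bit selects the row parity
    frame[ref_start::2] = ref_field         # known lines are copied unchanged
    frame[1 - ref_start::2] = est_field     # missing lines come from the network
    return frame

# Hypothetical type for a reconstruction branch: fused features -> estimated field.
Branch = Callable[[object], np.ndarray]

def reconstruct_frame(ref_field: np.ndarray, fused_features, ref_is_odd: bool,
                      even_branch: Branch, odd_branch: Branch) -> np.ndarray:
    """Hypothetical second stage: pick the branch for the missing parity, then weave."""
    # The missing field always has the opposite parity of the reference field.
    branch = even_branch if ref_is_odd else odd_branch
    est_field = branch(fused_features)
    return weave_fields(ref_field, est_field, ref_is_odd)
```

Keeping the reference lines untouched and only synthesizing the opposite-parity lines is what makes the output a full progressive frame at the input field rate; the branch selection is one plausible reading of "separate reconstruction modules for even/odd fields".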

Abstract

Although deep learning has made a significant impact on image/video restoration and super-resolution, learned deinterlacing has so far received less attention in academia or industry. This is despite the fact that deinterlacing is well suited to supervised learning from synthetic data, since the degradation model is known and fixed. In this paper, we propose a novel multi-field, full-frame-rate deinterlacing network, which adapts state-of-the-art super-resolution approaches to the deinterlacing task. Our model aligns features from adjacent fields to a reference field (the field to be deinterlaced) using both deformable convolution residual blocks and self-attention. Our extensive experimental results demonstrate that the proposed method provides state-of-the-art deinterlacing results in terms of both numerical and perceptual performance. At the time of writing, our model ranks first in the Full FrameRate LeaderBoard at https://videoprocessing.ai/benchmarks/deinterlacer.html
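Because the degradation model is known and fixed, training pairs can be synthesized from progressive video by discarding fields. The sketch below assumes that even-indexed frames keep their odd field and odd-indexed frames keep their even field, so a window centred on an even frame N has the alternating parity (N-2)_O, (N-1)_E, N_O, (N+1)_E, (N+2)_O shown in Figure 1; the target is the discarded field of frame N. The helper names and the row-parity convention (odd field = 0-based rows 0, 2, 4, ...) are illustrative assumptions, not the authors' data pipeline.

```python
import numpy as np

def extract_field(frame: np.ndarray, odd: bool) -> np.ndarray:
    """Return one field of a progressive frame of shape (H, W).

    Assumption: the odd field holds the 0-based rows 0, 2, 4, ...
    """
    return frame[0::2] if odd else frame[1::2]

def training_sample(frames, n: int, radius: int = 2):
    """Build a (2*radius+1)-field input window around frame n and its target field.

    Assumptions (illustrative): even-indexed frames contribute their odd field
    and odd-indexed frames their even field, so parity alternates along time.
    One field is kept per frame, i.e. the interlaced sequence is at full frame rate.
    """
    if n - radius < 0 or n + radius >= len(frames):
        raise IndexError("field window exceeds the sequence bounds")
    window = np.stack([
        extract_field(frames[k], odd=(k % 2 == 0))
        for k in range(n - radius, n + radius + 1)
    ])                                       # shape: (2*radius+1, H/2, W)
    ref_is_odd = n % 2 == 0                  # parity of the reference field of frame n
    # Supervision target: the field of frame n that was thrown away.
    target = extract_field(frames[n], odd=not ref_is_odd)
    return window, target, ref_is_odd
```

With the default `radius=2` this yields the five-field input of Figure 1; the reference field sits at index `radius` of the window.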

Paper Structure (12 sections, 4 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: (a) The proposed deinterlacing network with five input fields $(N-2)_O, (N-1)_E, N_O, (N+1)_E, (N+2)_O$, where the reference field for alignment is $N_O$ and the field to be estimated is $N_E$. (b) Top: block diagram of a standard residual block, depicted by the orange boxes in (a). Middle: block diagram of a DfRes block, shown by the brown box in (a). Bottom: block diagram of a differential DfRes ($\Delta$DfRes) block. Fea_i denotes the feature of the i-th input field. Fea_i_out is the corresponding output feature after one DfRes ($\Delta$DfRes) block; it replaces Fea_i as the input to be aligned by the next DfRes ($\Delta$DfRes) block.
  • Figure 2: Overview of data processing during training. (a)-(b) Synthetic interlaced videos are generated by extracting the odd and even fields of video frames, where $N_O$ and $N_E$ represent the odd and even fields of frame $N$. (c) Input fields. (d)-(g) Deinterlacing process.
  • Figure 3: (a) Block diagram of the self-attention module, where Softmax is defined in the paper's softmax equation. (b) Block diagram of the efficient self-attention module, where Softmax_k and Softmax_q are defined in the paper's two-softmax equation.
  • Figure 4: Visual evaluation of a deinterlaced frame from the Vid4 dataset. (a)-(h) Differences between the actual and reconstructed progressive frames. (i) Progressive ground-truth frame. (j) Zoomed-in views of the green box. From upper left to lower right: DfRes_SA (PSNR: 37.29), DfRes (PSNR: 37.15), $\Delta$DfRes (PSNR: 37.25), EDVR (PSNR: 36.93), EDVR_woTSA (PSNR: 36.21), TDAN (PSNR: 35.41), DUF (PSNR: 36.08), zhu2017real (PSNR: 29.31), and ground truth.
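The difference between the two attention modules in Figure 3 can be sketched in NumPy. The exact definitions of Softmax, Softmax_k and Softmax_q are given by equations not reproduced here, so this sketch assumes the common "efficient attention" formulation: Softmax_k normalizes keys over spatial positions, Softmax_q normalizes queries over channels, and the product K^T V is formed first so no n x n attention matrix is built. The function names and the (positions, channels) layout are illustrative assumptions.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def flatten_features(feat: np.ndarray) -> np.ndarray:
    """(C, H, W) feature map -> (H*W, C) matrix of per-position feature vectors."""
    c = feat.shape[0]
    return feat.reshape(c, -1).T

def dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Standard self-attention; builds an (n, n) score matrix for n positions."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # (n, n)
    return softmax(scores, axis=-1) @ v       # (n, d_v)

def efficient_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Linear-complexity variant with separate softmaxes on keys and queries.

    Assumption: Softmax_k acts over positions (axis 0) and Softmax_q over
    channels (axis 1); the paper's own equation may differ in detail.
    """
    k_norm = softmax(k, axis=0)               # each key channel sums to 1 over positions
    q_norm = softmax(q, axis=1)               # each query position sums to 1 over channels
    context = k_norm.T @ v                    # (d_k, d_v) global context, no (n, n) matrix
    return q_norm @ context                   # (n, d_v)
```

For a field feature map with n = H*W positions, the standard module's memory grows as n^2, while the efficient variant grows linearly in n, which matters at video resolutions.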