FPANet: Frequency-based Video Demoireing using Frame-level Post Alignment
Gyeongrok Oh, Sungjune Kim, Heon Gu, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim
TL;DR
FPANet addresses image-video demoireing by learning filters in both the frequency and spatial domains and by taking three consecutive frames as input, with a frame-level Post Align Module (PAM) that aligns them for temporal consistency. Its core component is the Frequency Selection Fusion (FSF) block, composed of the Frequency Selection Module (FSM) and the Cross Scale Fusion Module (CSFM). The model is trained with a multi-term loss that combines spatial, perceptual, and frequency-domain components. On the VDmoire dataset it outperforms prior state-of-the-art methods on both image and video quality metrics and produces visually cleaner restorations. The results indicate that explicit handling of amplitude and phase in the frequency domain, combined with multi-scale spatial features and frame-aligned temporal cues, removes moire patterns more cleanly while preserving detail and color fidelity, which suggests the approach could be practical for real-world video pipelines.
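To make the phrase "amplitude and phase in the frequency domain" concrete, the following is a minimal NumPy sketch of splitting a single-channel frame into amplitude and phase spectra and rebuilding it. It is not the FSM itself. The function names and the simple frequency-domain L1 distance are hypothetical illustrations, not the loss or modules defined by the authors.

```python
import numpy as np


def split_amplitude_phase(image):
    """Return the amplitude and phase spectra of a 2D single-channel image."""
    spectrum = np.fft.fft2(image)
    return np.abs(spectrum), np.angle(spectrum)


def merge_amplitude_phase(amplitude, phase):
    """Rebuild a real image from its amplitude and phase spectra."""
    spectrum = amplitude * np.exp(1j * phase)
    return np.fft.ifft2(spectrum).real


def frequency_l1(pred, target):
    """Hypothetical frequency-domain L1 distance (an assumption, not the paper's loss)."""
    return float(np.mean(np.abs(np.fft.fft2(pred) - np.fft.fft2(target))))


# A random array stands in for a video frame.
rng = np.random.default_rng(0)
frame = rng.random((32, 32))

amp, pha = split_amplitude_phase(frame)
restored = merge_amplitude_phase(amp, pha)
```

A network that filters in the frequency domain would modify `amp` and `pha` (or the complex spectrum) before the inverse transform. Here they are passed through unchanged, so `restored` matches `frame` up to floating-point error.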
Abstract
Moire patterns, created by the interference between overlapping grid patterns in the pixel space, degrade the visual quality of images and videos. Removing such patterns (demoireing) is therefore crucial, yet it remains challenging because the patterns vary widely in size and distortion. Conventional methods mainly tackle this task by exploiting only the spatial domain of the input images, which limits their ability to remove large-scale moire patterns. This work proposes FPANet, an image-video demoireing network that learns filters in both the frequency and spatial domains, improving restoration quality by removing moire patterns of various sizes. To further improve quality, our model takes multiple consecutive frames as input, learns to extract frame-invariant content features, and outputs higher-quality, temporally consistent images. We demonstrate the effectiveness of the proposed method on a publicly available large-scale dataset, where it outperforms state-of-the-art approaches in terms of image and video quality metrics and visual experience.
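The interference described above can be illustrated with a simplified one-dimensional model in which two gratings of nearby frequencies are multiplied together. This is an assumption made for illustration, since real moire also involves camera sampling and color filtering. The frequencies `f1` and `f2` are arbitrary example values. By the product-to-sum identity, the product contains a component at the difference frequency, which is the large-scale pattern that is hard to remove with small spatial filters.

```python
import numpy as np

N = 1024                # number of samples along one image row
f1, f2 = 50, 53         # example grating frequencies in cycles per row (assumed values)

x = np.arange(N) / N
grating_a = np.cos(2 * np.pi * f1 * x)
grating_b = np.cos(2 * np.pi * f2 * x)

# Multiplying the gratings gives 0.5*cos(2*pi*(f2-f1)*x) + 0.5*cos(2*pi*(f1+f2)*x).
overlap = grating_a * grating_b

# Locate the dominant frequency bins of the overlapped signal.
magnitude = np.abs(np.fft.rfft(overlap))
peaks = np.flatnonzero(magnitude > N / 8)

print(peaks)  # bins of the difference and sum frequencies
```

The low-frequency peak at `f2 - f1` lies far below either grating frequency, which is one way to see why frequency-domain filtering is a natural fit for large moire patterns.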
