Self-supervised Learning-based Reconstruction of High-resolution 4D Light Fields
Jianxin Lei, Dongze Wu, Chengcai Xu, Hongcheng Gu, Guangquan Zhou, Junhui Hou, Ping Zhou
TL;DR
This work tackles the domain gap in light field (LF) spatial super-resolution (SR) by removing the reliance on predefined degradation models. It introduces a self-supervised framework built on a hybrid LF imaging prototype that pairs a high-resolution (HR) central-view reference image with a low-resolution (LR) LF input. The method comprises CVS-Net for central-view synthesis, BD-Net (a backward degradation network) to preserve LF parallax structure, and HLFSSR-Net, which fuses the 2D HR guidance with the 4D LF data to produce HR LFs. Training is guided by an HR-aware loss and an epipolar-plane image (EPI) gradient loss. Experiments on simulated and real-world hybrid data demonstrate superior reconstruction quality and robust generalization, yielding HR LF output without requiring ground-truth HR LFs.
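To make the hybrid input concrete, the sketch below assumes a common array layout for a 4D LF, `(U, V, H, W)`, with angular indices `(u, v)` and spatial indices `(y, x)`. It shows how the central sub-aperture image and the horizontal and vertical EPIs are sliced out of that array. The helper names (`central_view`, `horizontal_epi`, `vertical_epi`, `check_hybrid_pair`) are hypothetical. The assumption that the HR reference is an exact integer-scale, pixel-aligned counterpart of the LR central view is ours for illustration; the prototype's actual registration and scale factor are not described here.

```python
import numpy as np

# Assumed layout (hypothetical): lf[u, v, y, x] -> shape (U, V, H, W),
# single-channel for simplicity.

def central_view(lf: np.ndarray) -> np.ndarray:
    """Central sub-aperture image (SAI) of a 4D LF, shape (H, W)."""
    U, V = lf.shape[:2]
    return lf[U // 2, V // 2]

def horizontal_epi(lf: np.ndarray, u: int, y: int) -> np.ndarray:
    """Fix angular row u and spatial row y; vary v and x. Shape (V, W)."""
    return lf[u, :, y, :]

def vertical_epi(lf: np.ndarray, v: int, x: int) -> np.ndarray:
    """Fix angular column v and spatial column x; vary u and y. Shape (U, H)."""
    return lf[:, v, :, x]

def check_hybrid_pair(lr_lf: np.ndarray, hr_center: np.ndarray, scale: int) -> bool:
    """Hypothetical check: the HR reference is exactly `scale` times the LR SAI size."""
    h, w = lr_lf.shape[2:]
    return hr_center.shape == (scale * h, scale * w)

if __name__ == "__main__":
    lr_lf = np.random.rand(5, 5, 32, 48)           # 5x5 views, 32x48 pixels each
    hr_center = np.random.rand(64, 96)             # assumed 2x HR central view
    print(central_view(lr_lf).shape)               # (32, 48)
    print(horizontal_epi(lr_lf, 2, 10).shape)      # (5, 48)
    print(vertical_epi(lr_lf, 2, 7).shape)         # (5, 32)
    print(check_hybrid_pair(lr_lf, hr_center, 2))  # True
```

In an EPI, a scene point traces a line whose slope depends on its disparity. That is why the parallax structure the method aims to preserve can be inspected directly on these slices.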
Abstract
Hand-held light field (LF) cameras often exhibit low spatial resolution due to the inherent trade-off between the spatial and angular dimensions. Existing supervised learning-based LF spatial super-resolution (SR) methods rely on pre-defined image degradation models. These methods struggle to overcome the domain gap between the training phase, where LFs at their native resolution serve as ground truth, and the inference phase, which aims to reconstruct LFs at higher than native resolution. The gap is especially pronounced on real-world data. To address this challenge, this paper introduces a novel self-supervised learning-based method for LF spatial SR that produces LF images with higher spatial resolution than the captured ones, without pre-defined image degradation models. The method comprises three components: a hybrid LF imaging prototype, a real-world hybrid LF dataset, and a self-supervised LF spatial SR framework. The prototype captures paired reference images, namely low-resolution central-view sub-aperture images and high-resolution (HR) images. The self-supervised framework consists of three networks. The first is an LF spatial SR network with hybrid input. The second is a central-view synthesis network with an HR-aware loss that enables side-view sub-aperture images to learn high-frequency information from the single HR central-view reference image. The third is a backward degradation network with an epipolar-plane image (EPI) gradient loss that preserves LF parallax structures. Extensive experiments on both simulated and real-world datasets show that our approach significantly outperforms state-of-the-art methods in reconstructing LF images with higher spatial resolution without pre-defined degradation.
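The abstract does not give the exact form of the EPI gradient loss. The sketch below is a minimal version that assumes the loss is an L1 distance between finite-difference gradients of the EPIs of two LFs sharing a `(U, V, H, W)` layout. Taking `np.diff` along the `v` and `x` axes yields the gradients of every horizontal EPI at once. Taking it along `u` and `y` does the same for every vertical EPI. Several details are assumptions rather than details from the paper: the function names, the equal weighting of the four terms, and which pair of LFs is compared (for example, a backward-degraded reconstruction against the captured LR input).

```python
import numpy as np

def epi_gradients(lf: np.ndarray) -> list:
    """Finite-difference gradients of all EPIs of a 4D LF with layout (U, V, H, W).

    Horizontal EPIs span axes (v, x) = (1, 3); vertical EPIs span (u, y) = (0, 2).
    """
    return [
        np.diff(lf, axis=1),  # horizontal EPIs, angular direction (v)
        np.diff(lf, axis=3),  # horizontal EPIs, spatial direction (x)
        np.diff(lf, axis=0),  # vertical EPIs, angular direction (u)
        np.diff(lf, axis=2),  # vertical EPIs, spatial direction (y)
    ]

def epi_gradient_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Assumed form: sum of mean absolute differences between EPI gradients."""
    if pred.shape != target.shape:
        raise ValueError("pred and target must share the (U, V, H, W) shape")
    return float(sum(np.mean(np.abs(gp - gt))
                     for gp, gt in zip(epi_gradients(pred), epi_gradients(target))))

if __name__ == "__main__":
    target = np.zeros((3, 3, 4, 5))
    ramp = np.broadcast_to(np.arange(5.0), target.shape)  # intensity rises along x
    print(epi_gradient_loss(target + 2.0, target))  # 0.0: a constant offset has no gradient
    print(epi_gradient_loss(ramp, target))          # 1.0: only the x-gradient term differs
```

Because this loss depends only on gradients, a uniform intensity offset between the two LFs costs nothing. A mismatch in EPI line slopes, which encode disparity, changes the gradients and is penalized.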
