Self-supervised Learning-based Reconstruction of High-resolution 4D Light Fields

Jianxin Lei, Dongze Wu, Chengcai Xu, Hongcheng Gu, Guangquan Zhou, Junhui Hou, Ping Zhou

TL;DR

This work tackles the domain gap in light-field spatial super-resolution by removing reliance on predefined degradation models. It introduces a self-supervised framework built on a hybrid LF imaging prototype, combining a central-view HR reference with LR LF inputs. The method comprises CVS-Net for central-view synthesis, BD-Net to enforce LF parallax structure, and HLFSSR-Net that fuses 2D HR guidance with 4D LF data to produce high-resolution LFs, guided by HR-aware and EPI gradient losses. Experiments on simulated and real-world hybrid data demonstrate superior reconstruction quality and robust generalization, enabling high-resolution LF output without ground-truth HR LF panels.

Abstract

Hand-held light field (LF) cameras often exhibit low spatial resolution due to the inherent trade-off between spatial and angular dimensions. Existing supervised learning-based LF spatial super-resolution (SR) methods, which rely on pre-defined image degradation models, struggle to overcome the domain gap between the training phase (where LFs with natural resolution are used as ground truth) and the inference phase (which aims to reconstruct higher-resolution LFs), especially when applied to real-world data. To address this challenge, this paper introduces a novel self-supervised learning-based method for LF spatial SR, which can produce LF images with higher spatial resolution than the originally captured ones without pre-defined image degradation models. The self-supervised method incorporates a hybrid LF imaging prototype, a real-world hybrid LF dataset, and a self-supervised LF spatial SR framework. The prototype captures reference image pairs, each consisting of a low-resolution central-view sub-aperture image and a high-resolution (HR) image. The self-supervised framework consists of a well-designed LF spatial SR network with hybrid input, a central-view synthesis network with an HR-aware loss that enables side-view sub-aperture images to learn high-frequency information from the single HR central-view reference image, and a backward degradation network with an epipolar-plane image gradient loss to preserve LF parallax structures. Extensive experiments on both simulated and real-world datasets demonstrate the significant superiority of our approach over state-of-the-art methods in reconstructing higher spatial resolution LF images without pre-defined degradation.
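To make the epipolar-plane image (EPI) gradient idea more concrete, the following is a minimal NumPy sketch of one generic way to compare the EPI gradients of two light fields. It is an illustration under stated assumptions, not the paper's loss: the array layout `(U, V, H, W)`, the function names, the forward-difference gradient, and the equal weighting of the four terms are all hypothetical choices, since the abstract does not give the exact formulation.

```python
import numpy as np

def horizontal_epis(lf):
    """lf has the assumed layout (U, V, H, W): angular (u, v), spatial (y, x).
    A horizontal EPI fixes u and y and varies v and x -> shape (U, H, V, W)."""
    return np.transpose(lf, (0, 2, 1, 3))

def vertical_epis(lf):
    """A vertical EPI fixes v and x and varies u and y -> shape (V, W, U, H)."""
    return np.transpose(lf, (1, 3, 0, 2))

def epi_gradient_l1(lf_a, lf_b):
    """Mean L1 distance between forward-difference gradients of the EPIs
    of two light fields (illustrative stand-in for an EPI gradient loss)."""
    total = 0.0
    for to_epi in (horizontal_epis, vertical_epis):
        epi_a, epi_b = to_epi(lf_a), to_epi(lf_b)
        # The last two axes of each EPI are its angular and spatial axes.
        for axis in (-2, -1):
            grad_a = np.diff(epi_a, axis=axis)
            grad_b = np.diff(epi_b, axis=axis)
            total += np.mean(np.abs(grad_a - grad_b))
    return total / 4.0

# Usage on a random 5x5 light field with 8x10 sub-aperture images.
rng = np.random.default_rng(0)
lf = rng.random((5, 5, 8, 10))
print(horizontal_epis(lf).shape)  # (5, 8, 5, 10)
print(epi_gradient_l1(lf, lf))    # 0.0
```

Because the sketch compares gradients rather than intensities, a constant brightness offset between the two light fields leaves the value essentially unchanged, whereas a change in the slopes of EPI lines (that is, in parallax) increases it. That is the intuition behind using such a term to preserve LF parallax structure.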

Paper Structure (37 sections, 15 equations, 9 figures, 4 tables)

Figures (9)

  • Figure 1: Schematic illustrating the key differences between our proposed method and existing methods in the training and inference phases of LF spatial SR networks. Existing methods: training data pairs are first created in phase a, the network is then trained in phase b, and the network trained in phase b is finally applied for inference in phase c. Our method (phase d): without the need for pre-defined degradation or HR LF ground truth, the outcomes of both network training and inference are LF images in the target HR domain.
  • Figure 2: Illustration of our hybrid 4D LF imaging prototype. OBJ: object, OL: objective lens, BS: beam splitter, MLA: micro-lens array, CAM: camera, HR: 2D high spatial resolution image, LF: 4D LF image with low spatial resolution, SSR: LF spatial SR with hybrid input, HR LF: 4D LF image with higher spatial resolution.
  • Figure 3: Examples of LR LF and 2D HR image pairs at different resolutions from the proposed hybrid LF dataset. The resolution of each LR SAI is $329\times217$, and the resolution of the corresponding 2D HR reference image can be set from $658\times434$ to $6580\times4340$, supporting $2\times$ to $20\times$ SR.
  • Figure 4: Illustration of our self-supervised LF spatial SR framework. A $5\times5$ LF is used as an example for illustration.
  • Figure 5: Visual comparisons for $2\times$ SR on the simulated dataset. The PSNR and SSIM scores achieved by different methods on the presented scenes are reported below the zoom-in regions, and the horizontal EPIs are shown. The colored grid in the top-left or top-right corner of each SAI indicates its angular position.
  • ...and 4 more figures
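
The resolutions quoted in the Figure 3 caption follow directly from multiplying the LR SAI size by the SR factor. The short sketch below checks that arithmetic. The helper name `hr_size` is hypothetical, and reading $329\times217$ as width by height is an assumption, since the caption does not state the order.

```python
# LR sub-aperture image size from the Figure 3 caption
# (assumed here to be width x height).
LR_SAI = (329, 217)

def hr_size(scale, lr=LR_SAI):
    """Size of the 2D HR reference image for an integer SR factor."""
    return (lr[0] * scale, lr[1] * scale)

for scale in (2, 4, 20):
    print(scale, hr_size(scale))
# 2 (658, 434)
# 4 (1316, 868)
# 20 (6580, 4340)
```

The $2\times$ and $20\times$ results match the two endpoints given in the caption. The $4\times$ value is shown only as an intermediate example, because the caption does not list which intermediate factors the dataset provides.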