Depth estimation from 4D light field videos
Takahiro Kinoshita, Satoshi Ono
TL;DR
This work tackles depth estimation from 4D light field videos (LFVs) by exploiting the temporal information that methods for static light field (LF) images ignore. It introduces an end-to-end model that combines two-stream 3D CNNs for spatial-angular feature extraction with a convolutional LSTM (CLSTM) for temporal modeling, trained on a Sintel-based synthetic 4D LFV dataset. Experiments on synthetic and real LFVs show that temporal information improves depth estimation, especially in noisy regions, and that the method outperforms a baseline without temporal cues. The authors release the synthetic dataset and code to support further research on depth estimation from 4D LFVs under challenging imaging conditions.
Abstract
Depth (disparity) estimation from 4D Light Field (LF) images has been a research topic for the last couple of years. Most studies have focused on static 4D LF images and have not considered the temporal information available in LF videos. This paper proposes an end-to-end neural network architecture for depth estimation from 4D LF videos. This study also constructs a medium-scale synthetic 4D LF video dataset that can be used to train deep learning-based methods. Experimental results on synthetic and real-world 4D LF videos show that temporal information improves depth estimation accuracy in noisy regions. The dataset and code are available at: https://mediaeng-lfv.github.io/LFV_Disparity_Estimation
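
The abstract treats depth and disparity as interchangeable targets. As a minimal sketch of why, assuming a rectified, parallel camera array with no sensor shift (an assumption not stated in the abstract), depth follows from disparity by the standard relation depth = focal length × baseline / disparity. The function name, parameter names, and camera values below are hypothetical and chosen only for illustration; they are not taken from the authors' code.

```python
def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert disparity between adjacent views (pixels) to depth (meters).

    Assumes a rectified, parallel camera array with no sensor shift,
    so that depth = focal_length * baseline / disparity.
    """
    if disparity_px <= 0:
        # Zero or negative disparity has no finite positive depth in this model.
        raise ValueError("disparity must be positive under this model")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical camera parameters, chosen only for illustration.
depth_m = disparity_to_depth(disparity_px=2.0, focal_length_px=1000.0, baseline_m=0.01)
print(depth_m)
```

Under this model, larger disparities correspond to closer scene points. Light field data rendered or captured with a shifted focal plane can contain zero or negative disparities, which this simple sketch does not cover.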
