Revisiting Disparity from Dual-Pixel Images: Physics-Informed Lightweight Depth Estimation

Teppei Kurita, Yuhi Kondo, Legong Sun, Takayuki Sasaki, Sho Nitta, Yasuhiro Hashimoto, Yoshinori Muramatsu, Yusuke Moriuchi

TL;DR

We present a lightweight disparity estimation method for dual-pixel (DP) images. It is built on a completion-based network that explicitly constrains disparity and learns the physical and systemic disparity properties of DP, and it achieves state-of-the-art results while reducing the overall system size to 1/5 of that of the conventional method.

Abstract

In this study, we propose a high-performance disparity (depth) estimation method for dual-pixel (DP) images that uses few parameters. Conventional end-to-end deep-learning methods have many parameters but do not fully exploit disparity constraints, which limits their performance. We therefore propose a lightweight disparity estimation method based on a completion-based network that explicitly constrains disparity and learns the physical and systemic disparity properties of DP. By modeling the DP-specific disparity error parametrically and sampling from this model during training, the network acquires the unique properties of DP and becomes more robust. This training scheme also allows us to train on a common RGB-D dataset, without a DP dataset, which is labor-intensive to acquire. Furthermore, we propose a non-learning-based refinement framework that efficiently handles inherent disparity expansion errors by appropriately refining the confidence map of the network output. As a result, the proposed method achieves state-of-the-art results while reducing the overall system size to 1/5 of that of the conventional method, even without using a DP dataset for training, which demonstrates its effectiveness. The code and dataset are available on our project site.

Paper Structure

This paper contains 20 sections, 8 equations, 11 figures, and 3 tables.

Figures (11)

  • Figure 1: Overview of our method. (a) Conventional end-to-end disparity or depth estimation from dual-pixel (DP) images is redundant and limited in performance because it does not exploit disparity constraints explicitly. The proposed method efficiently estimates disparity using a completion network trained to account for template-matching errors and a framework that refines the expansion of disparity regions, which occurs in principle. (b) Number of parameters (horizontal axis) versus accuracy (vertical axis); the closer to the lower left, the better the performance balance. The proposed method is lightweight and achieves high performance.
  • Figure 2: PSF differences between traditional and DP sensors. (a) With traditional sensors, the shape of the PSF is the same in both far and near scenes, and the same defocus blur occurs. (b) With the DP sensor, the PSF is shaded in the left and right halves of the left and right images, and its shape is inverted between the far and near scenes. This characteristic enables the DP sensor to calculate disparity by itself.
  • Figure 3: Left and right images in a DP sensor. The disparity generated by the DP sensor is small, whereas the defocus blur is large. The PSF shape differs between the left and right images and with perspective, which makes disparity harder to calculate than with a stereo camera.
  • Figure 4: Simulation of the difference between stereo and DP disparity calculation errors in the toy experiment. Disparity histograms were obtained by template matching (TM) on stereo and DP images of a random dot chart. In stereo, the disparity is calculated accurately for most pixels; in DP, however, the disparity contains many errors owing to the unique defocus blur.
  • Figure 5: Framework for learning disparity completion using RGBD data while incorporating the disparity properties of DP. After converting the ground truth from depth to disparity and extracting only the edge portions, input disparity reflecting DP properties is generated by sampling to achieve supervised learning.
  • ...and 6 more figures
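
The training-data generation described in the Figure 5 caption (depth to disparity, edge extraction, sampling of a DP-like input) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption for illustration and not the authors' implementation: the function names, the affine-in-inverse-depth coefficients `a` and `b`, the gradient threshold, and especially the noise model, which substitutes Gaussian noise plus uniform outliers for the paper's parametric DP error model.

```python
import numpy as np

def depth_to_disparity(depth, a=-1.0, b=2.0):
    # Hypothetical coefficients: DP disparity is modeled as affine in inverse depth.
    return a + b / depth

def edge_mask(disparity, threshold=0.05):
    # Keep only pixels where the disparity gradient magnitude exceeds the threshold.
    gy, gx = np.gradient(disparity)
    return np.hypot(gx, gy) > threshold

def sample_sparse_input(disparity, mask, sigma=0.1, outlier_rate=0.05, rng=None):
    # Stand-in error model (assumption): Gaussian noise plus uniform outliers.
    rng = np.random.default_rng() if rng is None else rng
    noisy = disparity + rng.normal(0.0, sigma, disparity.shape)
    outliers = rng.random(disparity.shape) < outlier_rate
    random_values = rng.uniform(disparity.min(), disparity.max(), disparity.shape)
    noisy = np.where(outliers, random_values, noisy)
    # Pixels outside the edge mask carry no input disparity (set to zero).
    return np.where(mask, noisy, 0.0)

# Toy scene: a near plane (depth 1) next to a far plane (depth 2).
depth = np.ones((8, 8))
depth[:, 4:] = 2.0
gt_disparity = depth_to_disparity(depth)
mask = edge_mask(gt_disparity)
sparse_input = sample_sparse_input(gt_disparity, mask, rng=np.random.default_rng(0))
```

In this sketch, a completion network would take `sparse_input` and `mask` (together with the RGB image) as input and be supervised by `gt_disparity`, so that only an RGB-D dataset is needed to produce training pairs.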