Robust Depth Enhancement via Polarization Prompt Fusion Tuning

Kei Ikemura, Yiming Huang, Felix Heide, Zhaoxiang Zhang, Qifeng Chen, Chenyang Lei

TL;DR

The paper tackles unreliable depth measurements from diverse sensors in difficult scenarios (e.g., transparent or reflective surfaces). It introduces a learning-based framework that uses polarization cues as dense geometry guidance and fuses them with coarse sensor depth through a pre-trained RGB backbone. The fusion is performed by a novel Polarization Prompt Fusion Block (PPFB), and the overall tuning strategy is called Polarization Prompt Fusion Tuning (PPFT). The key contributions are the first end-to-end, general framework for polarization-guided depth enhancement across sensor types; a cross-modal transfer learning strategy that leverages large RGB-D datasets; and a fusion mechanism that robustly integrates polarization information into deep networks. The method is validated on the HAMMER dataset, where it consistently improves over strong baselines. The authors also extend PPFT to Shape-from-Polarization, suggesting broader applicability to polarization-based depth and geometry tasks, with potential impact on autonomous systems and 3D reconstruction.

Abstract

Existing depth sensors are imperfect and may provide inaccurate depth values in challenging scenarios, such as in the presence of transparent or reflective objects. In this work, we present a general framework that leverages polarization imaging to improve inaccurate depth measurements from various depth sensors. Previous polarization-based depth enhancement methods rely on purely physics-based formulas for a single sensor. In contrast, our method first adopts a learning-based strategy in which a neural network is trained to estimate a dense and complete depth map from polarization data and a sensor depth map, for a range of different sensors. To further improve performance, we propose a Polarization Prompt Fusion Tuning (PPFT) strategy that effectively utilizes RGB-based models pre-trained on large-scale datasets, since the available polarization dataset is too small to train a strong model from scratch. We conducted extensive experiments on a public dataset, and the results demonstrate that the proposed method performs favorably compared to existing depth enhancement baselines. Code and demos are available at https://lastbasket.github.io/PPFT/.
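Polarization cameras typically record intensities behind linear polarizers at several angles, from which the linear Stokes parameters and the degree and angle of linear polarization (DoLP, AoLP) can be computed. The sketch below shows this standard computation with NumPy, assuming four captures at 0°, 45°, 90° and 135°. The function name `polarization_cues` is hypothetical, and the exact polarization representation fed to the PPFT network is not stated here, so treat this as illustrative preprocessing rather than the authors' pipeline.

```python
import numpy as np


def polarization_cues(i0, i45, i90, i135, eps=1e-8):
    """Compute total intensity, DoLP and AoLP from four polarizer angles.

    Assumes i0, i45, i90, i135 are intensity images (or scalars) of the same
    shape, captured behind linear polarizers at 0, 45, 90 and 135 degrees.
    """
    i0, i45, i90, i135 = (np.asarray(v, dtype=np.float64) for v in (i0, i45, i90, i135))
    s0 = 0.5 * (i0 + i45 + i90 + i135)          # total intensity (Stokes S0)
    s1 = i0 - i90                               # 0 vs 90 degree component (S1)
    s2 = i45 - i135                             # 45 vs 135 degree component (S2)
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + eps)  # degree of linear polarization, in [0, 1]
    aolp = 0.5 * np.arctan2(s2, s1)             # angle of linear polarization, in (-pi/2, pi/2]
    return s0, dolp, aolp
```

For example, fully polarized light at 0° (`i0=1, i45=0.5, i90=0, i135=0.5`) yields a DoLP close to 1 and an AoLP of 0, while equal intensities at all four angles (unpolarized light) yield a DoLP of 0. On transparent or specular surfaces these per-pixel cues remain informative about surface orientation even where active depth sensors fail.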

Paper Structure

This paper contains 23 sections, 8 equations, 8 figures, 6 tables.

Figures (8)

  • Figure 1: Visualization of the results of our proposed framework. The sensor depth shown is from a d-ToF sensor. Our method leverages the dense shape cues from polarization and produces accurate results on challenging depth enhancement problems, including depth inpainting, depth restoration on transparent surfaces, and shape correction. For example, the depth of the transparent water bottle is restored. Note that in the error map of the sensor depth, values are clamped to the maximum of the scale used.
  • Figure 2: Visualization of our evaluation data from the HAMMER dataset [hammer]. Each depth sensor type exhibits different kinds of degradation. For example, in (f) measurements are missing on the table surface due to insufficient texture. Similarly, because the dataset maps all sensor depth into the same camera frame, (h) shows a region of unknown depth at the boundary, which arises from the smaller Field-of-View (FoV) of the i-ToF sensor. In this work, we propose a general framework to resolve multiple types of sensor depth degradation.
  • Figure 3: Polarization Prompt Fusion Tuning (PPFT). We sequentially fuse polarization embeddings into the features extracted from pre-trained layers using our proposed Polarization Prompt Fusion Block (PPFB). Specifically, polarization features are passed into the PPFB as the prompt $\mathbf{M}$, and features from the pre-trained foundation model as the input $\mathbf{X}$. Both are then updated: the updated $\mathbf{X}$ is passed to the next pre-trained encoder stage, and the updated $\mathbf{M}$ to the next PPFB.
  • Figure 4: Channel Fovea Operation in the Proposed PPFB. The sum of the two input features is projected by an MLP to twice the channel size and then reduced to a vector of weights by global average pooling [lin2013network]. These weights are split into two sets, which are multiplied with the two inputs, respectively. The output attention weights are computed by applying a softmax to the sum of the re-weighted inputs.
  • Figure 5: Point cloud comparison between our approach and the baselines [lin2022dynamic, youmin2023completionformer]. Besides restoring challenging irregularities (e.g., the transparent bottle highlighted in row 2), our method also produces more regular geometry (e.g., the bottom-left corner highlighted in row 2, which is misaligned in the sensor depth and therefore leaves blank regions in the point cloud), demonstrating the strong surface geometry information provided by polarization.
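The sequential data flow of PPFT in Figure 3 can be sketched as a simple loop. This is a minimal, framework-agnostic illustration that assumes each pre-trained encoder stage and each PPFB can be treated as a plain callable; `ppft_forward`, `stages` and `fusions` are hypothetical names, and neither the real PPFB internals nor the training procedure is reproduced.

```python
from typing import Callable, List, Tuple

import numpy as np

Stage = Callable[[np.ndarray], np.ndarray]
Fusion = Callable[[np.ndarray, np.ndarray], Tuple[np.ndarray, np.ndarray]]


def ppft_forward(x: np.ndarray, m: np.ndarray,
                 stages: List[Stage], fusions: List[Fusion]) -> np.ndarray:
    """Sequential prompt fusion following the data flow of Figure 3 (sketch).

    x       : input to the pre-trained encoder
    m       : initial polarization prompt embedding
    stages  : pre-trained encoder stages, applied in order (hypothetical callables)
    fusions : one PPFB-like block per stage, each returning the updated (X, M)
    """
    if len(stages) != len(fusions):
        raise ValueError("expected one fusion block per encoder stage")
    for stage, fuse in zip(stages, fusions):
        x = stage(x)        # features X from the next pre-trained stage
        x, m = fuse(x, m)   # fuse prompt M into X; both are updated
    return x                # fused features after the last stage
```

In a real model the callables would be network modules. The loop only captures the ordering: features leave a pre-trained stage, are fused with the current prompt, and the updated features and prompt are carried forward to the next stage and the next fusion block.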
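The channel fovea operation described in Figure 4 can be written out in NumPy as below. This is a sketch under stated assumptions: inputs are single (C, H, W) feature maps without a batch dimension, the MLP is a hypothetical two-layer per-pixel network with a ReLU, and the softmax is taken over the channel axis at each pixel. The caption does not specify these details or how the returned attention is consumed downstream.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def channel_fovea(x, m, w1, b1, w2, b2):
    """Sketch of the channel fovea operation described in Figure 4.

    x, m   : feature maps of shape (C, H, W), e.g. backbone input X and prompt M
    w1, b1 : hypothetical first MLP layer, shapes (D, C) and (D,)
    w2, b2 : hypothetical second MLP layer, shapes (2C, D) and (2C,)
    Returns attention weights of shape (C, H, W) that sum to 1 over channels.
    """
    c = x.shape[0]
    s = x + m                                                  # sum of the two inputs
    h = np.einsum("dc,chw->dhw", w1, s) + b1[:, None, None]    # per-pixel linear layer
    h = np.maximum(h, 0.0)                                     # ReLU (assumed activation)
    z = np.einsum("kd,dhw->khw", w2, h) + b2[:, None, None]    # doubled channels: (2C, H, W)
    weights = z.mean(axis=(1, 2))                              # global average pooling -> (2C,)
    w_x, w_m = weights[:c], weights[c:]                        # split into two sets of C weights
    reweighted = w_x[:, None, None] * x + w_m[:, None, None] * m
    return softmax(reweighted, axis=0)                         # softmax over channels (assumed axis)
```

Because the MLP is applied independently at each pixel, it is equivalent to two 1×1 convolutions; in a trained model `w1`, `b1`, `w2` and `b2` would be learned parameters.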
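The point cloud visualizations in Figure 5 rely on back-projecting each depth map through the camera intrinsics. A minimal pinhole-model sketch follows; the intrinsics `fx`, `fy`, `cx`, `cy`, the function name `depth_to_point_cloud`, and the convention that missing depth is stored as zero or NaN are assumptions, not details given here. Pixels without valid depth produce no 3D point, which is why missing or misaligned regions show up as blank areas.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map to an (N, 3) point cloud (pinhole model).

    Pixels with non-finite or non-positive depth are treated as missing and
    skipped, so they leave holes in the resulting point cloud.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]                  # row (v) and column (u) pixel coordinates
    valid = np.isfinite(depth) & (depth > 0)   # assumed encoding of missing depth
    z = depth[valid]
    x = (u[valid] - cx) * z / fx               # X = (u - cx) * Z / fx
    y = (v[valid] - cy) * z / fy               # Y = (v - cy) * Z / fy
    return np.stack([x, y, z], axis=1)
```

Running the same function on the sensor depth and on an enhanced depth map with identical intrinsics makes the two point clouds directly comparable, as in the figure.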