Scene Prior Filtering for Depth Super-Resolution

Zhengxue Wang, Zhiqiang Yan, Ming-Hsuan Yang, Jinshan Pan, Guangwei Gao, Ying Tai, Jian Yang

TL;DR

SPFNet is a Scene Prior Filtering network that uses surface normal and semantic map priors from large-scale models. Its One-to-one Prior Embedding continuously embeds each single-modal prior into depth using Mutual Guided Filtering, alleviating texture interference while enhancing edges.

Abstract

Multi-modal fusion is vital to the success of depth map super-resolution. However, commonly used fusion strategies, such as addition and concatenation, fall short of effectively bridging the modal gap. Guided image filtering methods have therefore been introduced to mitigate this issue. Nevertheless, their filter kernels usually suffer from significant texture interference and edge inaccuracy. To tackle these two challenges, we introduce a Scene Prior Filtering network, SPFNet, which utilizes surface normal and semantic map priors from large-scale models. Specifically, we design an All-in-one Prior Propagation that computes the similarity between multi-modal scene priors, i.e., RGB, normal, semantic, and depth, to reduce the texture interference. In addition, we present a One-to-one Prior Embedding that continuously embeds each single-modal prior into depth using Mutual Guided Filtering, further alleviating the texture interference while enhancing edges. Our SPFNet has been extensively evaluated on both real and synthetic datasets, achieving state-of-the-art performance.
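
To make the notion of guided image filtering concrete, the following is a minimal NumPy sketch of the classic (non-learned) guided filter, in which a guidance image steers the smoothing of a source map. It is not SPFNet's Mutual Guided Filtering, whose kernels are learned and are not specified in this summary; the function names `box_mean` and `guided_filter` and the parameters `r` and `eps` are illustrative assumptions.

```python
import numpy as np


def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window, with edge-replicated padding."""
    k = 2 * r + 1
    h, w = x.shape
    padded = np.pad(x, r, mode="edge")
    # Integral image with a leading row and column of zeros.
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    # Window sum from four corner lookups of the integral image.
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)


def guided_filter(guide, src, r=2, eps=1e-3):
    """Classic guided filter: fit src ~ a * guide + b in each local window.

    guide: 2D guidance image (e.g. one channel of RGB, normal, or semantic).
    src:   2D map to be filtered (e.g. an upsampled depth map).
    r:     window radius; eps: regularizer that controls smoothing strength.
    """
    mean_g = box_mean(guide, r)
    mean_s = box_mean(src, r)
    cov_gs = box_mean(guide * src, r) - mean_g * mean_s
    var_g = box_mean(guide * guide, r) - mean_g ** 2
    a = cov_gs / (var_g + eps)   # local linear coefficient
    b = mean_s - a * mean_g      # local offset
    # Average the coefficients of all windows that cover each pixel.
    return box_mean(a, r) * guide + box_mean(b, r)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    textured_guide = rng.random((8, 8))   # texture-heavy guidance
    flat_depth = np.full((8, 8), 0.5)     # depth with no structure
    out = guided_filter(textured_guide, flat_depth)
    print(float(np.abs(out - flat_depth).max()))
```

In this toy case a flat source stays flat regardless of the guide's texture. Texture interference of the kind the abstract describes arises when guide texture happens to correlate locally with the source, which makes the coefficient `a` non-zero and copies that texture into the output.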

Paper Structure

This paper contains 16 sections, 8 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Previous methods (a) directly use RGB to generate filter kernels, often encountering texture interference and inaccurate edges. In contrast, our approach (b) uses large-scale model priors to reduce interference and enhance edges, yielding more accurate results (c) than the strong baselines FDSR [he2021towards], SUFT [shi2022symmetric], and DCTNet [zhao2022discrete].
  • Figure 2: Visualizations of scene priors and filter kernels. The normal (Nor.) (b) and semantic (Sem.) (c) priors are produced from (a) using Omnidata [eftekhar2021omnidata] and SAM [kirillov2023segment], respectively. (e) and (f) are the filter kernels derived from DKN [kim2021deformable] and DAGF [zhong2023deep], while (g) and (h) are from our SPFNet. (i) is the normalized kernel distribution.
  • Figure 3: SPFNet. It first produces the normal $\boldsymbol I_{n}$ and semantic $\boldsymbol I_{s}$ priors from $\boldsymbol I_{r}$ using large-scale models. Then, the scene prior branch (orange part) extracts the multi-modal features. Meanwhile, the depth branch (blue part) recursively conducts all-in-one prior propagation (APP) and one-to-one prior embedding (OPE). BI: bicubic interpolation.
  • Figure 4: Scheme of (a) All-in-one Prior Propagation (APP), and (b) histogram comparison of scene prior features.
  • Figure 5: Scheme of (a) One-to-one Prior Embedding (OPE), and (b) gradient histogram of filter kernels in the texture area (green box). The surface normal, semantic, and RGB kernels are generated by our Mutual Guided Filtering (MGF).
  • ...and 6 more figures