SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

Yong-Qiang Mao, Hanbo Bi, Liangyu Xu, Kaiqiang Chen, Zhirui Wang, Xian Sun, Kun Fu

TL;DR

This paper tackles depth estimation in large-scale remote sensing multi-view stereo, where occlusion and uneven brightness across views blur depth details. It introduces SDL-MVS, a view-space and depth deformable learning paradigm with two components: Progressive Space Deformable Sampling (PSS), which deformably samples features first in the 3D frustum space and then in the 2D image space, and Depth Hypothesis Deformable Discretization (DHD), which adapts the depth prior by adjusting the depth range hypothesis and deformably discretizing the depth interval hypothesis. The method reports state-of-the-art results on the LuoJia-MVS and WHU datasets for both 3-view and 5-view inputs; with three views on LuoJia-MVS it reaches an MAE of 0.086 m and 98.9% accuracy on both the <0.6 m and <3-interval metrics. The gains appear in both quantitative metrics and qualitative reconstructions, particularly under occlusion and illumination variation, which makes the approach relevant to large-scale urban 3D mapping.

Abstract

Research on multi-view stereo based on remote sensing images has promoted the development of large-scale urban 3D reconstruction. However, remote sensing multi-view image data suffers from occlusion and uneven brightness between views during acquisition, which blurs details in depth estimation. To solve this problem, we re-examine deformable learning in the multi-view stereo task and propose a novel paradigm based on view Space and Depth deformable Learning (SDL-MVS), which learns deformable interactions of features in different view spaces and deformably models the depth ranges and intervals to enable highly accurate depth estimation. Specifically, to address the view noise caused by occlusion and uneven brightness, we propose a Progressive Space deformable Sampling (PSS) mechanism, which performs deformable learning of sampling points in the 3D frustum space and the 2D image space in a progressive manner to adaptively embed source features into the reference feature. To further optimize the depth, we introduce Depth Hypothesis deformable Discretization (DHD), which precisely positions the depth prior by adaptively adjusting the depth range hypothesis and performing deformable discretization of the depth interval hypothesis. Through this deformable learning paradigm of view space and depth, SDL-MVS explicitly models the occlusion and uneven brightness encountered in multi-view stereo and achieves accurate multi-view depth estimation. Extensive experiments on the LuoJia-MVS and WHU datasets show that SDL-MVS reaches state-of-the-art performance. Notably, with three views as input, SDL-MVS achieves an MAE of 0.086, an accuracy of 98.9% for <0.6m, and 98.9% for <3-interval on the LuoJia-MVS dataset.
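
The three figures quoted at the end of the abstract (MAE, <0.6m, <3-interval) can be illustrated with a minimal sketch. This section does not give the exact metric definitions, so the code assumes the common reading: MAE is the mean absolute depth error over pixels with valid ground truth, <0.6m is the fraction of those pixels whose error is below 0.6 m, and <3-interval is the fraction whose error is below three depth intervals. The function name `depth_metrics`, its parameters, and the use of `None` to mark invalid pixels are illustrative assumptions, and any outlier exclusion a benchmark may apply is omitted.

```python
from typing import Optional, Sequence, Tuple


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[Optional[float]],
    depth_interval: float,
    abs_threshold: float = 0.6,
    interval_factor: float = 3.0,
) -> Tuple[float, float, float]:
    """Return (MAE, fraction below abs_threshold, fraction below N intervals).

    Assumed definitions, not taken from the paper: pixels whose ground truth
    is None are ignored, and all remaining pixels count equally.
    """
    # Absolute error on every pixel that has a ground-truth depth.
    errors = [abs(p - g) for p, g in zip(pred, gt) if g is not None]
    if not errors:
        raise ValueError("no valid ground-truth pixels")

    n = len(errors)
    mae = sum(errors) / n
    # Share of pixels with error under the fixed metric threshold (0.6 m).
    acc_abs = sum(e < abs_threshold for e in errors) / n
    # Share of pixels with error under a multiple of the depth interval.
    acc_interval = sum(e < interval_factor * depth_interval for e in errors) / n
    return mae, acc_abs, acc_interval


if __name__ == "__main__":
    # Toy flattened depth maps; the last pixel has no ground truth.
    predicted = [10.0, 10.5, 12.0, 9.0]
    reference = [10.0, 10.0, 10.0, None]
    print(depth_metrics(predicted, reference, depth_interval=0.1))
```

Under this reading the two accuracy metrics differ only in their threshold: one is fixed in metres, the other scales with the depth sampling interval of the scene.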

Paper Structure (29 sections, 14 equations, 10 figures, 6 tables)

Figures (10)

  • Figure 1: The phenomena of occlusion and uneven brightness. (a) Occlusion. Occlusion causes feature loss during multi-view sampling. Our method adaptively learns similar features from nearby neighbors to reduce the impact of this loss. (b) Uneven brightness. Uneven illumination caused by shadows or other factors gives different parts of the same structure on the same object different feature characteristics, which hinders depth recovery. Our method adaptively supplements features from surrounding pixels to mitigate the negative impact of uneven illumination.
  • Figure 2: Comparison of SDL-MVS with existing methods. Compared with (a) previous methods, (b) our method introduces a progressive space deformable sampling strategy, embedded between the feature extractor and cost volume generation, and a deformable discretization strategy for the depth hypothesis, to address occlusion and uneven brightness.
  • Figure 3: The framework diagram of our view Space joint Depth deformable Learning paradigm (SDL-MVS). SDL-MVS consists of five processes: feature extraction, Progressive Space deformable Sampling (PSS), Depth Hypothesis deformable Discretization (DHD), cost volume generation, and depth prediction. The inputs to the framework are the reference remote sensing image and the source remote sensing images, and the output is the estimated depth map of the reference remote sensing image.
  • Figure 4: The schematic of the proposed Progressive Space deformable Sampling (PSS). The inputs to the module are the reference features and source features produced by the multi-scale feature extractor. The reference features are updated through progressive space feature sampling, and new reference features are output after feature aggregation.
  • Figure 5: Comparison of different discretization methods in the depth range. The first line in the figure represents the depth range, the second line denotes the Uniform Discretization (UD) interval, the third line denotes the Spacing Increasing Discretization (SID) interval, and the fourth line denotes the Centered Linear Increasing Discretization (CLID) interval proposed in this paper.
  • ...and 5 more figures
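
The discretization schemes named in the Figure 5 caption can be compared with a short sketch. Uniform Discretization (UD) splits the depth range into equal bins, and Spacing Increasing Discretization (SID) is written here in its usual log-space form, where bin widths grow with depth. The formula for CLID is not given in this section, so `centered_linear_edges` below is only a hypothetical reading of the name (bin widths that grow linearly with distance from the centre of the range), and its `growth` parameter is an assumption rather than the paper's definition. All function names are illustrative.

```python
import math
from typing import List


def uniform_edges(d_min: float, d_max: float, k: int) -> List[float]:
    """UD: k bins of equal width between d_min and d_max."""
    step = (d_max - d_min) / k
    return [d_min + i * step for i in range(k + 1)]


def sid_edges(d_min: float, d_max: float, k: int) -> List[float]:
    """SID: edges uniform in log-depth, so widths increase with depth.

    Requires d_min > 0.
    """
    log_ratio = math.log(d_max / d_min)
    return [d_min * math.exp(log_ratio * i / k) for i in range(k + 1)]


def centered_linear_edges(
    d_min: float, d_max: float, k: int, growth: float = 0.5
) -> List[float]:
    """Hypothetical CLID-like scheme (an assumption, not the paper's formula).

    Bin widths are smallest at the centre of the range and grow linearly
    with the bin's distance from the centre.
    """
    centre = (k - 1) / 2
    # Unnormalised width of each bin, linear in distance from the centre bin.
    weights = [1.0 + growth * abs(j - centre) for j in range(k)]
    # Rescale so the widths add up to the full depth range.
    scale = (d_max - d_min) / sum(weights)
    edges = [d_min]
    for w in weights:
        edges.append(edges[-1] + w * scale)
    return edges


if __name__ == "__main__":
    for name, fn in [
        ("UD", uniform_edges),
        ("SID", sid_edges),
        ("centred (hypothetical)", centered_linear_edges),
    ]:
        print(name, [round(e, 3) for e in fn(1.0, 16.0, 4)])
```

The point of the comparison is where each scheme spends its resolution: UD spreads it evenly, SID concentrates it at small depths, and a centred scheme concentrates it around the middle of the range, which suits a range already centred on a coarse depth estimate.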