Common Practices and Taxonomy in Deep Multi-view Fusion for Remote Sensing Applications
Francisco Mena, Diego Arenas, Marlon Nuske, Andreas Dengel
TL;DR
The paper surveys deep multi-view fusion for remote sensing, aiming to unify terminology and distill common practices across supervised EO tasks. It covers where, how, and what to fuse, detailing input-, feature-, and decision-level fusion, merge functions, and architectural choices for per-view encoders, regularization, and auxiliary losses. Empirically, feature-level fusion often delivers strong performance, optical views dominate while SAR/LiDAR/DSM provide valuable complements, and additional views generally boost predictive accuracy, though results vary by task and data. The work also outlines open challenges, including missing-view robustness, uncertainty quantification, and explainability, calling for standardized benchmarks and clearer comparisons of fusion strategies to advance the field.
Abstract
Advances in remote sensing technologies have boosted applications for Earth observation. These technologies provide multiple observations, or views, with different levels of information. The views may be static or temporal, may differ in resolution, and may carry different types and amounts of noise due to sensor calibration or deterioration. A great variety of deep learning models have been applied to fuse the information from these multiple views, an approach known as deep multi-view or multi-modal fusion learning. However, the approaches in the literature vary greatly, since different terminology is used for similar concepts and different illustrations are given for similar techniques. This article gathers works on multi-view fusion for Earth observation, focusing on the common practices and approaches used in the literature. We summarize and structure insights from several publications, concentrating on unifying points and ideas. We provide a harmonized terminology while also mentioning the alternative terms used in the literature. The reviewed works focus on supervised learning with neural network models. We hope this review, with its long list of recent references, can support future research and lead to a unified advance in the area.
