Uncertainty-Aware Regression for Socio-Economic Estimation via Multi-View Remote Sensing

Fan Yang, Sahoko Ishida, Mengyan Zhang, Daniel Jenson, Swapnil Mishra, Jhonathan Navott, Seth Flaxman

TL;DR

The paper introduces a framework that feeds three-band combinations of multi-spectral remote sensing imagery to generic foundational vision models, and uses heteroscedastic regression and Bayesian modeling to attach uncertainty estimates to the predictions.

Abstract

Remote sensing imagery offers rich spectral data across extensive areas for Earth observation. Many attempts have been made to leverage these data with transfer learning to develop scalable alternatives for estimating socio-economic conditions, reducing reliance on expensive survey-collected data. However, much of this research has focused on daytime satellite imagery because most pre-trained models are trained on 3-band RGB images. Consequently, modeling techniques for spectral bands beyond the visible spectrum have not been thoroughly investigated. Additionally, uncertainty quantification in remote sensing regression remains underexplored, yet it is essential for more informed targeting and iterative collection of ground truth survey data. In this paper, we introduce a novel framework that leverages generic foundational vision models to process remote sensing imagery using combinations of three spectral bands to exploit multi-spectral data. We also employ methods such as heteroscedastic regression and Bayesian modeling to generate uncertainty estimates for the predictions. Experimental results demonstrate that our method outperforms existing models that use either RGB imagery or multi-spectral imagery with unstructured band usage. Moreover, our framework helps identify uncertain predictions, guiding future ground truth data acquisition.
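
As a rough illustration of the heteroscedastic regression mentioned in the abstract, the sketch below scores a prediction with a Gaussian negative log-likelihood in which the model outputs both a mean and a log-variance for each input. This is a generic formulation and only an assumption about how such a loss could look; the paper's exact loss is not given in this section, and the function names `gaussian_nll` and `mean_nll` are hypothetical.

```python
import math


def gaussian_nll(y, mean, log_var):
    """Negative log-likelihood of y under N(mean, exp(log_var)).

    Predicting the log-variance keeps the variance positive without
    any explicit constraint. (Assumed parameterisation, not the paper's.)
    """
    var = math.exp(log_var)
    return 0.5 * (math.log(2.0 * math.pi) + log_var + (y - mean) ** 2 / var)


def mean_nll(targets, means, log_vars):
    """Average the per-sample loss over a batch of predictions."""
    losses = [gaussian_nll(y, m, s) for y, m, s in zip(targets, means, log_vars)]
    return sum(losses) / len(losses)


# The same prediction error scored under two different predicted variances.
confident = gaussian_nll(y=1.0, mean=0.0, log_var=math.log(0.1))
cautious = gaussian_nll(y=1.0, mean=0.0, log_var=math.log(1.0))
```

Under this loss, a large error paired with a small predicted variance is penalised more heavily than the same error paired with a larger variance, which is what pushes the predicted variance to track the actual error.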

Paper Structure

This paper contains 36 sections, 16 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: We select distinct satellite views of the same location across different band groups, capturing unique spatial features. These views are then processed through separate pre-trained vision models to extract satellite features, which are subsequently aggregated to predict the target variable with associated uncertainty estimates.
  • Figure 2: Examples of different views near Murchison Bay, Uganda. The natural color view uses bands B4 (Red), B3 (Green), and B2 (Blue). The false-color view incorporates bands B8 (Near Infrared), B4 (Red), and B2 (Blue). The land moisture view is composed of bands B12 (Short-Wave Infrared), B1 (Coastal Blue), and B3 (Green). Lastly, the agriculture view includes bands B11 (Short-Wave Infrared), B8 (Near Infrared), and B2 (Blue).
  • Figure 3: Validation errors for predicting severe deprivation using various model configurations: a raw SWIN model accepting RGB imagery, a pre-trained SWIN model and a pre-trained DINOv2-ViT-Base model with a CNN head that maps 13-channel imagery to 3 channels, and a pre-trained DINOv2-ViT-Base model accepting RGB imagery. Performance is tracked by negative mean absolute error over the fine-tuning process.
  • Figure 4: Example predictions for Rwanda in 2019. (a) Ground truth labels derived from the DHS dataset, visualized using Ordinary Kriging. (b) Posterior mean estimates obtained using DINOv2-ViT-Base with a multi-view fine-tuning scheme and Bayesian linear regression. (c) Posterior variance from the Bayesian linear regression, with blue markers indicating locations where training data exist.
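
The posterior mean and variance referred to in the Figure 4 caption can be illustrated with a toy Bayesian linear regression. The sketch below uses a single scalar feature plus an intercept so that the 2x2 posterior covariance has a closed form. In the paper the inputs would be the aggregated satellite features, and the prior precision `alpha`, the noise precision `beta`, and the helper names `fit_blr` and `predict` are hypothetical choices made here for illustration only.

```python
def fit_blr(xs, ys, alpha=1.0, beta=25.0):
    """Posterior over weights (intercept, slope) for y = w0 + w1 * x + noise.

    alpha: precision of the zero-mean Gaussian prior on the weights.
    beta:  precision of the Gaussian observation noise.
    Both are assumed fixed values, not taken from the paper.
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Posterior precision A = alpha * I + beta * Phi^T Phi, Phi rows = [1, x].
    a, b, d = alpha + beta * n, beta * sx, alpha + beta * sxx
    det = a * d - b * b
    # Closed-form inverse of the symmetric 2x2 matrix [[a, b], [b, d]].
    cov = [[d / det, -b / det], [-b / det, a / det]]
    # Posterior mean m = beta * cov @ Phi^T y.
    mean = [
        beta * (cov[0][0] * sy + cov[0][1] * sxy),
        beta * (cov[1][0] * sy + cov[1][1] * sxy),
    ]
    return mean, cov


def predict(x, mean, cov, beta=25.0):
    """Posterior predictive mean and variance at feature value x."""
    mu = mean[0] + mean[1] * x
    # Noise variance plus phi^T cov phi, with phi = [1, x].
    var = 1.0 / beta + cov[0][0] + 2.0 * x * cov[0][1] + x * x * cov[1][1]
    return mu, var


# Training points cluster in [0, 0.3]; compare a nearby and a distant query.
xs = [0.0, 0.1, 0.2, 0.3]
ys = [1.0, 1.2, 1.4, 1.6]
mean, cov = fit_blr(xs, ys)
mu_near, var_near = predict(0.15, mean, cov)
mu_far, var_far = predict(3.0, mean, cov)
```

In this toy setting the predictive variance is smallest near the training inputs and grows for queries far from them. Note that the distance here is measured in feature space, not in geographic space as plotted in panel (c).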