Improving Satellite Imagery Masking using Multi-task and Transfer Learning

Rangel Daroya; Luisa Vieira Lucchese; Travis Simmons; Punwath Prum; Tamlin Pavelsky; John Gardner; Colin J. Gleason; Subhransu Maji

TL;DR

This work tackles the challenge of masking satellite imagery for downstream Suspended Sediment Concentration (SSC) estimation by predicting all required masks simultaneously from Harmonized Landsat and Sentinel (HLS) imagery. It introduces a multi-task deep learning framework with a shared backbone and per-mask heads, trained with transfer learning from large satellite imagery datasets, and compares CNN and transformer architectures. The approach yields a 9% F1 gain on water pixel identification over previous work, up to a 30× speedup in the SSC pipeline, and a 2.64 mg/L reduction in SSC estimation error, while reducing memory and storage demands. The results indicate that end-to-end, multi-task masking enables efficient and more accurate surface water analysis at global scale, and they offer practical guidance on model choice and training strategy for operational deployment.

Abstract

Many remote sensing applications employ masking of pixels in satellite imagery for subsequent measurements. For example, estimating water quality variables, such as Suspended Sediment Concentration (SSC), requires isolating pixels depicting water bodies unaffected by clouds, their shadows, terrain shadows, and snow and ice formation. A significant bottleneck is the reliance on a variety of data products (e.g., satellite imagery, elevation maps), together with a lack of precision in individual steps, which affects estimation accuracy. We propose to improve both the accuracy and computational efficiency of masking by developing a system that predicts all required masks from Harmonized Landsat and Sentinel (HLS) imagery. Our model employs multi-tasking to share computation and enable higher accuracy across tasks. We experiment with recent advances in deep network architectures and show that masking models can benefit from these, especially when combined with pre-training on large satellite imagery datasets. We present a collection of models offering different speed/accuracy trade-offs for masking. MobileNet variants are the fastest and perform competitively with larger architectures. Transformer-based architectures are the slowest but benefit the most from pre-training on large satellite imagery datasets. Our models provide a 9% F1 score improvement over previous work on water pixel identification. When integrated with an SSC estimation system, our models yield a 30x speedup while reducing estimation error by 2.64 mg/L, allowing for global-scale analysis. We also evaluate our model on a recently proposed cloud and cloud shadow estimation benchmark, where we outperform the current state-of-the-art model by at least 6% in F1 score.
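To make the masking step concrete, the following is a minimal sketch of how per-pixel masks might be combined to isolate usable water pixels, and how a pixel-wise F1 score can be computed for a single mask. It is illustrative only and is not the authors' implementation: the mask names (`water`, `cloud`, `cloud_shadow`, `terrain_shadow`, `snow_ice`), the function names, and the nested-list mask representation are all assumptions made for this example.

```python
from typing import Dict, List

Mask = List[List[bool]]

# Hypothetical mask names: the abstract lists these categories,
# but the identifiers below are assumptions for illustration.
EXCLUDE = ("cloud", "cloud_shadow", "terrain_shadow", "snow_ice")


def usable_water(masks: Dict[str, Mask]) -> Mask:
    """Keep pixels flagged as water and not flagged by any exclusion mask."""
    water = masks["water"]
    rows, cols = len(water), len(water[0])
    return [
        [
            water[r][c] and not any(masks[name][r][c] for name in EXCLUDE)
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def f1_score(pred: Mask, truth: Mask) -> float:
    """Pixel-wise F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

In a multi-task setting, all of the input masks would come from a single forward pass of one model rather than from separate data products, which is where the computational savings described above would arise.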
