High-Resolution Building and Road Detection from Sentinel-2
Wojciech Sirko, Emmanuel Asiedu Brempong, Juliana T. C. Marcos, Abigail Annkah, Abel Korme, Mohammed Alewi Hassen, Krishna Sapkota, Tomer Shekel, Abdoulaye Diack, Sella Nevo, Jason Hickey, John Quinn
TL;DR
The paper demonstrates that a teacher–student framework can use freely available Sentinel-2 imagery to reconstruct building and road presence at 50 cm resolution. A multi-task model, trained end to end on a large-scale, globally distributed dataset, reaches a building mIoU of $79.0\%$, compared with $85.5\%$ for a teacher model that sees high-resolution imagery. The approach uses a 32-frame Sentinel-2 stack, an HRNet-based encoder with cross-time fusion, and a decoder that upscales to the target resolution; the same framework also supports building centroid counting and building height prediction. Key findings include strong cross-region generalization, the usefulness of incidence-angle metadata for aligning labels, and clear gains from temporal fusion and from higher label and input resolutions. By exploiting openly available data, the method broadens access to fine-grained mapping, with practical impact for large-scale urban analytics, disaster response, and policy planning where high-resolution imagery is unavailable or costly.
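Going from 10 m Sentinel-2 pixels to 50 cm output pixels is a 20× upscale per side. The toy NumPy sketch below only traces tensor shapes through such a pipeline, under loudly stated assumptions: mean pooling over frames stands in for the learned cross-time fusion, nearest-neighbour repetition stands in for the learned decoder, and the tile size and channel count are hypothetical.

```python
import numpy as np

# Toy shape walk-through. Sizes marked "hypothetical" are illustrative only;
# the paper's fusion and decoder are learned networks, not these stand-ins.
NUM_FRAMES = 32          # Sentinel-2 frames per stack (from the TL;DR)
INPUT_RES_M = 10.0       # Sentinel-2 ground sampling distance in metres
OUTPUT_RES_M = 0.5       # target label resolution (50 cm)
SCALE = int(INPUT_RES_M / OUTPUT_RES_M)  # 20x upscale per side


def fuse_over_time(frame_features: np.ndarray) -> np.ndarray:
    """Collapse per-frame features (T, C, H, W) to (C, H, W).

    Mean pooling is an assumed stand-in for the learned cross-time fusion.
    """
    return frame_features.mean(axis=0)


def upsample_nearest(features: np.ndarray, scale: int) -> np.ndarray:
    """Nearest-neighbour upsampling of (C, H, W) by `scale` along H and W.

    An assumed stand-in for the learned decoder.
    """
    return features.repeat(scale, axis=1).repeat(scale, axis=2)


rng = np.random.default_rng(0)
tile_px = 16  # hypothetical 16x16-pixel Sentinel-2 tile (160 m on a side)
features = rng.normal(size=(NUM_FRAMES, 8, tile_px, tile_px))  # hypothetical 8 channels
fused = fuse_over_time(features)
high_res = upsample_nearest(fused, SCALE)
print(fused.shape, high_res.shape)  # (8, 16, 16) (8, 320, 320)
```

Each 160 m tile becomes a 320 × 320 grid at 50 cm, so every input pixel maps to 400 output pixels. Nearest-neighbour repetition cannot invent the missing detail, which is why a learned decoder is needed to recover building and road outlines.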
Abstract
Automatically mapping buildings and roads with remote sensing typically requires high-resolution imagery, which is expensive and often only sparsely available. In this work, we show how multiple 10 m resolution Sentinel-2 images can be used to generate 50 cm resolution building and road segmentation masks. We do this by training a `student' model with access to Sentinel-2 images to reproduce the predictions of a `teacher' model that has access to the corresponding high-resolution imagery. Although the student's predictions lack some of the fine detail of the teacher's, much of the performance is retained: for building segmentation we achieve 79.0\% mIoU, compared with 85.5\% mIoU for the high-resolution teacher model. We also describe two related methods that work on Sentinel-2 imagery: one for counting individual buildings, which achieves $R^2 = 0.91$ against true counts, and one for predicting building height with a mean absolute error of 1.5 m. This work opens up new possibilities for using freely available Sentinel-2 imagery for a range of tasks that previously required high-resolution satellite imagery.
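The training signal and the reported metrics can be sketched in a few lines of NumPy. This is a minimal sketch under assumptions the abstract does not state: the distillation loss is taken to be per-pixel binary cross-entropy against the teacher's soft building probabilities, mIoU is taken to be the mean of building and background IoU after thresholding at 0.5, and $R^2$ is taken to be the coefficient of determination. All function names and example values are hypothetical.

```python
import numpy as np


def distillation_bce(student_prob, teacher_prob, eps=1e-7):
    """Per-pixel BCE of student probabilities against teacher soft labels.

    Assumed distillation loss; it is smallest when the student matches the teacher.
    """
    s = np.clip(student_prob, eps, 1.0 - eps)
    return float(-np.mean(teacher_prob * np.log(s)
                          + (1.0 - teacher_prob) * np.log(1.0 - s)))


def binary_miou(pred_prob, target_mask, threshold=0.5):
    """Mean of building IoU and background IoU (assumed mIoU definition)."""
    pred = pred_prob >= threshold
    target = target_mask.astype(bool)
    ious = []
    for p, t in ((pred, target), (~pred, ~target)):
        union = np.logical_or(p, t).sum()
        if union > 0:  # skip a class absent from both prediction and target
            ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))


def r_squared(y_true, y_pred):
    """Coefficient of determination, e.g. predicted vs. true building counts."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, e.g. building height in metres."""
    return float(np.mean(np.abs(y_true - y_pred)))


# Hypothetical 2x2 tiles: teacher and student building probabilities, plus labels.
teacher = np.array([[0.9, 0.1], [0.8, 0.2]])
student = np.array([[0.7, 0.3], [0.6, 0.4]])
labels = np.array([[1, 0], [1, 1]])

# Hypothetical per-area building counts and per-building heights (metres).
true_counts = np.array([10.0, 20.0, 30.0])
pred_counts = np.array([12.0, 18.0, 30.0])
true_heights = np.array([3.0, 6.0, 9.0])
pred_heights = np.array([4.0, 5.0, 9.5])

print(round(binary_miou(student, labels), 4))                   # 0.5833
print(round(r_squared(true_counts, pred_counts), 4))            # 0.96
print(round(mean_absolute_error(true_heights, pred_heights), 4))  # 0.8333
```

In this setup the student never sees high-resolution pixels; it only sees the teacher's outputs at training time, which is what lets it run on Sentinel-2 alone at inference time.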
