Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment
Venkanna Babu Guthula, Stefan Oehmcke, Remigio Chilaule, Hui Zhang, Nico Lang, Ankit Kariryaa, Johan Mottelson, Christian Igel
TL;DR
The paper supports malaria risk assessment by enabling automatic mapping of roof types from high-resolution drone imagery. It introduces the Nacala-Roof-Material dataset, which covers 17,954 buildings in Mozambique, and defines three tasks: building detection, roof-type classification, and pixel-level segmentation. U-Net, YOLOv8, and a custom decoder on pretrained DINOv2 serve as baselines. The authors also propose a Deep Ordinal Watershed (DOW) variant that predicts the interior of each object alongside the standard segmentation mask. This improves the delineation and separation of individual buildings for both the U-Net and DINOv2 backbones. Experiments show that no single method is best on all metrics and that the DOW variant gives a better trade-off between semantic and instance segmentation. The dataset and code are publicly available to encourage research on multi-task learning for risk-informed interventions. The work provides a practical resource for remote-sensing-based malaria risk mapping and a generic approach to joint semantic and instance segmentation in high-resolution imagery.
Abstract
As low-quality housing, and in particular certain roof characteristics, are associated with an increased risk of malaria, classifying roof types from remote sensing imagery can support the assessment of malaria risk and thereby help prevent the disease. To support research in this area, we release the Nacala-Roof-Material dataset, which contains high-resolution drone images from Mozambique with corresponding labels delineating houses and specifying their roof types. The dataset defines a multi-task computer vision problem comprising object detection, classification, and segmentation. In addition, we benchmark various state-of-the-art approaches on the dataset. Canonical U-Nets, YOLOv8, and a custom decoder on pretrained DINOv2 serve as baselines. We show that each method has its advantages but none is superior on all tasks, which highlights the potential of our dataset for future research in multi-task learning. Although the tasks are closely related, accurate segmentation of objects does not necessarily imply accurate instance separation, and vice versa. We address this general issue by introducing a variant of the deep ordinal watershed (DOW) approach that additionally separates the interior of objects, allowing for improved object delineation and separation. We show that our DOW variant is a generic approach that improves the performance of both U-Net and DINOv2 backbones, leading to a better trade-off between semantic segmentation and instance segmentation.
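The abstract describes the DOW variant only at a high level. A minimal sketch can show the general idea of using a predicted interior mask to split touching objects. Each connected component of the interior becomes a seed, and the seeds are grown over the predicted object mask by multi-source breadth-first search, a simple stand-in for a marker-based watershed. The sketch assumes binary object and interior masks given as lists of lists. The function names `label_components` and `separate_instances` are hypothetical, and this is not the authors' DOW implementation.

```python
from collections import deque

NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_components(mask):
    """Label 4-connected components of a binary mask.

    Returns (labels, k): labels is a grid with 0 for background and 1..k for components.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    k = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and labels[r][c] == 0:
                k += 1
                labels[r][c] = k
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = k
                            queue.append((ny, nx))
    return labels, k


def separate_instances(object_mask, interior_mask):
    """Hypothetical post-processing: interior components seed instances that grow over the object mask."""
    h, w = len(object_mask), len(object_mask[0])
    # Only interior pixels that also lie inside the object mask can act as seeds.
    interior = [[1 if interior_mask[r][c] and object_mask[r][c] else 0 for c in range(w)] for r in range(h)]
    instances, n = label_components(interior)
    # Multi-source BFS: each object pixel joins the seed that reaches it first (fewest steps).
    queue = deque((r, c) for r in range(h) for c in range(w) if instances[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and object_mask[ny][nx] and instances[ny][nx] == 0:
                instances[ny][nx] = instances[y][x]
                queue.append((ny, nx))
    # Object regions that contain no seed become separate instances of their own.
    leftover = [[1 if object_mask[r][c] and instances[r][c] == 0 else 0 for c in range(w)] for r in range(h)]
    extra, m = label_components(leftover)
    for r in range(h):
        for c in range(w):
            if extra[r][c]:
                instances[r][c] = n + extra[r][c]
    return instances, n + m


# Usage: two touching roofs (columns 0-6) plus one isolated roof (column 8) without a predicted interior.
obj = [[1, 1, 1, 1, 1, 1, 1, 0, 1] for _ in range(3)]
interior = [[0] * 9 for _ in range(3)]
interior[1][1] = 1
interior[1][5] = 1
inst, count = separate_instances(obj, interior)
print(count)  # 3
```

A plain connected-component labeling of `obj` would merge the two touching roofs into one instance. The interior seeds split them, which mirrors the abstract's point that good semantic segmentation alone does not guarantee good instance separation.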
