Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment

Venkanna Babu Guthula; Stefan Oehmcke; Remigio Chilaule; Hui Zhang; Nico Lang; Ankit Kariryaa; Johan Mottelson; Christian Igel

TL;DR

The paper tackles malaria risk assessment by enabling automatic roof-type mapping from high-resolution drone imagery. It introduces the Nacala-Roof-Material dataset with 17,954 buildings in Mozambique and defines three tasks: building detection, roof-type classification, and pixel-level segmentation. A Deep Ordinal Watershed (DOW) extension is proposed to improve object separation by predicting interior object masks alongside the standard segmentation masks. It is applied to the U-Net and DINOv2 baselines, while YOLOv8 serves as an additional baseline. Experimental results show that the DOW variants generally improve object separation and segmentation performance, although no single method dominates all metrics. The dataset and code are publicly available to spur research on multi-task learning for risk-informed interventions. This work provides a practical resource for remote-sensing-based malaria risk mapping and demonstrates a scalable approach to joint semantic and instance segmentation in high-resolution urban-rural settings.
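The interior masks mentioned above need training targets, and this summary does not say how they are derived. The sketch below is one plausible construction under loudly stated assumptions: each ground-truth building is eroded separately with a 4-neighbourhood for a hypothetical number of `iterations`, and both levels are stored in a single ordinal map (0 = background, 1 = building, 2 = interior). The helper names `erode_instance`, `ordinal_targets`, and `level_mask` are illustrative, not the authors' code.

```python
# 4-neighbourhood used for erosion (an assumption of this sketch).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def erode_instance(labels, inst_id, iterations=1):
    """Pixels of one building that survive `iterations` rounds of erosion."""
    h, w = len(labels), len(labels[0])
    region = {(r, c) for r in range(h) for c in range(w) if labels[r][c] == inst_id}
    for _ in range(iterations):
        # Keep a pixel only if all four neighbours belong to the same building;
        # neighbours outside the image count as background.
        region = {
            (r, c)
            for (r, c) in region
            if all((r + dr, c + dc) in region for dr, dc in NEIGHBOURS)
        }
    return region


def ordinal_targets(labels, iterations=1):
    """Ordinal target: 0 = background, 1 = building (level 1), 2 = interior (level 2)."""
    h, w = len(labels), len(labels[0])
    target = [[1 if labels[r][c] > 0 else 0 for c in range(w)] for r in range(h)]
    for inst_id in {v for row in labels for v in row if v > 0}:
        # Eroding each building on its own keeps the interiors of touching buildings apart.
        for r, c in erode_instance(labels, inst_id, iterations):
            target[r][c] = 2
    return target


def level_mask(target, level):
    """Binary mask for one ordinal level: a pixel is on if its value >= level."""
    return [[int(v >= level) for v in row] for row in target]


# Toy instance map: two 3x3 buildings sharing a wall (ids 1 and 2).
labels = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 2, 2, 2, 0],
    [0, 1, 1, 1, 2, 2, 2, 0],
    [0, 1, 1, 1, 2, 2, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
targets = ordinal_targets(labels)
```

In this toy case the level-1 mask is a single connected blob of 18 pixels, whereas the level-2 mask keeps exactly one pixel per building. That gap between the two interiors is what allows a subsequent watershed step to split the blob into two objects.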

Abstract

As low-quality housing and in particular certain roof characteristics are associated with an increased risk of malaria, classification of roof types based on remote sensing imagery can support the assessment of malaria risk and thereby help prevent the disease. To support research in this area, we release the Nacala-Roof-Material dataset, which contains high-resolution drone images from Mozambique with corresponding labels delineating houses and specifying their roof types. The dataset defines a multi-task computer vision problem, comprising object detection, classification, and segmentation. In addition, we benchmarked various state-of-the-art approaches on the dataset. Canonical U-Nets, YOLOv8, and a custom decoder on pretrained DINOv2 served as baselines. We show that each of the methods has its advantages but none is superior on all tasks, which highlights the potential of our dataset for future research in multi-task learning. While the tasks are closely related, accurate segmentation of objects does not necessarily imply accurate instance separation, and vice versa. We address this general issue by introducing a variant of the deep ordinal watershed (DOW) approach that additionally separates the interior of objects, allowing for improved object delineation and separation. We show that our DOW variant is a generic approach that improves the performance of both U-Net and DINOv2 backbones, leading to a better trade-off between semantic segmentation and instance segmentation.
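The point that accurate segmentation does not imply accurate instance separation can be made concrete with a toy case. The sketch below assumes two illustrative metrics: foreground pixel IoU, and the number of ground-truth buildings matched by some predicted instance with IoU strictly above a hypothetical threshold of 0.5. The paper's own evaluation metrics are not reproduced in this summary and may differ.

```python
def pixel_iou(gt, pred):
    """Foreground IoU that ignores instance ids (0 = background)."""
    inter = union = 0
    for row_g, row_p in zip(gt, pred):
        for g, p in zip(row_g, row_p):
            inter += int(g > 0 and p > 0)
            union += int(g > 0 or p > 0)
    return inter / union if union else 1.0


def instance_masks(labels):
    """Map each instance id to its set of (row, col) pixels."""
    masks = {}
    for r, row in enumerate(labels):
        for c, v in enumerate(row):
            if v > 0:
                masks.setdefault(v, set()).add((r, c))
    return masks


def matched_instances(gt, pred, thr=0.5):
    """Count ground-truth instances overlapped by some prediction with IoU > thr."""
    pred_masks = instance_masks(pred).values()
    hits = 0
    for g in instance_masks(gt).values():
        best = max((len(g & p) / len(g | p) for p in pred_masks), default=0.0)
        hits += int(best > thr)
    return hits


# Two 3x3 buildings side by side; the prediction covers both as one blob.
gt = [[1, 1, 1, 2, 2, 2]] * 3
pred = [[1, 1, 1, 1, 1, 1]] * 3
```

The merged prediction is perfect at the pixel level (`pixel_iou(gt, pred)` is 1.0), yet it matches neither building: each building covers only half of the merged blob, so its IoU is exactly 0.5 and does not exceed the threshold.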

Paper Structure (26 sections, 1 equation, 7 figures, 6 tables)

Introduction
Nacala-Roof-Material Data
Background: Housing conditions and risk of mosquito-borne diseases.
The Nacala-Roof-Material dataset.
Related datasets.
Benchmarked Methods
Baseline Models
U-Net.
YOLOv8.
DINOv2.
Deep Ordinal Watershed
Two-stage vs. End-to-end
Experiments and Results
Experimental Setup
Evaluation Metrics
...and 11 more sections

Figures (7)

Figure 1: (a) Visualisation of the training, validation and test sets with reference to longitude and latitude; (b) Drone imagery with labels; (c) Instance counts for each class in all sets.
Figure 2: Baseline (top) and DOW (bottom) variants of our systems using either ResNet35 (in the case of the U-Net architectures) or DINOv2 as encoders. When using DOW, the watershed algorithm takes two segmentation masks as input: the predicted objects (level 1) and their interiors (level 2). In the two-stage approach, the classifier shown in Figure 3 uses the binary building segmentation (left). In the end-to-end setting, the roof material is predicted directly with a multi-class segmentation approach (right).
Figure 3: The architecture of the DINOv2-based roof material classifier used in the two-stage setting. A classifier (e.g., logistic regression) is applied to the resulting feature vector.
Figure 4: Exemplary predictions on $\mathcal{D}_{\text{test}}$ by different models. The predictions are polygonized and colored by class. The roof types with few training examples, asbestos and concrete, are particularly difficult, see bottom row.
Figure A.5: Basic U-Net architecture
...and 2 more figures
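The Figure 2 caption describes the DOW post-processing only in terms of its inputs: the level-1 object mask, the level-2 interior mask, and, in the end-to-end setting, a multi-class roof-material segmentation. The sketch below is a minimal, dependency-free stand-in built on several assumptions. Connected components of the interior mask serve as markers, and these are grown into the object mask by multi-source breadth-first flooding. This is a simplified substitute for a full watershed, which would also use an elevation map such as a distance transform; with scikit-image one would instead call `skimage.segmentation.watershed` with `markers` and `mask`. Each resulting instance then receives its majority pixel class. The function names, 4-connectivity, and majority voting are hypothetical choices, not the authors' implementation.

```python
from collections import Counter, deque

# 4-connectivity (an assumption; 8-connectivity would also be reasonable).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(mask):
    """Label 4-connected foreground components of a binary mask (lists of 0/1)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels


def flood_from_interiors(level1, level2):
    """Grow interior markers (level 2) into the object mask (level 1).

    Object pixels that no marker can reach stay 0 (unassigned).
    """
    h, w = len(level1), len(level1[0])
    labels = label_components(level2)
    # Multi-source BFS: all marker pixels start in the queue at once.
    queue = deque((r, c) for r in range(h) for c in range(w) if labels[r][c])
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and level1[ny][nx] and not labels[ny][nx]:
                labels[ny][nx] = labels[y][x]
                queue.append((ny, nx))
    return labels


def instance_classes(instances, class_map):
    """Give each instance the most frequent roof class among its pixels."""
    votes = {}
    for inst_row, cls_row in zip(instances, class_map):
        for inst, cls in zip(inst_row, cls_row):
            if inst > 0:
                votes.setdefault(inst, Counter())[cls] += 1
    return {inst: c.most_common(1)[0][0] for inst, c in votes.items()}


# Toy example: two touching buildings form ONE blob in the level-1 mask,
# but their interiors (level 2) are disjoint.
level1 = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
level2 = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
# Hypothetical per-pixel class ids from an end-to-end multi-class head.
class_map = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 2, 2, 2, 0],
    [0, 1, 1, 2, 2, 2, 2, 0],
    [0, 1, 1, 1, 2, 2, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

instances = flood_from_interiors(level1, level2)
classes = instance_classes(instances, class_map)
```

Here `label_components(level1)` finds only one component, whereas flooding from the two interiors recovers two buildings. Majority voting then gives instance 1 class 1, even though one of its pixels was predicted as class 2.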
