LADI v2: Multi-label Dataset and Classifiers for Low-Altitude Disaster Imagery

Samuel Scheele; Katherine Picchione; Jeffrey Liu

TL;DR

This work presents LADI v2, a dataset of roughly 10,000 low-altitude disaster images with multi-label annotations made by Civil Air Patrol (CAP) volunteers using FEMA Preliminary Damage Assessment (PDA) criteria, intended to support rapid post-disaster assessment. It provides two pretrained baseline classifiers (BiT-50 small and Swin v2 large) and benchmarks them against open-vocabulary vision-language models (LLaVA-NeXT, GPT-4o), highlighting strong in-domain performance and the value of domain-specific data. The dataset exhibits realistic distribution shifts across years and technology adoption (e.g., WaldoAir), making it a meaningful benchmark for domain adaptation in disaster response. The authors release the dataset and code openly to accelerate emergency management research and deployment.

Abstract

ML-based computer vision models are promising tools for supporting emergency management operations following natural disasters. Aerial photographs taken from small manned and unmanned aircraft can be available soon after a disaster and provide valuable information from multiple perspectives for situational awareness and damage assessment applications. However, emergency managers often face challenges finding the most relevant photos among the tens of thousands that may be taken after an incident. While ML-based solutions could enable more effective use of aerial photographs, there is still a lack of training data for imagery of this type from multiple perspectives and for multiple hazard types. To address this, we present the LADI v2 (Low Altitude Disaster Imagery version 2) dataset, a curated set of about 10,000 disaster images captured in the United States by the Civil Air Patrol (CAP) in response to federally declared emergencies (2015-2023) and annotated for multi-label classification by trained CAP volunteers. We also provide two pretrained baseline classifiers and compare their performance to state-of-the-art vision-language models in multi-label classification. The data and code are released publicly to support the development of computer vision models for emergency management research and applications.

Paper Structure (13 sections, 5 figures, 3 tables)

Introduction
Related Work
Natural Disaster Imagery Datasets
Vision-Language Models
LADI v2 Dataset
Data collection and annotation
Dataset statistics and characteristics
Pretrained Classifiers
Architecture and training details
Performance Analysis
Semantic Similarity Analysis
Comparison with open-vocabulary classification (LLaVA and GPT-4o)
Conclusion

Figures (5)

Figure 1: A sample of images from our training set. LADI v2 contains images with both positive and negative examples of damage from a range of altitudes, perspectives, geographies, and lighting conditions.
Figure 2: Details of the dataset: the label sets, number of training examples by state, and event type distribution of the various splits.
Figure 3: Co-occurrence matrices for data splits. Numbers indicate percentage of images within the given split that have the given combination of labels. Lighter colors indicate higher percentages. Note that training and validation sets are combined in these figures for brevity due to their similar distributions.
Figure 4: Characterization of classifier performance by event type and location.
Figure 5: Error vector $L^1$ norm vs. the distance from a point in an evaluation set to its nearest neighbor in the train set in CLIP space. Validation data is plotted in blue and test data in orange.
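
The quantities named in the captions of Figures 3 and 5 can be illustrated with a minimal NumPy sketch. It is not the authors' code. It assumes labels are stored as a binary image-by-label matrix, that the error vector is the element-wise difference between predicted scores and ground-truth labels, that image embeddings (e.g., from CLIP) have already been computed, and that distance in embedding space is Euclidean; the captions do not state the metric. All function names are hypothetical.

```python
import numpy as np


def cooccurrence_percent(labels: np.ndarray) -> np.ndarray:
    """Percentage of images carrying each pair of labels.

    labels: (n_images, n_labels) binary matrix.
    Returns an (n_labels, n_labels) matrix whose entry [i, j] is the
    percentage of images that have both label i and label j.
    """
    labels = np.asarray(labels, dtype=np.int64)
    pair_counts = labels.T @ labels  # [i, j] counts images with both labels
    return 100.0 * pair_counts / labels.shape[0]


def error_l1_norm(scores: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Per-image L1 norm of the error vector (scores minus targets).

    Assumption: the error vector is the element-wise difference between
    predicted label scores and binary ground-truth labels.
    """
    scores = np.asarray(scores, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    return np.abs(scores - targets).sum(axis=1)


def nearest_train_distance(eval_emb: np.ndarray, train_emb: np.ndarray) -> np.ndarray:
    """Distance from each evaluation embedding to its nearest training embedding.

    Assumption: Euclidean distance. Brute force, so suitable only for
    small arrays; memory grows as n_eval * n_train * dim.
    """
    eval_emb = np.asarray(eval_emb, dtype=np.float64)
    train_emb = np.asarray(train_emb, dtype=np.float64)
    diffs = eval_emb[:, None, :] - train_emb[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)
```

Plotting the output of `error_l1_norm` against the output of `nearest_train_distance` for a validation or test split would give a scatter plot of the kind Figure 5 describes, under the assumptions above.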
