SmartWilds: Multimodal Wildlife Monitoring Dataset

Jenna Kline, Anirudh Potlapally, Bharath Pillai, Tanishka Wani, Rugved Katole, Vedant Patil, Penelope Covey, Hari Subramoni, Tanya Berger-Wolf, Christopher Stewart

TL;DR

SmartWilds presents a synchronized multimodal wildlife monitoring dataset comprising drone imagery, camera-trap data, and bioacoustic recordings collected at The Wilds, Ohio. It establishes reproducible field protocols and provides baseline cross-modality analyses to benchmark sensor fusion for conservation tasks, with an initial release of about 101 GB across three modalities and rich metadata. The work demonstrates complementary strengths among modalities—cameras for species ID, acoustics for continuous temporal coverage, and drones for landscape-scale context—and outlines future integration of GPS-tracked individuals and citizen science data to enable environmental digital twins. These contributions advance conservation AI by enabling robust multimodal perception and planning for endangered species monitoring and habitat management.

Abstract

We present the first release of SmartWilds, a multimodal wildlife monitoring dataset. SmartWilds is a synchronized collection of drone imagery, camera trap photographs and videos, and bioacoustic recordings collected during summer 2025 at The Wilds safari park in Ohio. This dataset supports multimodal AI research for comprehensive environmental monitoring, addressing critical needs in endangered species research, conservation ecology, and habitat management. Our pilot deployment captured four days of synchronized monitoring across three modalities in a 220-acre pasture containing Pere David's deer, Sichuan takin, and Przewalski's horses, as well as species native to Ohio. We provide a comparative analysis of sensor modality performance, demonstrating complementary strengths for land-use patterns, species detection, behavioral analysis, and habitat monitoring. This work establishes reproducible protocols for multimodal wildlife monitoring while contributing open datasets to advance conservation computer vision research. Future releases will include synchronized GPS tracking data from tagged individuals, citizen science data, and expanded temporal coverage across multiple seasons.

Paper Structure

This paper contains 16 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Representative images and study design. GPS and time-stamp metadata allow cross-referencing between modalities. (a) Diagram of the dataset modalities (citizen science images, GPS tags, acoustic data, camera trap images, and drone images), joined via location and time-stamp metadata. (b) Example of multimodal cross-referencing using metadata: a camera trap view (TW02) of Pere David's deer synchronized with a drone image of the deer wading in the lake.
  • Figure 2: Map of sensor placements created with Google Earth. Camera trap locations are shown in orange, bioacoustic sensors in blue, and drone flight paths in red. See the site selection table for site details.
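Figure 1 describes joining the modalities via location and time-stamp metadata. One way such a join could work is shown below. Given one record, it finds records from *other* modalities captured within a time window and a distance radius, ordered by closeness in time. This is a minimal sketch under stated assumptions: the `Record` fields, the `cross_reference` and `haversine_m` helpers, the 5-minute / 200 m thresholds, and every value in the example are hypothetical. That includes the sensor IDs other than TW02 and all dates and coordinates. None of them come from the SmartWilds release, whose actual metadata schema may differ.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    """One observation. Field names are hypothetical, not the SmartWilds schema."""
    modality: str        # e.g. "camera_trap", "drone", "acoustic"
    sensor_id: str       # e.g. "TW02"
    timestamp: datetime  # capture time (assumed already in a common time zone)
    lat: float           # WGS84 latitude in degrees
    lon: float           # WGS84 longitude in degrees


EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def cross_reference(anchor, candidates, max_dt=timedelta(minutes=5), max_dist_m=200.0):
    """Return records from other modalities near `anchor` in time and space,
    sorted so the record closest in time comes first (thresholds are hypothetical)."""
    matches = []
    for rec in candidates:
        if rec.modality == anchor.modality:
            continue  # only pair across modalities
        if abs(rec.timestamp - anchor.timestamp) > max_dt:
            continue  # outside the time window
        if haversine_m(anchor.lat, anchor.lon, rec.lat, rec.lon) > max_dist_m:
            continue  # too far away to show the same scene
        matches.append(rec)
    return sorted(matches, key=lambda r: abs(r.timestamp - anchor.timestamp))


# Hypothetical example: all values are illustrative placeholders.
cam = Record("camera_trap", "TW02", datetime(2025, 6, 1, 10, 0, 0), 39.8300, -81.7300)
drone = Record("drone", "flight_1", datetime(2025, 6, 1, 10, 2, 0), 39.8305, -81.7302)
audio = Record("acoustic", "AM01", datetime(2025, 6, 1, 11, 0, 0), 39.8301, -81.7301)
print([r.sensor_id for r in cross_reference(cam, [drone, audio])])  # ['flight_1']
```

In the example, the drone record lies about 60 m away and 2 minutes later, so it matches. The acoustic record is spatially close but an hour later, so it is excluded. A linear scan keeps the sketch simple. For larger collections, sorting records by timestamp and scanning a sliding window (or adding a spatial index) would avoid comparing every pair.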