Planted: a dataset for planted forest identification from multi-satellite time series
Luis Miguel Pazos-Outón, Cristina Nader Vasconcelos, Anton Raichuk, Anurag Arnab, Dan Morris, Maxim Neumann
TL;DR
The paper addresses global monitoring of planted forests and tree crops using multi-satellite time series. It introduces Planted, a large, curated multimodal dataset spanning five satellites, with 2,264,747 examples across 64 classes and 41 countries, along with labels and metadata. The authors establish single-modality and multimodal transformer baselines, analyze data augmentation and fusion strategies, and show that mid-fusion across two or three modalities yields strong performance, while highlighting challenges such as missing data and label imbalance. The dataset provides a benchmark to spur research in multimodal, time-series remote sensing for forest monitoring, with potential impact on conservation and carbon accounting.
Abstract
Protecting and restoring forest ecosystems is critical for biodiversity conservation and carbon sequestration. Forest monitoring on a global scale is essential for prioritizing and assessing conservation efforts. Satellite-based remote sensing is the only viable solution for providing global coverage, but to date, large-scale forest monitoring is limited to single modalities and single time points. In this paper, we present a dataset consisting of data from five public satellites for recognizing forest plantations and planted tree species across the globe. Each satellite modality consists of a multi-year time series. The dataset, named Planted, includes over 2M examples of 64 tree label classes (46 genera and 40 species), distributed among 41 countries. This dataset is released to foster research in forest monitoring using multimodal, multi-scale, multi-temporal data sources. Additionally, we present initial baseline results and evaluate modality fusion and data augmentation approaches for this dataset.
