Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation
Burak Ekim, Girmaw Abebe Tadesse, Caleb Robinson, Gilles Hacheme, Michael Schmitt, Rahul Dodhia, Juan M. Lavista Ferres
TL;DR
The paper tackles the challenge of distribution shifts in Earth Observation by proposing TARDIS, a post-hoc OOD detector that preserves in-distribution performance while operating without labeled OOD data. It generates surrogate ID/OOD labels for unseen data by clustering internal activations of a pre-trained model and training a lightweight binary classifier on these features. Across 17 EuroSAT and xBD setups, TARDIS achieves near-upper-bound surrogate-labeling performance in 13 and matches top post-hoc methods, with strong scalability demonstrated in the Fields of the World deployment. This approach enables global, real-time diagnostics of model robustness in low-data regions, offering practical, interpretable insights into distribution shifts at scale.
Abstract
Training robust deep learning models is crucial in Earth Observation, where globally deployed models often face distribution shifts that degrade performance, especially in low-data regions. Out-of-distribution (OOD) detection addresses this by identifying inputs that deviate from in-distribution (ID) data. However, existing methods either assume access to OOD data or compromise primary task performance, limiting real-world use. We introduce TARDIS, a post-hoc OOD detection method designed for scalable geospatial deployment. Our core innovation lies in generating surrogate distribution labels by leveraging ID data within the feature space. TARDIS takes a pre-trained model, ID data, and data from an unknown distribution (WILD), separates WILD into surrogate ID and OOD labels based on internal activations, and trains a binary classifier to detect distribution shifts. We validate on EuroSAT and xBD across 17 setups covering covariate and semantic shifts, showing near-upper-bound surrogate labeling performance in 13 cases and matching the performance of top post-hoc activation- and scoring-based methods. Finally, deploying TARDIS on Fields of the World reveals actionable insights into pre-trained model behavior at scale. The code is available at https://github.com/microsoft/geospatial-ood-detection.
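
The surrogate-labeling idea can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation (the linked repository holds that), and it rests on several assumptions: synthetic 2-D points stand in for a pre-trained model's internal activations, k-means is used as the clustering step, a WILD sample is called surrogate ID when at least a chosen fraction of its cluster is known ID data (`min_id_fraction` is a hypothetical parameter), and a small logistic regression plays the role of the binary classifier. All function names are illustrative.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (Euclidean distance).
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster goes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def surrogate_labels(id_feats, wild_feats, k=2, min_id_fraction=0.2):
    """Label each WILD sample 0 (surrogate ID) or 1 (surrogate OOD).

    Assumed rule: a cluster counts as ID-like when known ID samples make
    up at least `min_id_fraction` of it.
    """
    assign = kmeans(id_feats + wild_feats, k)
    n_id = len(id_feats)
    labels = []
    for a in assign[n_id:]:
        id_share = assign[:n_id].count(a) / assign.count(a)
        labels.append(0 if id_share >= min_id_fraction else 1)
    return labels


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(feats, labels, lr=0.1, epochs=100):
    """Fit a logistic-regression OOD classifier with per-sample gradient steps."""
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(w, b, x):
    """Probability that feature vector x is OOD."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Synthetic stand-ins for activations: ID near (0, 0); WILD is half ID-like,
# half shifted to (6, 6).
_rng = random.Random(0)


def blob(cx, cy, n):
    return [[_rng.gauss(cx, 0.5), _rng.gauss(cy, 0.5)] for _ in range(n)]


id_feats = blob(0.0, 0.0, 40)
wild_feats = blob(0.0, 0.0, 20) + blob(6.0, 6.0, 20)

wild_labels = surrogate_labels(id_feats, wild_feats)
w, b = train_logistic(id_feats + wild_feats, [0] * len(id_feats) + wild_labels)
print(sum(wild_labels), "of", len(wild_labels), "WILD samples flagged as surrogate OOD")
```

Once trained, the classifier scores new samples directly from their features, so clustering is not needed at inference time. In practice the number of clusters and the ID-fraction threshold would need tuning per dataset; the values here are chosen only to keep the toy example readable.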
