Performance and Generalizability Impacts of Incorporating Location Encoders into Deep Learning for Dynamic PM2.5 Estimation
Morteza Karimzadeh, Zhongying Wang, James L. Crooks
TL;DR
This work systematically evaluates how different ways of integrating geolocation influence deep learning for dynamic, high-resolution PM2.5 estimation over the contiguous United States. The authors compare three options: no geolocation, raw coordinates (with and without sinusoidal transforms), and pretrained location encoders (GeoCLIP, SatCLIP). They assess within-region interpolation and out-of-region (OoR) transfer using random, spatial, and checkerboard splits. Pretrained location embeddings improve both local accuracy and geographic generalization, whereas naive coordinate features can hinder OoR performance and introduce artifacts; the fusion strategy (Hadamard) and the choice of encoder (GeoCLIP vs. SatCLIP) also affect results. These findings have practical implications for scalable, equitable air-quality monitoring, especially in data-sparse regions, and point to directions for improving location-encoder design and adaptability to dynamic environmental targets.
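To make two of the terms above concrete, the following is a minimal sketch of a sinusoidal coordinate transform and a checkerboard split. It is an illustration only, not the authors' implementation: the function names, the four-value sin/cos layout, and the 5-degree cell size are all assumptions introduced here.

```python
import math


def sinusoidal_coords(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to [sin(lat), cos(lat), sin(lon), cos(lon)].

    Assumed layout for illustration; the paper's exact transform is not given here.
    The sin/cos pair makes longitude wrap smoothly at the +/-180 degree seam.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    return [math.sin(lat), math.cos(lat), math.sin(lon), math.cos(lon)]


def checkerboard_fold(lat_deg, lon_deg, cell_deg=5.0):
    """Assign a point to fold 0 or 1 by the parity of its grid cell.

    cell_deg is a hypothetical cell size; neighbouring cells always land in
    different folds, so held-out cells are surrounded by training cells.
    """
    row = math.floor(lat_deg / cell_deg)
    col = math.floor(lon_deg / cell_deg)
    return (row + col) % 2


if __name__ == "__main__":
    print(sinusoidal_coords(40.0, -105.0))
    print(checkerboard_fold(40.0, -105.0), checkerboard_fold(40.0, -100.0))
```

A random split would instead assign each sample to a fold independently of location, and a spatial split would hold out one whole contiguous region; the checkerboard sits between the two in how far test points are from training points.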
Abstract
Deep learning has shown strong performance in geospatial prediction tasks, but the role of geolocation information in improving accuracy and generalizability remains underexamined. Recent work has introduced location encoders that aim to represent spatial context in a transferable way, yet most evaluations have focused on static mapping tasks. Here, we study the effect of incorporating geolocation into deep learning for a dynamic and spatially heterogeneous application: estimating daily surface-level PM2.5 across the contiguous United States using satellite and ground-based observations. We compare three strategies for representing location: excluding geolocation, using raw latitude and longitude, and using pretrained location encoders. We evaluate each under within-region and out-of-region generalization settings. Results show that raw coordinates can improve performance within regions by supporting spatial interpolation, but can reduce generalizability across regions. In contrast, pretrained location encoders such as GeoCLIP improve both predictive accuracy and geographic transfer. However, we also observe spatial artifacts linked to encoder characteristics, and performance varies across encoder types (e.g., SatCLIP vs. GeoCLIP). This work provides the first systematic evaluation of location encoders in a dynamic environmental estimation context and offers guidance for incorporating geolocation into deep learning models for geospatial prediction.
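The three location strategies compared above can be sketched as different ways of assembling a model input. This is a toy illustration under stated assumptions, not the paper's architecture: `build_input`, `hadamard_fuse`, and `toy_encoder` are hypothetical names, and `toy_encoder` is a stand-in that does not reproduce the GeoCLIP or SatCLIP interfaces. The Hadamard (element-wise product) fusion shown assumes the feature vector has already been projected to the same length as the location embedding.

```python
import math


def hadamard_fuse(features, location_embedding):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    if len(features) != len(location_embedding):
        raise ValueError("Hadamard fusion needs vectors of equal length")
    return [f * e for f, e in zip(features, location_embedding)]


def toy_encoder(lat_deg, lon_deg):
    """Hypothetical stand-in for a pretrained location encoder (2-value output)."""
    return [math.cos(math.radians(lat_deg)), math.sin(math.radians(lon_deg))]


def build_input(features, lat_deg, lon_deg, strategy, encoder=None):
    """Assemble a model input under one of three location strategies."""
    if strategy == "none":
        # Excluding geolocation: predictors only.
        return list(features)
    if strategy == "raw":
        # Raw coordinates appended as two extra input features.
        return list(features) + [lat_deg, lon_deg]
    if strategy == "encoder":
        # Pretrained-style embedding fused with the predictors.
        if encoder is None:
            raise ValueError("strategy 'encoder' requires an encoder callable")
        return hadamard_fuse(features, encoder(lat_deg, lon_deg))
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    predictors = [1.0, 2.0]  # placeholder values, not real observations
    for name in ("none", "raw", "encoder"):
        print(name, build_input(predictors, 40.0, -105.0, name, toy_encoder))
```

With raw coordinates the model sees absolute position directly, which is consistent with the abstract's observation that this helps interpolation within a region but can hurt transfer to regions whose coordinates were never seen in training.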
