Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning
Arun Sharma, Majid Farhadloo, Mingzhou Yang, Ruolei Zeng, Subhankar Ghosh, Shashi Shekhar
TL;DR
The paper addresses accurate quantification of land emissions ($R_a$, $R_h$) in agroecosystems with spatially heterogeneous soils and climates. It introduces Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning (SDSA-KGML), a region-aware KGML pipeline that combines an Auto Region Detector, a GRU-based predictor with attention, and a knowledge-guided loss; the model is pre-trained on process-based synthetic data and fine-tuned with sparse observations. In the Midwestern states of Illinois, Iowa, and Indiana, state-specific SDSA-KGML models achieve lower mean-squared error and higher local accuracy than a global KGML-Ag baseline. The results highlight the importance of region-level calibration for precise carbon flux estimation, with potential impact on precision agriculture and climate mitigation policy. The framework also suggests a path toward scalable deployment across spatially variable agroecosystems by integrating diverse data sources.
Abstract
Given diverse soil characteristics and climate data gathered from various regions, we aim to build a model that accurately predicts land emissions. The problem is important because accurate quantification of the carbon cycle in agroecosystems is crucial for mitigating climate change and ensuring sustainable food production. Predicting land emissions accurately is challenging because it is hard to calibrate for the heterogeneity of soil properties, moisture, and environmental conditions at decision-relevant scales. Traditional approaches do not estimate land emissions adequately: they rely on location-independent parameters, which fail to leverage spatial heterogeneity, and they require large datasets. To overcome these limitations, we propose Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning (SDSA-KGML), which uses location-dependent parameters to account for the significant spatial heterogeneity in soil moisture across multiple sites within the same region. Experimental results show that SDSA-KGML models achieve higher local accuracy for the specified states in the Midwest region.
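
The contrast between location-independent and location-dependent parameters can be illustrated with a toy sketch. This is not the SDSA-KGML pipeline: a one-variable least-squares line stands in for the GRU-based predictor, the data are synthetic, and the per-state slopes and intercepts are made-up values chosen only to create a spatial distribution shift. All function and variable names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(params, xs, ys):
    """Mean-squared error of the line `params` on (xs, ys)."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_region(rng, slope, intercept, n=200, noise=0.1):
    """Synthetic samples from one region's (assumed) linear response."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

rng = random.Random(0)

# Assumed (slope, intercept) per state; illustrative values, not from the paper.
true_params = {
    "Illinois": (3.0, 0.0),
    "Iowa": (0.5, 2.0),
    "Indiana": (-2.0, 1.0),
}
train = {r: make_region(rng, *p) for r, p in true_params.items()}
test = {r: make_region(rng, *p, n=50) for r, p in true_params.items()}

# Location-independent baseline: one parameter set fitted on pooled data.
pooled_x = [x for xs, _ in train.values() for x in xs]
pooled_y = [y for _, ys in train.values() for y in ys]
global_params = fit_line(pooled_x, pooled_y)

# Location-dependent models: one parameter set per region.
local_params = {r: fit_line(*train[r]) for r in train}

# Held-out error per region for both strategies.
global_mse = {r: mse(global_params, *test[r]) for r in test}
local_mse = {r: mse(local_params[r], *test[r]) for r in test}

for r in test:
    print(f"{r}: global MSE={global_mse[r]:.3f}, local MSE={local_mse[r]:.3f}")
```

Under these assumptions, the pooled fit averages over regions whose responses differ, so its held-out error in each region is larger than that of the region-specific fit. The gap shrinks as the regional responses become more alike.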
