Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning

Arun Sharma, Majid Farhadloo, Mingzhou Yang, Ruolei Zeng, Subhankar Ghosh, Shashi Shekhar

TL;DR

The paper addresses accurate quantification of land emissions ($R_a$, $R_h$) in agroecosystems with spatially heterogeneous soils and climates. It introduces Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning (SDSA-KGML), a region-aware KGML pipeline that combines an Auto Region Detector, a GRU-based predictor with attention, and a knowledge-guided loss; the model is pre-trained on process-based synthetic data and fine-tuned with sparse observations. In the Midwestern states of Illinois, Iowa, and Indiana, state-specific SDSA-KGML models outperform a global KGML-Ag baseline, achieving lower mean-squared error and higher local accuracy. These results highlight the importance of region-level calibration for precise carbon flux estimation, with potential impact on precision agriculture and climate mitigation policy. The framework also points toward scalable deployment across spatially variable agroecosystems through the integration of diverse data sources.

Abstract

Given diverse soil characteristics and climate data gathered from various regions, we aim to build a model that accurately predicts land emissions. The problem is important because accurate quantification of the carbon cycle in agroecosystems is crucial for mitigating climate change and ensuring sustainable food production. Predicting land emissions accurately is challenging because the heterogeneity of soil properties, moisture, and environmental conditions makes calibration hard at decision-relevant scales. Traditional approaches estimate land emissions inadequately because their location-independent parameters fail to leverage spatial heterogeneity, and they also require large datasets. To overcome these limitations, we propose Spatial Distribution-Shift Aware Knowledge-Guided Machine Learning (SDSA-KGML), which leverages location-dependent parameters that account for significant spatial heterogeneity in soil moisture from multiple sites within the same region. Experimental results demonstrate that SDSA-KGML models achieve higher local accuracy for the specified states in the Midwest Region.
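
The contrast between location-independent and location-dependent parameters can be illustrated with a toy sketch. This is not the SDSA-KGML model: it swaps in a one-variable least-squares line for the GRU-based predictor, and the data, coefficients, and helper names (`fit_line`, `mse`, `make_region`) are all hypothetical. The state names are used only as labels, and errors are measured in-sample.

```python
def mean(values):
    return sum(values) / len(values)

def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def mse(params, xs, ys):
    """Mean-squared error of the line `params` on (xs, ys)."""
    a, b = params
    return mean([(a * x + b - y) ** 2 for x, y in zip(xs, ys)])

def make_region(slope, intercept, n=11):
    """Made-up data: a region-specific linear response plus small alternating noise."""
    xs = [i / (n - 1) for i in range(n)]
    ys = [slope * x + intercept + 0.05 * (-1) ** i for i, x in enumerate(xs)]
    return xs, ys

# Hypothetical coefficients; each region responds differently to the same input.
data = {
    "Illinois": make_region(0.8, 1.0),
    "Iowa": make_region(1.5, 0.2),
    "Indiana": make_region(0.3, 2.0),
}

# Location-independent parameters: one line fitted to the pooled data.
pooled_x = [x for xs, _ in data.values() for x in xs]
pooled_y = [y for _, ys in data.values() for y in ys]
global_params = fit_line(pooled_x, pooled_y)

# Location-dependent parameters: one line per region.
global_mse = {r: mse(global_params, xs, ys) for r, (xs, ys) in data.items()}
regional_mse = {r: mse(fit_line(xs, ys), xs, ys) for r, (xs, ys) in data.items()}

for region in data:
    print(f"{region}: global={global_mse[region]:.4f} regional={regional_mse[region]:.4f}")
```

Because each regional line minimizes the squared error on its own region, it cannot do worse there than the pooled line. The gap between the two grows as the regional responses diverge, which is the distribution-shift setting the abstract describes.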

Paper Structure

This paper contains 4 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Problem Statement [liu2024knowledge]
  • Figure 2: Illustration of the SDSA-KGML framework.
  • Figure 3: MSE Loss across Illinois, Iowa, and Indiana