Causal machine learning for sustainable agroecosystems

Vasileios Sitokonstantinou, Emiliano Díaz Salas Porras, Jordi Cerdà Bautista, Maria Piles, Ioannis Athanasiadis, Hannah Kerner, Giulia Martini, Lily-belle Sweet, Ilias Tsoumas, Jakob Zscheischler, Gustau Camps-Valls

TL;DR

The paper argues that sustainable agroecosystem decision-making requires causal understanding beyond traditional predictive ML. It proposes a causal ML framework that integrates ML with causal reasoning, both to answer causal questions (causal discovery and effect estimation) and to improve predictions through causality-aware modeling. Eight applications across science, policy, farming, and predictive modeling illustrate methods such as Invariant Causal Prediction (ICP), estimation of the average treatment effect (ATE) and the conditional average treatment effect (CATE), the X-learner, and Double Machine Learning (DML), with use cases including crop-growth model intercomparison and evaluation of digital agriculture tools. A practical workflow (define causal questions, curate data, state assumptions, select methods, and validate robustness) is discussed alongside data limitations and benchmark needs. Overall, the work aims to support evidence-based decisions that enhance sustainability and food security by quantifying intervention impacts and improving model transferability under changing conditions.

Abstract

In a changing climate, sustainable agriculture is essential for food security and environmental health. However, it is challenging to understand the complex interactions among its biophysical, social, and economic components. Predictive machine learning (ML), with its capacity to learn from data, is leveraged in sustainable agriculture for applications like yield prediction and weather forecasting. Nevertheless, it cannot explain causal mechanisms and remains descriptive rather than prescriptive. To address this gap, we propose causal ML, which merges ML's data processing with causality's ability to reason about change. This facilitates quantifying intervention impacts for evidence-based decision-making and enhances predictive model robustness. We showcase causal ML through eight diverse applications that benefit stakeholders across the agri-food chain, including farmers, policymakers, and researchers.
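
To make "quantifying intervention impacts" concrete, the following is a minimal sketch of why a purely associational comparison can mislead and how adjusting for a known confounder recovers the effect. It is not taken from the paper: the data are simulated, and the variable names (`good_soil`, `treated`, `yield_`), the effect size, and the adoption probabilities are all hypothetical. It uses simple stratification (backdoor adjustment) rather than the ML-based estimators the paper discusses.

```python
import random


def simulate(n=20000, true_effect=2.0, seed=0):
    """Simulate fields where soil quality confounds treatment and yield.

    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        good_soil = rng.random() < 0.5          # confounder
        p_treat = 0.8 if good_soil else 0.2     # good soil makes adoption more likely
        treated = rng.random() < p_treat        # treatment (e.g., a farming practice)
        yield_ = 5.0 + 3.0 * good_soil + true_effect * treated + rng.gauss(0, 1)
        rows.append((good_soil, treated, yield_))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in mean yield between treated and untreated fields."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return mean(treated) - mean(control)


def adjusted_ate(rows):
    """Average the within-stratum differences, weighted by stratum size."""
    total = 0.0
    for stratum_value in (False, True):
        stratum = [(t, y) for s, t, y in rows if s == stratum_value]
        diff = mean([y for t, y in stratum if t]) - mean([y for t, y in stratum if not t])
        total += diff * len(stratum) / len(rows)
    return total


rows = simulate()
naive = naive_difference(rows)
adjusted = adjusted_ate(rows)
print(f"naive difference: {naive:.2f}")
print(f"adjusted ATE:     {adjusted:.2f} (simulated true effect: 2.00)")
```

In this simulation the naive difference overstates the effect because treated fields sit disproportionately on good soil, while the stratified estimate lands close to the simulated value. Real agricultural data rarely have a single, binary, fully observed confounder, which is where the estimators named in the TL;DR come in.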

Paper Structure (15 sections, 2 figures)

Figures (2)

  • Figure 1: How causality can drive sustainability in agriculture: a) Solve common ML problems: Address robustness to interventions over time (e.g., new policies) and geographic generalization to develop predictive models that can continuously and globally predict agricultural activity and ecological conditions. For instance, a standard model might fail to recognize wheat in snowy conditions if it was trained only on sunny conditions. A causality-aware model, however, learns the stable underlying features of wheat, allowing accurate identification in any weather. b) Answer causal questions: Use large-scale, robust predictions from (a) and/or other agricultural observations with causal discovery methods to produce causal graphs that map cause-effect relationships. Based on these graphs, estimate the effects of treatments (averaged or individualized) on outcomes of interest, as the relevant covariates for isolating non-causal associations (red line) are now known. With this knowledge, the most effective solutions can be identified and prioritized.
  • Figure 2: Applications of causal ML for agriculture. Panel a) Causal discovery applications: Data-driven causal discovery (1) unveils causal mechanisms in complex systems like food security, enhancing domain expertise, and (2) evaluates process-based (PB) models by comparing causal graphs from model simulations and observational data. Panel b) Causal effect estimation applications: Support evidence-based decisions by evaluating the impact of (3) human actions and (4) climate/weather events on sustainability outcomes. (5) Sustainable practices can be spatially tailored by estimating each land unit's individualized impact based on its characteristics. (6) Which factors are responsible for the differences in impact? Panel c) Causality-aware ML applications: To achieve geographic generalization (7) and robustness to interventions (8) in predictive models, it is key to balance errors across various environments (variable C, e.g., geographic areas or anomalous weather events). This helps identify causal features, like X1, which maintain a stable relationship with the outcome, X2, under different conditions.
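
The idea in panel c, that causal features keep a stable relationship with the outcome across environments, can be illustrated with a small simulation. This is only a sketch of the intuition behind invariance-based methods such as ICP, not an implementation of ICP or of anything in the paper. The environment names, coefficients, and variable names (`stable_feature`, `unstable_feature`, `outcome`) are hypothetical. Here the stable feature causes the outcome with a fixed coefficient, while the unstable feature is an effect of the outcome whose mechanism differs by environment.

```python
import random


def ols_slope(x, y):
    """Slope of a one-variable least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_environment(link, n=5000, seed=0):
    """Simulate one environment (all coefficients are hypothetical).

    stable_feature -> outcome uses the same coefficient everywhere;
    outcome -> unstable_feature uses an environment-specific `link`.
    """
    rng = random.Random(seed)
    stable_feature = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [1.5 * s + rng.gauss(0, 0.5) for s in stable_feature]
    unstable_feature = [link * o + rng.gauss(0, 1) for o in outcome]
    return stable_feature, outcome, unstable_feature


environments = {
    "region_A": simulate_environment(link=1.0, seed=1),
    "region_B": simulate_environment(link=-1.0, seed=2),
}

# Fit the outcome on each feature separately within each environment.
slopes = {}
for name, (stable_feature, outcome, unstable_feature) in environments.items():
    slopes[name] = (
        ols_slope(stable_feature, outcome),
        ols_slope(unstable_feature, outcome),
    )
    print(f"{name}: stable slope={slopes[name][0]:.2f}, "
          f"unstable slope={slopes[name][1]:.2f}")
```

The slope for the stable feature is nearly identical in both regions, whereas the slope for the unstable feature changes sign. A model that relies on the unstable feature would fit one region well and fail in the other. ICP replaces this visual comparison with statistical tests of invariance over subsets of features.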