Scientifically-Interpretable Reasoning Network (ScIReN): Discovering Hidden Relationships in the Carbon Cycle and Beyond

Joshua Fan, Haodi Xu, Feng Tao, Md Nasim, Marc Grimson, Yiqi Luo, Carla P. Gomes

TL;DR

The paper tackles the interpretability challenge in soil carbon cycle modeling by introducing ScIReN, a framework that couples an interpretable encoder based on Kolmogorov-Arnold networks (KANs) with a differentiable, process-based decoder. The encoder predicts latent biogeochemical parameters $\mathbf{p}$ from environmental features $\mathbf{x}$; a hard-sigmoid layer constrains them to physically plausible ranges; and the decoder $g_{PBM}$, which encodes domain knowledge, maps them to the observed outputs. An entropy (sparsity) regularizer, a spline-smoothness penalty, and a parameter-violation term together yield interpretable, robust mappings from inputs to latent parameters. On ecosystem respiration and CLM5-based soil carbon tasks, ScIReN achieves predictive accuracy on par with black-box models while exposing latent relationships that align with ecological theory and enable extrapolation. The framework offers a general path toward transparent scientific discovery in earth system modeling and beyond.
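
The encoder, constraint layer, and decoder described above compose into a single forward pass. The following is a minimal sketch of that composition, not the authors' implementation. Several pieces are assumptions made only for illustration: `toy_encoder` is a plain linear map standing in for the KAN, `toy_g_pbm` is a hypothetical decoder using the textbook Q10 temperature-response curve, and the parameter names `r_b`, `q10`, and `t_ref` are illustrative.

```python
import numpy as np


def hard_sigmoid(z):
    """Piecewise-linear squashing: 0 below 0, identity on [0, 1], 1 above 1."""
    return np.clip(z, 0.0, 1.0)


def constrain(z, p_min, p_max):
    """Rescale the hard-sigmoid output to the prior range [p_min, p_max]."""
    return p_min + (p_max - p_min) * hard_sigmoid(z)


def toy_encoder(x, weights, bias):
    """Assumption: a linear map stands in for the interpretable KAN encoder."""
    return x @ weights + bias


def toy_g_pbm(p, temperature, t_ref=15.0):
    """Assumption: a hypothetical process-based decoder (Q10 curve).

    Column 0 of p is a base rate r_b, column 1 is a temperature sensitivity q10.
    """
    r_b, q10 = p[:, 0], p[:, 1]
    return r_b * q10 ** ((temperature - t_ref) / 10.0)


def forward(x, temperature, weights, bias, p_min, p_max):
    """x -> unconstrained z -> constrained latent p -> predicted output y."""
    z = toy_encoder(x, weights, bias)
    p = constrain(z, p_min, p_max)
    y = toy_g_pbm(p, temperature)
    return p, y
```

Because the latent parameters `p` are returned alongside the prediction, they can be inspected directly, which is the property the framework relies on for interpretability.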

Abstract

Soils have potential to mitigate climate change by sequestering carbon from the atmosphere, but the soil carbon cycle remains poorly understood. Scientists have developed process-based models of the soil carbon cycle based on existing knowledge, but they contain numerous unknown parameters and often fit observations poorly. On the other hand, neural networks can learn patterns from data, but do not respect known scientific laws, and are too opaque to reveal novel scientific relationships. We thus propose Scientifically-Interpretable Reasoning Network (ScIReN), a fully-transparent framework that combines interpretable neural and process-based reasoning. An interpretable encoder predicts scientifically-meaningful latent parameters, which are then passed through a differentiable process-based decoder to predict labeled output variables. While the process-based decoder enforces existing scientific knowledge, the encoder leverages Kolmogorov-Arnold networks (KANs) to reveal interpretable relationships between input features and latent parameters, using novel smoothness penalties to balance expressivity and simplicity. ScIReN also introduces a novel hard-sigmoid constraint layer to restrict latent parameters into prior ranges while maintaining interpretability. We apply ScIReN on two tasks: simulating the flow of organic carbon through soils, and modeling ecosystem respiration from plants. On both tasks, ScIReN outperforms or matches black-box models in predictive accuracy, while greatly improving scientific interpretability -- it can infer latent scientific mechanisms and their relationships with input features.

Paper Structure

This paper contains 29 sections, 28 equations, 11 figures, 9 tables.

Figures (11)

  • Figure 1: Overview of ScIReN. The encoder reveals interpretable functional relationships between environmental inputs (e.g. temperature) and latent scientific parameters (e.g. transfer rates between soil pools). A constraint layer forces latent parameters into a prior range, and the process-based decoder simulates the physical process with the given latent parameters.
  • Figure 2: Learned encoder examples: 1-layer KAN (left) and 2-layer KAN (right)
  • Figure 3: Left: The hard-sigmoid function constrains parameters to $[p_{\min}, p_{\max}]$, without adding nonlinearity. Right: parameter violation loss pushes the hard-sigmoid input away from flat regions.
  • Figure 4: Functional relationships learned by Blackbox-Hybrid (left) and ScIReN (center) vs. truth (right), on synthetic labels. ScIReN recovers the true relationships much more accurately.
  • Figure 5: Pure NN (Table 1, linear $R_b$). Because the model does not have access to scientific knowledge (process-based model), it struggles to extrapolate on the higher end of ecosystem respiration / temperature (left). It is also unable to predict the latent variable (right).
  • ...and 6 more figures
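
The Figure 3 caption notes that a parameter violation loss pushes the hard-sigmoid input away from the flat regions. The sketch below illustrates why such a term is useful: in the flat regions the hard-sigmoid has zero gradient, so the prediction loss alone cannot move the input back into range. The exact form of the loss is not given here, so the hinge-style penalty in `violation_loss` is an assumption, and the function names are illustrative.

```python
import numpy as np


def hard_sigmoid_grad(z):
    """Derivative of clip(z, 0, 1): 1 strictly inside (0, 1), 0 in the flat regions."""
    return ((z > 0.0) & (z < 1.0)).astype(float)


def violation_loss(z):
    """Assumed hinge-style penalty: mean distance by which z lies outside [0, 1]."""
    below = np.maximum(0.0, -z)       # how far z falls under 0
    above = np.maximum(0.0, z - 1.0)  # how far z exceeds 1
    return float(np.mean(below + above))


def violation_grad(z):
    """Per-element slope of the penalty before averaging: -1 below 0, +1 above 1, else 0."""
    return np.where(z < 0.0, -1.0, np.where(z > 1.0, 1.0, 0.0))
```

Under this assumed form, the penalty is zero wherever the hard-sigmoid is linear and has a nonzero gradient exactly where the hard-sigmoid gradient vanishes, so the two terms complement each other during training.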