Scientifically-Interpretable Reasoning Network (ScIReN): Discovering Hidden Relationships in the Carbon Cycle and Beyond
Joshua Fan, Haodi Xu, Feng Tao, Md Nasim, Marc Grimson, Yiqi Luo, Carla P. Gomes
TL;DR
The paper tackles the interpretability challenge in soil carbon cycle modeling by introducing ScIReN, a framework that couples an interpretable encoder based on Kolmogorov-Arnold networks (KANs) with a differentiable, process-based decoder. The encoder predicts latent biogeochemical parameters $\mathbf{p}$ from environmental features $\mathbf{x}$; a hard-sigmoid layer constrains them to physically plausible ranges, and the process-based model $g_{PBM}$, which encodes domain knowledge, maps them to the outputs. Entropy (sparsity) and spline-smoothness regularizers, together with a parameter-violation term, yield interpretable, robust mappings from inputs to latent parameters. On ecosystem respiration and CLM5-based soil carbon tasks, ScIReN achieves predictive accuracy on par with black-box models while exposing latent relationships that align with ecological theory and enable extrapolation. The framework offers a general path toward transparent scientific discovery in earth system modeling and beyond.
Abstract
Soils have the potential to mitigate climate change by sequestering carbon from the atmosphere, but the soil carbon cycle remains poorly understood. Scientists have developed process-based models of the soil carbon cycle based on existing knowledge, but these models contain numerous unknown parameters and often fit observations poorly. Neural networks, on the other hand, can learn patterns from data, but they do not respect known scientific laws and are too opaque to reveal novel scientific relationships. We thus propose Scientifically-Interpretable Reasoning Network (ScIReN), a fully transparent framework that combines interpretable neural and process-based reasoning. An interpretable encoder predicts scientifically meaningful latent parameters, which are then passed through a differentiable process-based decoder to predict labeled output variables. While the process-based decoder enforces existing scientific knowledge, the encoder leverages Kolmogorov-Arnold networks (KANs) to reveal interpretable relationships between input features and latent parameters, using novel smoothness penalties to balance expressivity and simplicity. ScIReN also introduces a novel hard-sigmoid constraint layer that restricts latent parameters to prior ranges while maintaining interpretability. We apply ScIReN to two tasks: simulating the flow of organic carbon through soils, and modeling ecosystem respiration from plants. On both tasks, ScIReN outperforms or matches black-box models in predictive accuracy while greatly improving scientific interpretability: it can infer latent scientific mechanisms and their relationships with input features.
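To make the encoder, constraint, and decoder pipeline concrete, the following is a minimal NumPy sketch, not the authors' implementation. Several pieces are assumptions made purely for illustration: the hard sigmoid is taken to be a plain clip to $[0, 1]$ (the paper's exact slope and offset are not stated here), the violation penalty is one plausible form of a parameter-violation term, the raw encoder outputs `z` are hand-written numbers rather than the output of a KAN, and `toy_decoder` is a hypothetical Q10-style respiration formula standing in for $g_{PBM}$.

```python
import numpy as np

def hard_sigmoid(z):
    # Assumed form: identity on [0, 1], clipped outside that interval.
    return np.clip(z, 0.0, 1.0)

def constrain(z, lo, hi):
    # Rescale the unit interval to the prior range [lo, hi] of each parameter.
    return lo + (hi - lo) * hard_sigmoid(z)

def violation_penalty(z):
    # Assumed penalty: mean distance by which raw outputs leave [0, 1],
    # the region where clipping would otherwise give zero gradient.
    return np.mean(np.maximum(-z, 0.0) + np.maximum(z - 1.0, 0.0))

def toy_decoder(p, temp_c, t_ref=15.0):
    # Hypothetical stand-in for g_PBM: respiration = base_rate * Q10^((T - T_ref) / 10).
    base_rate, q10 = p[..., 0], p[..., 1]
    return base_rate * q10 ** ((temp_c - t_ref) / 10.0)

# Hand-written raw encoder outputs for three sites (columns: base_rate, Q10).
z = np.array([[0.5, 0.25],
              [-0.2, 1.3],
              [1.0, 0.0]])
lo = np.array([0.0, 1.0])   # illustrative lower bounds of the prior ranges
hi = np.array([10.0, 3.0])  # illustrative upper bounds of the prior ranges

p = constrain(z, lo, hi)                                  # latent parameters
y = toy_decoder(p, temp_c=np.array([15.0, 25.0, 5.0]))    # predicted outputs
penalty = violation_penalty(z)                            # added to the training loss
```

Because the constraint is linear inside the allowed range, a learned relationship between an input feature and a raw encoder output carries over to the physical parameter up to a rescaling, which is the property the hard-sigmoid layer is meant to preserve. In this sketch the second site's raw outputs fall outside $[0, 1]$, so its parameters are pinned to the range boundaries and the penalty is nonzero.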
