Physics-constrained symbolic regression for discovering closed-form equations of multimodal water retention curves from experimental data

Yejin Kim, Hyoung Suk Suh

TL;DR

This work introduces a physics-constrained machine learning framework designed for meta-modeling, enabling the automatic discovery of closed-form mathematical expressions for multimodal water retention curves directly from experimental data.

Abstract

Modeling the unsaturated behavior of porous materials with multimodal pore size distributions presents significant challenges, as standard hydraulic models often fail to capture their complex, multi-scale characteristics. A common workaround involves superposing unimodal retention functions, each tailored to a specific pore size range; however, this approach requires separate parameter identification for each mode, which limits interpretability and generalizability, especially in data-sparse scenarios. In this work, we introduce a fundamentally different approach: a physics-constrained machine learning framework designed for meta-modeling, enabling the automatic discovery of closed-form mathematical expressions for multimodal water retention curves directly from experimental data. Mathematical expressions are represented as binary trees and evolved via genetic programming, while physical constraints are embedded into the loss function to guide the symbolic regressor toward solutions that are physically consistent and mathematically robust. Our results demonstrate that the proposed framework can discover closed-form equations that effectively represent the water retention characteristics of porous materials with varying pore structures. To support third-party validation, application, and extension, we make the full implementation publicly available in an open-source repository.
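As a rough illustration of how physical constraints can enter the loss, the sketch below adds penalty terms to a data-misfit term for a candidate retention curve. The specific penalties (saturation must not increase with suction and must stay within $[0, 1]$ in the normalized space), the function names, and the weighting are assumptions made for illustration only; the abstract does not list the constraints actually used, and the mode-related term $\mathcal{L}_\text{mode}$ mentioned in the figure captions is not reproduced here.

```python
from typing import Callable, Sequence

Curve = Callable[[float], float]  # candidate expression: normalized suction -> saturation


def data_loss(f: Curve, s_obs: Sequence[float], sw_obs: Sequence[float]) -> float:
    """Mean squared error between the candidate curve and the measurements."""
    return sum((f(s) - sw) ** 2 for s, sw in zip(s_obs, sw_obs)) / len(s_obs)


def monotonicity_penalty(f: Curve, grid: Sequence[float]) -> float:
    """Assumed constraint: saturation should not increase with suction.

    Penalizes every rise between consecutive grid points (grid sorted ascending).
    """
    values = [f(s) for s in grid]
    rises = [max(0.0, b - a) for a, b in zip(values, values[1:])]
    return sum(r ** 2 for r in rises) / len(rises)


def bound_penalty(f: Curve, grid: Sequence[float]) -> float:
    """Assumed constraint: normalized saturation should stay within [0, 1]."""
    return sum(
        max(0.0, -v) ** 2 + max(0.0, v - 1.0) ** 2 for v in map(f, grid)
    ) / len(grid)


def total_loss(
    f: Curve,
    s_obs: Sequence[float],
    sw_obs: Sequence[float],
    grid: Sequence[float],
    weight: float = 1.0,  # hypothetical weight on the physics terms
) -> float:
    """Data misfit plus weighted physics penalties."""
    physics = monotonicity_penalty(f, grid) + bound_penalty(f, grid)
    return data_loss(f, s_obs, sw_obs) + weight * physics


# Collocation points in the normalized suction range [0, 1].
grid = [i / 20 for i in range(21)]

# A decreasing, bounded candidate incurs no physics penalty.
print(total_loss(lambda s: 1.0 - s, [0.0, 0.5, 1.0], [1.0, 0.5, 0.0], grid))
```

In a symbolic regressor, a loss of this kind would be used as the fitness of each candidate expression, so that candidates violating the constraints are ranked below ones that fit the data comparably well.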

Paper Structure

This paper contains 13 sections, 13 equations, 13 figures, 1 table, and 1 algorithm.

Figures (13)

  • Figure 1: Schematic illustration of the data mapping procedure. Black hollow symbols represent exemplary water retention data, with the black axes denoting the original $(s, S_w)$ space and the red axes indicating the mapped $(s^*, S^*_w)$ space. The upper left corner of the red box corresponds to the reference point $(s_\text{min}, S_{w,\text{max}})$ and the lower right corner to $(s_\text{res}, S_{w,\text{min}})$ in the original space, which map to $(0,1)$ and $(1,0)$, respectively, in the transformed space. This mapping ensures that the learned function (red curve) resides within $s^* \in [0,1]$ and $S^*_w \in [0, 1]$.
  • Figure 2: Exemplary symbolic equation represented as a binary tree. This tree represents an expression $(1 + x)\ln(x) - \exp(y) + 0.5$, with a depth of 4 and a total size of 12 nodes.
  • Figure 3: Illustration of (a) mutation and (b) crossover operations in an evolutionary symbolic regression algorithm.
  • Figure 4: The symbolic model discovered from experimental unimodal water retention data using the proposed approach, compared against the van Genuchten (1980) model, vanilla symbolic regression (SR), and physics-constrained symbolic regression (PCSR) without the mode constraint.
  • Figure 5: Data and physics loss components and the number of modes exhibited by PCSR with $\mathcal{L}_\text{mode}$ at varying complexity levels that optimally fit the data presented in Figure 4.
  • ...and 8 more figures
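
The data mapping of Figure 1 and the binary-tree representation of Figure 2 can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: the mapping is assumed to be affine in each coordinate (the paper may instead apply it to log-scaled suction), and the names `map_to_unit_box`, `Node`, and `evaluate` are hypothetical. The tree below is one possible arrangement of the expression in the Figure 2 caption and need not match the figure node for node.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


def map_to_unit_box(
    s: float, sw: float,
    s_min: float, s_res: float,
    sw_min: float, sw_max: float,
) -> Tuple[float, float]:
    """Assumed affine map sending (s_min, sw_max) to (0, 1) and (s_res, sw_min) to (1, 0)."""
    s_star = (s - s_min) / (s_res - s_min)
    sw_star = (sw - sw_min) / (sw_max - sw_min)
    return s_star, sw_star


@dataclass
class Node:
    """One tree node: an operator, a variable name, or a numeric constant."""
    label: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
UNARY = {"ln": math.log, "exp": math.exp}


def evaluate(node: Node, env: Dict[str, float]) -> float:
    """Recursively evaluate the tree for the variable values in env."""
    if node.label in BINARY:
        return BINARY[node.label](evaluate(node.left, env), evaluate(node.right, env))
    if node.label in UNARY:
        return UNARY[node.label](evaluate(node.left, env))  # unary ops use the left child only
    if node.label in env:
        return env[node.label]
    return float(node.label)  # anything else is treated as a constant


# (1 + x) * ln(x) - exp(y) + 0.5
expr = Node(
    "+",
    Node(
        "-",
        Node("*", Node("+", Node("1"), Node("x")), Node("ln", Node("x"))),
        Node("exp", Node("y")),
    ),
    Node("0.5"),
)

print(map_to_unit_box(1.0, 0.9, s_min=1.0, s_res=1000.0, sw_min=0.1, sw_max=0.9))  # (0.0, 1.0)
print(evaluate(expr, {"x": 1.0, "y": 0.0}))  # -0.5
```

Mutation and crossover, as illustrated in Figure 3, would then operate on such trees by replacing or exchanging subtrees.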