Table of Contents
Fetching ...

Physics-Informed Neural Networks for Predicting Hydrogen Sorption in Geological Formations: Thermodynamically Constrained Deep Learning Integrating Classical Adsorption Theory

Mohammad Nooraiepour, Mohammad Masoudi, Zezhang Song, Helge Hellevang

Abstract

Accurate prediction of hydrogen sorption in fine-grained geological materials is essential for evaluating underground hydrogen storage capacity, assessing caprock integrity, and characterizing hydrogen migration in subsurface energy systems. Classical isotherm models perform well at the individual-sample level but fail when generalized across heterogeneous populations, with the coefficient of determination collapsing from 0.80-0.90 for single-sample fits to 0.09-0.38 for aggregated multi-sample datasets. We present a multi-scale physics-informed neural network framework that addresses this limitation by embedding classical adsorption theory and thermodynamic constraints directly into the learning process. The framework utilizes 1,987 hydrogen sorption isotherm measurements across clays, shales, coals, supplemented by 224 characteristic uptake measurements. A seven-category physics-informed feature engineering scheme generates 62 thermodynamically meaningful descriptors from raw material characterization data. The loss function enforces saturation limits, a monotonic pressure response, and Van't Hoff temperature dependence via penalty weighting, while a three-phase curriculum-based training strategy ensures stable integration of competing physical constraints. An architecture-diverse ensemble of ten members provides calibrated uncertainty quantification, with post-hoc temperature scaling achieving target prediction interval coverage. The optimized PINN achieves R2 = 0.9544, RMSE = 0.0484 mmol/g, and MAE = 0.0231 mmol/g on the held-out test set, with 98.6% monotonicity satisfaction and zero non-physical negative predictions. Physics-informed regularization yields a 10-15% cross-lithology generalization advantage over a well-tuned random forest under leave-one-lithology-out validation, confirming that thermodynamic constraints transfer meaningfully across geological boundaries.

Physics-Informed Neural Networks for Predicting Hydrogen Sorption in Geological Formations: Thermodynamically Constrained Deep Learning Integrating Classical Adsorption Theory

Abstract

Accurate prediction of hydrogen sorption in fine-grained geological materials is essential for evaluating underground hydrogen storage capacity, assessing caprock integrity, and characterizing hydrogen migration in subsurface energy systems. Classical isotherm models perform well at the individual-sample level but fail when generalized across heterogeneous populations, with the coefficient of determination collapsing from 0.80-0.90 for single-sample fits to 0.09-0.38 for aggregated multi-sample datasets. We present a multi-scale physics-informed neural network framework that addresses this limitation by embedding classical adsorption theory and thermodynamic constraints directly into the learning process. The framework utilizes 1,987 hydrogen sorption isotherm measurements across clays, shales, coals, supplemented by 224 characteristic uptake measurements. A seven-category physics-informed feature engineering scheme generates 62 thermodynamically meaningful descriptors from raw material characterization data. The loss function enforces saturation limits, a monotonic pressure response, and Van't Hoff temperature dependence via penalty weighting, while a three-phase curriculum-based training strategy ensures stable integration of competing physical constraints. An architecture-diverse ensemble of ten members provides calibrated uncertainty quantification, with post-hoc temperature scaling achieving target prediction interval coverage. The optimized PINN achieves R2 = 0.9544, RMSE = 0.0484 mmol/g, and MAE = 0.0231 mmol/g on the held-out test set, with 98.6% monotonicity satisfaction and zero non-physical negative predictions. Physics-informed regularization yields a 10-15% cross-lithology generalization advantage over a well-tuned random forest under leave-one-lithology-out validation, confirming that thermodynamic constraints transfer meaningfully across geological boundaries.

Paper Structure

This paper contains 85 sections, 23 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Classical isotherm model-fitting performance across individual geological samples of clays, shales, and coals. Nine classical models were fitted per isotherm. The best model was selected via the lowest AIC. High per-sample accuracy shows effective capture of local adsorption, yet fails to generalise across samples or aggregates (see Fig. \ref{['fig:aggregated_failure']}). (a) Best-fit model $R^2$ distributions by lithology (violin plots with interquartile boxes and data points). Dashed lines mark thresholds: $R^2 = 0.95$ (excellent), $0.85$ (good), $0.70$ (acceptable). (b) Model preference by lithology (% of samples with lowest AIC). Sips model dominates overall (59.4% of best fits). (c) Sample quality tiers by lithology: Excellent ($R^2 > 0.95$), Good ($0.85$--$0.95$), Fair ($0.70$--$0.85$), Poor ($R^2 < 0.70$). Percentages show tier shares within each lithology. (d) Sips model parameter distributions: maximum capacity $q_{\max}$ (mmol g$^{-1}$) and heterogeneity exponent $n_s$ (reference line at $n_s = 1$, Langmuir limit). (e) Gibbs free energy of adsorption ($\Delta G$, kJ mol$^{-1}$) by lithology, derived from isosteric estimates. Shaded bands indicate physisorption strength: weak (0 to $-10$), moderate ($-10$ to $-20$), strong ($< -20$).
  • Figure 2: Failure of classical isotherm models in aggregated, cross-sample generalisation. While individual-sample fits achieve median $R^2 = 0.97$ (Fig. \ref{['fig:individual_fitting']}), aggregating data within each lithology and re-fitting yields 80--90% mean reduction in $R^2$, motivating the physics-informed neural network approach. (a) Comparison of individual-sample $R^2$, aggregated $R^2$, and cross-validation $R^2$ by lithology and overall. Percentage reductions annotate performance collapse. (b) Aggregated $R^2$ heatmap for nine classical models plus fourth-order polynomial across four groupings (clays, shales, coals, all combined). Colour scale 0--0.45; best model per group highlighted (bold border). (c) Bootstrap parameter uncertainty (95% CI error bars) for dominant model per lithology: Redlich--Peterson (clays), Sips (shales), Langmuir (coals). Fold-width annotations (e.g. $\times 10^{2}$) indicate parameter instability. (d) Five-fold cross-validation $R^2$ by lithology. Shale folds reach negative values, showing worse-than-mean performance. (e) Residual bias across six pressure bins ($<$10 to $>$200 bar). S-shaped pattern (under-prediction at low/high pressures, over-prediction at intermediate) reveals structural misspecification. Durbin--Watson statistics (1.18--1.37) confirm positive residual autocorrelation.
  • Figure 3: Property--uptake relationships for hydrogen sorption across 224 measurements in clays ($n=123$), shales ($n=39$), and coals ($n=62$). Pearson correlation analysis revealed 23 significant relationships ($|r| > 0.3$, $p < 0.05$; 2 clay, 12 shale, 9 coal), with bootstrap resampling (1000 iterations) for uncertainty quantification. Results highlight lithology-specific controls --- surface area in clays, organic content in shales, and rank-dependent aromaticity in coals --- motivating the lithology-branched PINN architecture (Section \ref{['subsec:pinn_architecture_main']}). (a) Hydrogen uptake distributions by lithology (violin plots with interquartile boxes, medians, and data points). Coals showed highest mean uptake ($0.378 \pm 0.205$ mmol g$^{-1}$, CV 54%), shales the widest range ($0.282 \pm 0.450$ mmol g$^{-1}$), and clays the greatest relative variability ($0.232 \pm 0.276$ mmol g$^{-1}$, CV 119%). (b) Significant property--uptake correlations ($|r| > 0.3$, $p < 0.05$), grouped by lithology and sorted by decreasing $|r|$. Bar color indicates sign (green = positive, red = negative); values show Pearson $r$ and valid $n$. (c) Bootstrap 95% confidence intervals for the three strongest correlations per lithology (1000 iterations). Wide CI for coal micropore volume ($r = 0.650$, 95% CI [0.118, 0.894], $n=13$) reflects data sparsity limiting model performance. (d) Dominant driver of uptake for each lithology with linear regression and 95% bootstrap confidence bands (500 iterations): specific surface area (clays, $r=0.598$, $n=121$), total organic carbon (shales, $r=0.755$, $n=39$), fixed carbon (coals, $r=0.711$, $n=48$). (e) Number of positive ($r > 0.3$) and negative ($r < -0.3$) significant correlations by lithology. Shales show the most diverse pattern (12 total: 7 positive, 5 negative), clays only positive (2 total), and coals predominantly positive (9 total: 6 positive, 3 negative).
  • Figure 4: Supervised predictive modelling and Random Forest feature importance for hydrogen uptake across three lithologies, providing performance baselines and feature priorities for PINN input layer design. Three model classes were tested --- linear regression, Ridge regression ($\alpha=1.0$), and Random Forest (100 estimators, max depth 5), using 5-fold cross-validation. (a) Five-fold cross-validation $R^2$ (mean $\pm$ SD) by model and lithology (grouped bars with per-fold error bars). Red zone indicates worse-than-mean performance. Ridge markedly improves shale $R^2$ from $-1.130$ (linear) to $0.551$ due to multicollinearity. Random Forest yields the highest performance overall (clay CV $R^2=0.523$; coal CV $R^2=0.352$). (b) Random Forest feature importance (mean decrease in impurity) by lithology (stacked horizontal bars). Values $\geq 5\%$ labelled. Clays are dominated by specific surface area (80.9%). Shales show distributed importance led by TOC (52.7%), langite (15.4%), average pore diameter (13.5%), and berlinite (11.2%). Coals are led by vitrinite reflectance (%R$_o$, 23.2%) and volatile matter (22.0%). (c) Per-fold $R^2$ variability (box-and-strip plots) for each model--lithology combination. Red zone marks negative performance. Shale linear regression shows extreme instability (some folds $R^2 < -1.8$), while Random Forest remains consistently positive. (d) Property data completeness (%) vs. absolute Pearson $|r|$ for significant pairs ($p<0.05$, $|r|>0.2$), with point size $\propto \sqrt{n}$ and colour by lithology. Purple zone ($<$30% completeness) highlights coal micropore volume (21% complete, $|r|=0.650$), the strongest predictor yet least measured --- a key challenge for coal PINN development. (e) PINN input feature priorities by lithology (lollipop chart), coloured by mechanism: blue (structural: surface area, pores), green (chemical: organics, mineral proxies), orange (rank: coalification indicators). Clay branches prioritize surface area; shale branches require TOC and pore metrics; coal branches emphasize rank indicators (%R$_o$, volatile matter) plus texture.
  • Figure 5: Ensemble feature selection, bootstrap stability, cross-validation performance, and stratified dataset preparation. Four methods (Pearson correlation, mutual information regression, Random Forest importance, $F$-statistic) were applied to the scaled feature matrix; final selection used majority voting. (a) Top 20 features ranked by mean bootstrap importance (absolute Pearson $r$, 1000 iterations), shown as horizontal bars with 95% CI error bars. Colours indicate category (thermodynamic, pore structure, surface chemistry, interaction, kinetic, molecular sieving, classical model-derived), highlighting multi-domain predictive power. (b) Feature importance vs. bootstrap stability scatter: mean bootstrap importance ($x$-axis) vs. 95% CI width ($y$-axis, instability measure); point size reflects ensemble vote count (1--4). Green quadrant marks high-importance, low-instability features preferred for PINN input. Dashed lines show median thresholds. (c) Five-fold cross-validation performance of Random Forest baseline on selected features: $R^2$ (blue, left axis) and RMSE (mmol g$^{-1}$, orange, right axis) per fold. Dashed lines indicate cross-fold means. (d) Lithology-stratified train (70%), validation (15%), and test (15%) splits via StratifiedShuffleSplit. Stacked bars show sample counts per lithology; percentages confirm proportional representation across partitions. (e) Kernel density estimates of H$_2$ uptake for each split and overall dataset. Two-sample Kolmogorov--Smirnov test confirms statistically consistent target distributions between training and test splits.
  • ...and 4 more figures