
TabPFN Extensions for Interpretable Geotechnical Modelling

Taiga Saito, Yu Otake, Daijiro Mizutani, Stephen Wu

Abstract

Geotechnical site characterisation relies on sparse, heterogeneous borehole data where uncertainty quantification and model interpretability are as critical as predictive accuracy for reliable engineering decisions. This paper presents an exploratory investigation into the use of TabPFN, a transformer-based tabular foundation model using in-context learning, and its extension library tabpfn-extensions for two geotechnical inference tasks: (1) soil-type classification using N-value and shear-wave velocity data from a synthetic geotechnical dataset, and (2) iterative imputation of five missing mechanical parameters ($s_\mathrm{u}$, $E_{\mathrm{u}}$, ${σ'}_\mathrm{p}$, $C_\mathrm{c}$, $C_\mathrm{v}$) in benchmark problem BM/AirportSoilProperties/2/2025. We apply cosine-similarity analysis to TabPFN-derived embeddings, visualise full posterior distributions from an iterative inference procedure, and compute SHAP-based feature importance, all without model retraining. Learned embeddings clearly separate Clay and Sand samples without explicit soil-type supervision; iterative imputation improves predictions for four of five target parameters, with posterior widths that reflect physically reasonable parameter-specific uncertainty; and SHAP analysis reveals the inter-parameter dependency structure, recovering established geotechnical relationships including the Skempton compression index correlation and the inverse dependence of preconsolidation pressure on water content. These results suggest the potential of foundation-model-based tools to support interpretable, uncertainty-aware parameter inference in data-scarce geotechnical practice.

Paper Structure (17 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Scatter plots of training (circles) and test (stars) data in the $N$ -- $V_\mathrm{s}$ space, coloured by soil class (Clay: blue, Sand: orange). Training and test sets each contain 16 samples. Reference curves $V_\mathrm{s} = 100N^{1/3}$ and $V_\mathrm{s} = 80N^{1/3}$ from the Japanese railway seismic design standard (RTRI, 2012) are overlaid; Clay samples follow the upper curve and Sand samples the lower curve.
  • Figure 2: Predicted probability of Sand class $P(\text{Sand})$ over the $N$ -- $V_\mathrm{s}$ domain, with the training samples overlaid (circles). The bold contour marks the decision boundary at $P(\text{Sand}) = 0.5$; grey contours indicate the 0.1, 0.25, 0.75, and 0.9 levels.
  • Figure 3: Cosine similarity heatmap of TabPFN embeddings between test and training samples (axis labels correspond to the sample numbers, No., in Table \ref{tab:rtri_data}). The block-diagonal structure confirms that the model internally separates Clay and Sand samples. Test sample No. 8 (first Sand test sample; $N = 14$, $V_\mathrm{s} = 193$ m/s) shows visibly lower similarity with the training Sand block, reflecting the model's reduced confidence at this out-of-distribution boundary point.
  • Figure 4: Normalised RMSE (RMSE / RMSE at iteration 1) per iteration for each mechanical parameter. Values below 1.0 indicate improvement over the initial estimate.
  • Figure 5: Posterior distributions at iteration 10 for all five mechanical parameters across test samples. Violin width represents the probability density; black bar = median; yellow circle = true value; black filled circle = observed value (parameter was not missing for that sample and therefore not predicted).
  • ...and 2 more figures
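
The quantities named in the captions for Figures 1, 3, and 4 can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names are hypothetical, the embeddings are assumed to be already available as 2-D arrays (one row per sample), and the arrays in the usage note are toy values rather than data from the paper.

```python
import numpy as np


def reference_vs(n_value, coefficient):
    """Reference curve V_s = coefficient * N^(1/3), as in the Figure 1 caption.

    The caption uses coefficient = 100 for the upper curve and 80 for the lower.
    """
    return coefficient * np.cbrt(np.asarray(n_value, dtype=float))


def cosine_similarity_matrix(test_emb, train_emb):
    """Pairwise cosine similarity between test rows and training rows.

    Both inputs are assumed to have shape (n_samples, embedding_dim).
    Returns an array of shape (n_test, n_train).
    """
    test_emb = np.asarray(test_emb, dtype=float)
    train_emb = np.asarray(train_emb, dtype=float)
    # Scale every row to unit length so the dot product equals the cosine.
    test_unit = test_emb / np.linalg.norm(test_emb, axis=1, keepdims=True)
    train_unit = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    return test_unit @ train_unit.T


def normalised_rmse(y_true, preds_per_iteration):
    """RMSE at each iteration divided by the RMSE at iteration 1.

    preds_per_iteration is assumed to have shape (n_iterations, n_samples).
    Values below 1.0 indicate improvement over the initial estimate.
    """
    y_true = np.asarray(y_true, dtype=float)
    preds = np.asarray(preds_per_iteration, dtype=float)
    rmse = np.sqrt(np.mean((preds - y_true) ** 2, axis=1))
    return rmse / rmse[0]
```

As a usage example, `cosine_similarity_matrix(test_emb, train_emb)` returns one row per test sample and one column per training sample, which is the layout a heatmap such as Figure 3 would plot. `normalised_rmse` returns 1.0 for the first iteration by construction, so later entries can be read directly against the 1.0 baseline described for Figure 4.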