Interpolation-Driven Machine Learning Approaches for Plume Shine Dose Estimation: A Comparison of XGBoost, Random Forest, and TabNet

Biswajit Sadhu, Kalpak Gupte, Trijit Sadhu, S. Anand

TL;DR

This work addresses the need for fast, reliable plume shine dose estimation in safety-critical radiological contexts where physics-based calculations are too slow. It introduces an interpolation-assisted ML framework that densifies sparse analytical dose tables via shape-preserving PCHIP interpolation and compares Random Forest (RF), XGBoost, and TabNet on 17 radionuclides across dispersion scenarios. The study shows that high-resolution training data markedly improves generalization, with XGBoost delivering the best overall accuracy ($R^2$ near 1 and MAPE often below 1–3%), while TabNet benefits from dense data but lags behind the tree ensembles, which the authors attribute to its attention-based inductive bias. Interpretability analyses (permutation importance and TabNet attention) reveal that geometry–dispersion features dominate predictions and explain the observed performance hierarchy. A Streamlit GUI provides practical deployment and scenario exploration for radiological decision support.
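The two accuracy metrics quoted above can be illustrated with a minimal sketch. It uses the standard textbook definitions of $R^2$ and MAPE; the function names and the sample values are hypothetical and are not taken from the paper, whose exact evaluation code is not shown here.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape_percent(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires non-zero y_true)."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical reference doses and surrogate predictions (arbitrary units).
reference = [1.0, 2.0, 4.0]
predicted = [1.1, 2.2, 4.4]

r2_value = r_squared(reference, predicted)
mape_value = mape_percent(reference, predicted)
```

Because MAPE divides by the reference value, it weights small doses as heavily as large ones, which is one reason it is often reported next to $R^2$ for quantities that span several orders of magnitude.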

Abstract

Despite the success of machine learning (ML) in surrogate modeling, its use in radiation dose assessment is limited by safety-critical constraints, scarce training-ready data, and challenges in selecting suitable architectures for physics-dominated systems. Within this context, rapid and accurate plume shine dose estimation serves as a practical test case, as it is critical for nuclear facility safety assessment and radiological emergency response, while conventional photon-transport-based calculations remain computationally expensive. In this work, an interpolation-assisted ML framework was developed using discrete dose datasets generated with the pyDOSEIA suite for 17 gamma-emitting radionuclides across varying downwind distances, release heights, and atmospheric stability categories. The datasets were augmented using shape-preserving interpolation to construct dense, high-resolution training data. Two tree-based ML models (Random Forest and XGBoost) and one deep learning (DL) model (TabNet) were evaluated to examine predictive performance and sensitivity to dataset resolution. All models showed higher prediction accuracy with the interpolated high-resolution dataset than with the discrete data; however, XGBoost consistently achieved the highest accuracy. Interpretability analysis using permutation importance (tree-based models) and attention-based feature attribution (TabNet) revealed that performance differences stem from how the models utilize input features. Tree-based models focus mainly on dominant geometry-dispersion features (release height, stability category, and downwind distance), treating radionuclide identity as a secondary input, whereas TabNet distributes attention more broadly across multiple variables. For practical deployment, a web-based GUI was developed for interactive scenario evaluation and transparent comparison with photon-transport reference calculations.
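The densification step described in the abstract can be sketched as follows, assuming SciPy's `PchipInterpolator` as the shape-preserving interpolator. The distance grid, the dose values, the `densify` helper, and the choice to interpolate $\log_{10}$ dose rather than the dose itself are all illustrative assumptions; they are not the pyDOSEIA outputs or the authors' actual pipeline.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical sparse table: downwind distance (m) vs. plume shine dose.
# Synthetic placeholder values in arbitrary units, not pyDOSEIA output.
distance_m = np.array([100.0, 200.0, 500.0, 1000.0, 2000.0, 5000.0])
dose = np.array([5.0e-3, 3.2e-3, 1.1e-3, 4.0e-4, 1.2e-4, 2.0e-5])


def densify(x, y, n_points=200):
    """Interpolate log10(y) over x with PCHIP and return a dense (x, y) grid.

    Working in log space is an assumption made here because the dose
    spans several orders of magnitude; the source does not specify it.
    """
    interpolator = PchipInterpolator(x, np.log10(y))
    x_dense = np.linspace(x[0], x[-1], n_points)
    return x_dense, 10.0 ** interpolator(x_dense)


x_dense, dose_dense = densify(distance_m, dose)
```

PCHIP keeps a monotonic input table monotonic between the knots, so the dense profile should not overshoot or oscillate the way an unconstrained cubic spline can. The dense grid would then serve as training data for the regressors.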

Paper Structure (27 sections, 7 equations, 11 figures, 3 tables)


Figures (11)

  • Figure 1: Schematic representation of plume shine exposure geometry, illustrating radionuclide release from an elevated source, atmospheric transport of the radioactive cloud, and external gamma dose received by a downwind receptor as a function of release height and downwind distance. Although not explicitly shown, the atmospheric stability category governs plume dispersion characteristics and significantly influences the resulting plume shine dose.
  • Figure 2: Distribution of $\log_{10}$ plume shine dose across radionuclides (left) and atmospheric stability categories A–F (right). The boxen plots show median, spread, and tail behavior, with stability classes A–F representing Pasquill–Gifford regimes from unstable to stable.
  • Figure 3: Distance-wise comparison between PCHIP-interpolated plume shine dose and numerically calculated ground-truth values for representative radionuclides under stability category A at a release height of 140 m. The agreement demonstrates the shape-preserving and monotonic behavior of the interpolation method.
  • Figure 4: Distance-wise comparison between PCHIP-interpolated plume shine dose and numerically calculated ground-truth values for representative radionuclides under stability category F at a release height of 140 m. The agreement demonstrates the shape-preserving and monotonic behavior of the interpolation method.
  • Figure 5: Distance-wise validation of PCHIP interpolation for plume shine dose of $^{137}$Cs at a release height of 140 m. The figure compares numerically calculated dose values and PCHIP-interpolated profiles across Pasquill–Gifford stability categories A–F, demonstrating that the interpolation preserves the physically expected monotonic decay with distance without introducing non-physical oscillations.
  • ...and 6 more figures