Interpolation-Driven Machine Learning Approaches for Plume Shine Dose Estimation: A Comparison of XGBoost, Random Forest, and TabNet
Biswajit Sadhu, Kalpak Gupte, Trijit Sadhu, S. Anand
TL;DR
This work addresses the need for fast, reliable plume shine dose estimation in safety-critical radiological contexts where physics-based calculations are too slow. It introduces an interpolation-assisted ML framework that densifies sparse analytical dose tables via shape-preserving PCHIP interpolation and compares Random Forest (RF), XGBoost, and TabNet on 17 radionuclides across dispersion scenarios. The study shows that high-resolution training data markedly improves generalization, with XGBoost delivering the best overall accuracy ($R^2$ near 1 and MAPE often below 1–3%), while TabNet benefits from dense data but lags behind the tree ensembles due to its attention-based inductive bias. Interpretability analyses (permutation importance and TabNet attention) reveal that geometry–dispersion features dominate predictions and explain the observed performance hierarchy. A Streamlit GUI provides practical deployment and scenario exploration for radiological decision support.
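The densification step can be illustrated with a minimal sketch, assuming SciPy's `PchipInterpolator` applied to a single dose-versus-distance curve. The function name `densify_dose_curve`, the choice to interpolate in log-log space, the grid size, and all numeric values are assumptions made for illustration; they are not taken from the paper or from pyDOSEIA output.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator


def densify_dose_curve(distances_m, doses, n_points=200):
    """Densify one sparse dose-vs-distance curve with shape-preserving PCHIP.

    Interpolating log10(dose) against log10(distance) is an assumption of
    this sketch, chosen because dose values span several orders of magnitude.
    """
    distances_m = np.asarray(distances_m, dtype=float)
    doses = np.asarray(doses, dtype=float)

    # PCHIP requires strictly increasing x; it adds no new extrema
    # between the tabulated points.
    interp = PchipInterpolator(np.log10(distances_m), np.log10(doses))

    # Dense, logarithmically spaced grid covering the tabulated range.
    dense_x = np.logspace(
        np.log10(distances_m[0]), np.log10(distances_m[-1]), n_points
    )
    dense_y = 10.0 ** interp(np.log10(dense_x))
    return dense_x, dense_y


# Placeholder values for illustration only (not pyDOSEIA output).
sparse_x = np.array([100.0, 200.0, 500.0, 1000.0, 2000.0, 5000.0])
sparse_y = np.array([4.0e-3, 2.5e-3, 9.0e-4, 3.0e-4, 8.0e-5, 1.0e-5])

dense_x, dense_y = densify_dose_curve(sparse_x, sparse_y)
print(dense_x.shape, dense_y.shape)
```

Because PCHIP is monotonicity-preserving, a tabulated curve that decreases with distance stays decreasing after densification, which is the property that makes it attractive over an unconstrained cubic spline for this kind of data.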
Abstract
Despite the success of machine learning (ML) in surrogate modeling, its use in radiation dose assessment is limited by safety-critical constraints, scarce training-ready data, and challenges in selecting suitable architectures for physics-dominated systems. Within this context, rapid and accurate plume shine dose estimation serves as a practical test case, as it is critical for nuclear facility safety assessment and radiological emergency response, while conventional photon-transport-based calculations remain computationally expensive. In this work, an interpolation-assisted ML framework was developed using discrete dose datasets generated with the pyDOSEIA suite for 17 gamma-emitting radionuclides across varying downwind distances, release heights, and atmospheric stability categories. The datasets were augmented using shape-preserving interpolation to construct dense, high-resolution training data. Two tree-based ML models (Random Forest and XGBoost) and one deep learning (DL) model (TabNet) were evaluated to examine predictive performance and sensitivity to dataset resolution. All models showed higher prediction accuracy with the interpolated high-resolution dataset than with the discrete data; however, XGBoost consistently achieved the highest accuracy. Interpretability analysis using permutation importance (tree-based models) and attention-based feature attribution (TabNet) revealed that performance differences stem from how the models utilize input features. Tree-based models focus mainly on dominant geometry-dispersion features (release height, stability category, and downwind distance), treating radionuclide identity as a secondary input, whereas TabNet distributes attention more broadly across multiple variables. For practical deployment, a web-based GUI was developed for interactive scenario evaluation and transparent comparison with photon-transport reference calculations.
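The permutation-importance analysis mentioned above can be sketched as follows, assuming scikit-learn's `permutation_importance` and a Random Forest regressor. The feature table and target here are synthetic stand-ins: the feature ranges, the toy target formula, and the hyperparameters are all hypothetical, so the resulting ranking is dominated by downwind distance by construction and says nothing about the paper's actual results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in features (hypothetical ranges, not the paper's dataset).
feature_names = ["downwind_distance", "release_height", "stability", "nuclide_id"]
distance = rng.uniform(100.0, 5000.0, n)
height = rng.uniform(10.0, 100.0, n)
stability = rng.integers(0, 6, n)   # six stability classes, integer-encoded
nuclide = rng.integers(0, 17, n)    # 17 radionuclides, integer-encoded
X = np.column_stack([distance, height, stability, nuclide])

# Toy target standing in for log10(dose); NOT a physical dose model.
y = (
    -2.0 * np.log10(distance)
    - 0.01 * height
    + 0.1 * stability
    + 0.005 * nuclide
    + rng.normal(0.0, 0.01, n)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and record the score drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0
)

ranking = sorted(
    zip(feature_names, result.importances_mean), key=lambda t: -t[1]
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Computing the importances on a held-out split rather than on the training data is a design choice of this sketch: it measures how much each feature contributes to generalization rather than to memorization.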
