Redshift Classification of Optical Gamma-Ray Bursts using Supervised Learning
Milind Sarkar, Maria Giovanna Dainotti, Nikita S. Khatiya, Dhruv S. Bal, Malgorzata Bogdan, Ye Li, Agnieszka Pollo, Dieter H. Hartmann, Bing Zhang, Simanta Deka, Nissim Fraija, J. Xavier Prochaska
TL;DR
This study develops an optical plateau-based ensemble learning framework that classifies GRBs by redshift, addressing spectroscopic incompleteness with rapid probabilistic predictions. The authors curate a dataset of 171 long GRBs (LGRBs) with optical plateau measurements, apply M-estimator outlier removal, MICE imputation, and LASSO feature selection, and then train a SuperLearner ensemble across multiple redshift thresholds. The best-performing model (raw data with M-estimator outlier removal at $z_t=2.0$) achieves AUC ≈ 0.841 and TPR ≈ 0.741 and generalizes well to an independent sample (accuracy ≈ 97%, AUC ≈ 0.9338); a publicly available web app supports real-time use. The optical classifier complements X-ray approaches, offering enhanced sensitivity to high-$z$ events while remaining robust to data incompleteness, and sets the stage for multi-wavelength redshift estimation and improved GRB cosmology.
Abstract
Gamma-ray bursts (GRBs) are among the most luminous explosions in the Universe and serve as powerful probes of the early cosmos. However, the rapid fading of their afterglows and the scarcity of spectroscopic measurements make photometric classification crucial for timely high-redshift identification. We present an ensemble machine learning framework for redshift classification of GRBs based solely on their optical plateau and prompt emission properties. Our dataset comprises 171 long GRBs observed by the Swift UVOT and more than 450 ground-based telescopes. The analysis pipeline integrates robust statistical techniques, including M-estimator outlier rejection, multivariate imputation using Multiple Imputation by Chained Equations (MICE), and Least Absolute Shrinkage and Selection Operator (LASSO) feature selection, followed by a SuperLearner ensemble combining parametric, semi-parametric, and non-parametric algorithms. The optimal model, trained on raw optical data with outlier removal at a redshift threshold of $z = 2.0$, achieves a true positive rate (TPR) of 74% and an area under the curve (AUC) of 0.84, maintaining balanced generalization between training and test sets. At higher thresholds, such as $z = 3.0$, the classifier sustains strong discriminative power with an AUC of 0.88. Validation on an independent GRB sample yields 97% overall accuracy, perfect specificity, and an ensemble AUC of 0.93. Compared to previous prompt- and X-ray-based classifiers, our optical framework offers enhanced sensitivity to high-redshift events, improved robustness against data incompleteness, and greater applicability to ground-based follow-up. We also publicly release a web application that enables real-time redshift classification, facilitating rapid identification of candidate high-redshift GRBs for cosmological studies.
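As a rough illustration of the kind of pipeline summarized above, the following sketch chains imputation, LASSO-based feature selection, and a stacked ensemble to classify objects above or below a redshift threshold. It is not the authors' implementation: the data are synthetic, the feature count and coefficients are invented, and scikit-learn's `IterativeImputer` and `StackingClassifier` are used here only as loose analogs of MICE and SuperLearner. The M-estimator outlier-rejection step is omitted.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.feature_selection import SelectFromModel
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, recall_score

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for the GRB catalogue. Only the sample size
# (171) comes from the text; the 6 features and the redshift relation are made up.
n_grb = 171
X = rng.normal(size=(n_grb, 6))
z = np.exp(0.6 * X[:, 0] - 0.4 * X[:, 1] + 0.5 + 0.3 * rng.normal(size=n_grb))

# Knock out about 10% of the entries to mimic incomplete measurements.
X[rng.random(X.shape) < 0.1] = np.nan

# Binary label: 1 if the burst lies at or above the redshift threshold z_t.
Z_T = 2.0
y = (z >= Z_T).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stacked ensemble: cross-validated base-learner predictions feed a logistic
# meta-learner. ASSUMPTION: these two base learners are illustrative choices.
ensemble = StackingClassifier(
    estimators=[
        ("logit", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)

model = make_pipeline(
    IterativeImputer(random_state=0),  # chained-equation style imputation
    StandardScaler(),
    # Keep the 3 features with the largest absolute LASSO coefficients.
    SelectFromModel(LassoCV(cv=5), threshold=-np.inf, max_features=3),
    ensemble,
)
model.fit(X_train, y_train)

# Probability of being a high-z event, then the two metrics quoted in the text.
proba = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, proba)
tpr = recall_score(y_test, (proba >= 0.5).astype(int))
print(f"AUC = {auc:.3f}, TPR = {tpr:.3f}")
```

Placing the imputer and feature selector inside the pipeline means they are refit on the training split only, which avoids leaking test-set information into the preprocessing. Changing `Z_T` to another value, such as 3.0, reruns the same sketch at a different threshold.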
