Surrogate-Accelerated Bayesian Inversion for Exoplanet Interior Characterization
Tijn De Wringer, Caroline Dorn, Emily O. Garvin, Stefano Marelli
TL;DR
The paper tackles the computational bottleneck that expensive forward models create in Bayesian inference of exoplanet interiors. It introduces a surrogate-accelerated framework in which a polynomial chaos-Kriging (PCK) surrogate is trained on a few hundred forward-model evaluations and embedded within an Adaptive Metropolis MCMC sampler, yielding large speedups while maintaining rigorous uncertainty quantification. Validation across 1,000 synthetic cases and real targets such as TOI-270 d demonstrates high surrogate fidelity (R^2 > 0.99 for most scenarios) and credible intervals with near-nominal coverage, with a speedup of more than two orders of magnitude (a factor of about 320, from days to minutes on a single CPU). The approach enables population-scale interior studies, rapid adaptation to model updates, and integration of atmospheric constraints from upcoming missions.
Abstract
Characterizing the interior structure of exoplanets is an inverse problem often solved using Bayesian inference, but this approach is hampered by the high computational cost of planetary structure models. To overcome this barrier, we present a robust framework that accelerates inference by replacing the computationally expensive physics-based forward model with a fast polynomial chaos-Kriging (PCK) surrogate directly within a Markov chain Monte Carlo (MCMC) sampling loop. We rigorously validate our approach using a suite of tests, including a direct comparison against a benchmark MCMC inference using the full forward model, and a large-scale coverage study with 1000 synthetic test cases to demonstrate the statistical reliability of our inferred credible intervals. Our surrogate-assisted framework achieves a computational speedup of over 2 orders of magnitude (factor of $\sim$320), reducing single-CPU inference times from days to minutes. This efficiency is achieved with a surrogate that requires only a few hundred forward model evaluations for training for a single planet. This data efficiency provides significant flexibility for model developments and a clear advantage over common machine learning approaches, which typically demand vast training sets ($>10^6$ model runs) and intensive pre-computation. The PCK surrogate maintains high fidelity, with $R^2 > 0.99$ for most scenarios and root-mean-square errors typically an order of magnitude smaller than observational uncertainties. This efficiency makes large-scale population studies, which are computationally impractical with traditional methods, feasible while preserving statistical robustness.
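The workflow described in the abstract (train a cheap surrogate on a few hundred forward-model runs, check its fidelity, then sample the posterior using only the surrogate) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: `forward_model` is a hypothetical two-parameter toy function standing in for the planetary structure model, a cubic polynomial least-squares fit stands in for the PCK surrogate, and the sampler is a simplified Adaptive Metropolis variant that refreshes its proposal covariance from the chain history every 100 steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(theta):
    # Hypothetical toy stand-in for an expensive interior-structure model:
    # maps two parameters to two observables.
    x, y = theta
    return np.array([x + 0.5 * y**2, np.sin(x) + y])

def poly_features(thetas):
    # Cubic polynomial basis in two inputs (a simple substitute for PCK).
    t = np.atleast_2d(thetas)
    x, y = t[:, 0], t[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2,
                            x**3, y**3, x**2 * y, x * y**2])

# 1) Train the surrogate on a few hundred forward-model runs from the prior box.
n_train = 300
train_x = rng.uniform(-1.0, 1.0, size=(n_train, 2))
train_y = np.array([forward_model(t) for t in train_x])
coef, *_ = np.linalg.lstsq(poly_features(train_x), train_y, rcond=None)

def surrogate(theta):
    # Cheap replacement for forward_model, evaluated at a single point.
    return poly_features(theta)[0] @ coef

# 2) Check surrogate fidelity on held-out points (R^2 per observable).
test_x = rng.uniform(-1.0, 1.0, size=(200, 2))
test_y = np.array([forward_model(t) for t in test_x])
pred_y = poly_features(test_x) @ coef
ss_res = ((test_y - pred_y) ** 2).sum(axis=0)
ss_tot = ((test_y - test_y.mean(axis=0)) ** 2).sum(axis=0)
r2 = 1.0 - ss_res / ss_tot

# 3) Posterior: uniform prior on [-1, 1]^2, Gaussian likelihood on the surrogate.
theta_true = np.array([0.3, -0.2])
y_obs = forward_model(theta_true)  # noise-free synthetic observation
sigma = 0.05                       # assumed observational uncertainty

def log_post(theta):
    if np.any(np.abs(theta) > 1.0):
        return -np.inf
    resid = (surrogate(theta) - y_obs) / sigma
    return -0.5 * np.sum(resid**2)

def adaptive_metropolis(logp, theta0, n_steps, rng, t0=500, eps=1e-8):
    # Random-walk Metropolis whose Gaussian proposal covariance is re-estimated
    # from the chain history after t0 steps (refreshed every 100 steps).
    d = len(theta0)
    scale = 2.4**2 / d
    chain = np.empty((n_steps, d))
    theta = np.asarray(theta0, dtype=float)
    lp = logp(theta)
    cov = 0.01 * np.eye(d)
    for t in range(n_steps):
        if t >= t0 and t % 100 == 0:
            cov = scale * (np.cov(chain[:t].T) + eps * np.eye(d))
        prop = rng.multivariate_normal(theta, cov)
        lp_prop = logp(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain

# The sampling loop calls only the surrogate, never forward_model.
chain = adaptive_metropolis(log_post, np.zeros(2), 5000, rng)
post_mean = chain[1000:].mean(axis=0)  # discard burn-in
print("held-out R^2:", r2)
print("posterior mean:", post_mean, "truth:", theta_true)
```

In this sketch the expensive model is evaluated only for the training and held-out points, while all 5000 posterior evaluations go through the surrogate. That is the source of the speedup in a surrogate-accelerated scheme, and the held-out $R^2$ check is a simple proxy for the fidelity validation the abstract reports.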
