Nested Sampling for ARIMA Model Selection in Astronomical Time-Series Analysis

Ajinkya Naik, Will Handley

TL;DR

The paper presents a Bayesian, Nested Sampling approach to ARIMA model selection for astronomical time series, enabling simultaneous estimation of model evidences and posterior parameter distributions across grids of $(p,d,q)$ orders. By enforcing stationarity and invertibility constraints and using a vectorized, GPU-friendly implementation, the method provides robust model comparison, with an intrinsic Occam penalty, and practical predictive capabilities. Validations on simulated data, sunspot records, and Kepler/TESS light curves demonstrate that the framework correctly identifies the best-fitting models and yields well-constrained posteriors, offering a rigorous alternative to traditional information criteria. The work highlights the potential of Bayesian model selection in time-domain astronomy and outlines avenues for extensions to seasonal, exogenous, and fractionally integrated variants, with attention to computational efficiency.
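One standard way to check the stationarity and invertibility constraints mentioned above is the step-down (reverse Levinson-Durbin) recursion: an AR($p$) polynomial has all its roots outside the unit circle exactly when every partial autocorrelation the recursion produces lies in $(-1, 1)$. The sketch below is illustrative and not taken from the authors' code. It assumes the sign convention $x_t = c + \sum_i \phi_i x_{t-i} + \epsilon_t + \sum_j \theta_j \epsilon_{t-j}$, and all function names are hypothetical.

```python
def ar_to_pacf(phi):
    """Step-down (reverse Levinson-Durbin) recursion.

    Maps AR coefficients phi_1..phi_p (assumed convention
    x_t = c + sum_i phi_i x_{t-i} + eps_t) to partial autocorrelations
    r_1..r_p. Returns None as soon as some |r_k| >= 1, i.e. when the AR
    polynomial has a root on or inside the unit circle.
    """
    a = list(phi)
    pacf = []
    for k in range(len(a), 0, -1):
        r = a[k - 1]
        if abs(r) >= 1.0:
            return None
        pacf.append(r)
        # Undo one Levinson-Durbin step to recover the order-(k-1) coefficients
        a = [(a[j] + r * a[k - 2 - j]) / (1.0 - r * r) for j in range(k - 1)]
    return pacf[::-1]


def pacf_to_ar(pacf):
    """Forward Levinson-Durbin recursion: any r_k in (-1, 1) gives a stationary AR(p)."""
    phi = []
    for k, r in enumerate(pacf, start=1):
        phi = [phi[j] - r * phi[k - 2 - j] for j in range(k - 1)] + [r]
    return phi


def is_stationary(phi):
    return ar_to_pacf(phi) is not None


def is_invertible(theta):
    # Assumed MA convention x_t = ... + eps_t + sum_j theta_j eps_{t-j}:
    # 1 + sum_j theta_j z^j has the form of an AR polynomial with coefficients -theta_j.
    return is_stationary([-t for t in theta])


print(is_stationary([0.6, 0.3]))  # AR(2) coefficients of Figure 1 -> True
print(is_stationary([0.6, 0.5]))  # phi_1 + phi_2 > 1 -> False
print(is_invertible([-0.4]))      # MA(1) coefficient of Figure 4 -> True
```

Running the forward recursion `pacf_to_ar` on partial autocorrelations drawn from $(-1, 1)$ yields stationary coefficients by construction, which is one common way to build a prior that respects the constraint. Whether the paper uses this reparameterization or simply rejects non-stationary draws is not stated in this summary.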

Abstract

The upcoming era of large-scale, high-cadence astronomical surveys demands efficient and robust methods for time-series analysis. Autoregressive Integrated Moving Average (ARIMA) models provide a versatile parametric description of stochastic variability in this context. However, their practical use is limited by the challenge of selecting optimal model orders while avoiding overfitting. We present a novel solution to this problem: a Bayesian framework for time-series modelling in astronomy that combines ARIMA models with the Nested Sampling algorithm. Our method yields Bayesian evidences for model comparison and incorporates an intrinsic Occam penalty for unnecessary model complexity. We implement a vectorized ARIMA Nested Sampling framework that performs model selection across grids of autoregressive (AR) and moving-average (MA) orders, with efficient inference of the selected model's parameters. The method is validated on simulated and real astronomical time series, including the yearly sunspot number record, Kepler light curves of the red giant KIC 12008916, and TESS photometry of the exoplanet host star Ross 176. In all cases, the algorithm correctly identified the true or best-fitting model while simultaneously yielding well-constrained posterior distributions for the model parameters. Our results demonstrate that Nested Sampling offers a potentially rigorous alternative to conventional criteria for autoregressive model selection in astronomical time-series analysis.
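To make the evidence and the Occam penalty concrete, here is a minimal pure-Python nested sampling loop for a deliberately simple toy model (not ARIMA, and not the authors' vectorized implementation): $n$ Gaussian observations with known $\sigma$ and unknown mean $\mu$ under a uniform prior $U(a, b)$. Dead points are weighted with the usual shrinkage estimate $X_i \approx e^{-i/N}$ for $N$ live points. Because this toy likelihood decreases monotonically with $|\mu - \bar{x}|$, the likelihood-constrained prior is an interval and can be sampled exactly; real ARIMA posteriors would need a constrained sampler such as slice sampling instead. All function names are hypothetical.

```python
import math
import random


def logaddexp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    if a < b:
        a, b = b, a
    if b == -math.inf:
        return a
    return a + math.log1p(math.exp(b - a))


def make_loglike(data, sigma):
    """Gaussian log-likelihood of the mean mu, written with sufficient statistics."""
    n = len(data)
    xbar = sum(data) / n
    ss = sum((x - xbar) ** 2 for x in data)
    const = -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)

    def loglike(mu):
        return const - n * (mu - xbar) ** 2 / (2 * sigma ** 2)

    return loglike, xbar, const, n


def nested_sampling_logz(data, sigma, a, b, nlive=200, tol=1e-3, seed=0):
    """Estimate log Z for the toy model with a uniform U(a, b) prior on mu."""
    rng = random.Random(seed)
    loglike, xbar, _, _ = make_loglike(data, sigma)
    live = [rng.uniform(a, b) for _ in range(nlive)]
    live_logl = [loglike(mu) for mu in live]
    log_shell = math.log1p(-math.exp(-1.0 / nlive))  # log(X_0 - X_1) with X_0 = 1
    logz, i = -math.inf, 0
    while True:
        worst = live_logl.index(min(live_logl))
        # Dead point i is weighted by the shell X_i - X_{i+1}, with X_i ~ exp(-i / nlive)
        logz = logaddexp(logz, live_logl[worst] - i / nlive + log_shell)
        # Replace it with an exact draw from the prior restricted to L(mu) > L_worst,
        # which for this toy likelihood is an interval centred on xbar
        r = abs(live[worst] - xbar)
        live[worst] = rng.uniform(max(a, xbar - r), min(b, xbar + r))
        live_logl[worst] = loglike(live[worst])
        i += 1
        log_x = -i / nlive
        # Stop once the live points can change Z by at most roughly a fraction tol
        if max(live_logl) + log_x < logz + math.log(tol):
            break
    # Final contribution: remaining volume X times the mean live-point likelihood
    log_sum_l = -math.inf
    for ll in live_logl:
        log_sum_l = logaddexp(log_sum_l, ll)
    return logaddexp(logz, log_sum_l - math.log(nlive) + log_x)


def analytic_logz(data, sigma, a, b):
    """Closed-form log Z for the same toy model, for comparison."""
    _, xbar, const, n = make_loglike(data, sigma)
    s = sigma / math.sqrt(n)

    def std_normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    mass = std_normal_cdf((b - xbar) / s) - std_normal_cdf((a - xbar) / s)
    return const + math.log(math.sqrt(2 * math.pi) * s * mass) - math.log(b - a)


rng = random.Random(42)
data = [rng.gauss(0.3, 1.0) for _ in range(50)]
for half_width in (5.0, 50.0):
    ns = nested_sampling_logz(data, 1.0, -half_width, half_width)
    exact = analytic_logz(data, 1.0, -half_width, half_width)
    print(f"U(-{half_width:g}, {half_width:g}): nested sampling {ns:.2f}, analytic {exact:.2f}")
```

Widening the prior from $U(-5, 5)$ to $U(-50, 50)$ leaves the best fit unchanged but lowers $\log Z$ by about $\ln 10 \approx 2.3$. This is the Occam penalty at work: the extra prior volume is spent on parameter values the data rule out.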

Paper Structure

This paper contains 18 sections, 30 equations, 18 figures, and 6 tables.

Figures (18)

  • Figure 1: Artificially generated AR(2) time series of 300 data points with $\phi_1=0.6$ and $\phi_2=0.3$. A constant intercept $c=1.5$ and a noise standard deviation $\sigma=1.0$ for $\epsilon_t$ were used.
  • Figure 2: Heatmap of the model log posterior probabilities $P_i$ for the simulated AR(2) time series (Figure 1). A hot spot is observed at ARIMA(2,0,0). The log posterior probabilities level off for higher orders, as expected from the Occam penalty.
  • Figure 3: Posterior distributions of the AR(2) model parameters inferred from the simulated AR(2) time series (Figure 1). The 1-D kernel density estimates are well constrained and centred near the true parameter values (black dashed lines), demonstrating good recovery of the underlying process dynamics.
  • Figure 4: Artificially generated ARMA(1,1) process of 490 data points with a linear trend. The ARMA coefficients are $\phi_1=0.6$ and $\theta=-0.4$; the constant intercept and noise standard deviation are $c=2$ and $\sigma=1$, respectively.
  • Figure 5: Yearly sunspot number data from 1700 to 2008.
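The ingredients behind Figures 1 and 2 can be sketched as follows: simulating the AR(2) process of Figure 1, differencing for the "I" in ARIMA, a Gaussian ARIMA log-likelihood, and turning a grid of log-evidences into model log posterior probabilities $P_i$. This is a hypothetical illustration with hypothetical function names. It assumes the convention $x_t = c + \sum_i \phi_i x_{t-i} + \epsilon_t + \sum_j \theta_j \epsilon_{t-j}$ and uses the conditional likelihood (first $p$ values fixed, pre-sample residuals set to zero), which may differ from the likelihood used in the paper. `log_model_posteriors` assumes equal prior probability for every model, which is an assumption about how $P_i$ is normalized.

```python
import math
import random


def simulate_arma(n, c, phi, theta, sigma, seed=0, burn=200):
    """Simulate x_t = c + sum_i phi_i x_{t-i} + eps_t + sum_j theta_j eps_{t-j}."""
    rng = random.Random(seed)
    x, eps = [], []
    for t in range(n + burn):
        e = rng.gauss(0.0, sigma)
        ar = sum(coef * x[t - 1 - i] for i, coef in enumerate(phi) if t - 1 - i >= 0)
        ma = sum(coef * eps[t - 1 - j] for j, coef in enumerate(theta) if t - 1 - j >= 0)
        x.append(c + ar + e + ma)
        eps.append(e)
    return x[burn:]  # discard burn-in so the start-up transient has decayed


def difference(x, d):
    """The 'I' step of ARIMA: apply first differences d times."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


def conditional_loglike(x, c, phi, theta, sigma, d=0):
    """Gaussian log-likelihood conditional on the first p values and zero pre-sample residuals."""
    y = difference(x, d)
    p = len(phi)
    resid = [0.0] * len(y)
    ll = 0.0
    for t in range(p, len(y)):
        pred = c + sum(phi[i] * y[t - 1 - i] for i in range(p))
        pred += sum(theta[j] * resid[t - 1 - j] for j in range(len(theta)) if t - 1 - j >= 0)
        resid[t] = y[t] - pred
        ll -= 0.5 * math.log(2 * math.pi * sigma ** 2) + 0.5 * (resid[t] / sigma) ** 2
    return ll


def log_model_posteriors(log_z):
    """log P_i = log Z_i - log sum_j Z_j, assuming equal prior probability for every model."""
    m = max(log_z)
    log_norm = m + math.log(sum(math.exp(v - m) for v in log_z))
    return [v - log_norm for v in log_z]


# AR(2) process with the Figure 1 settings: 300 points, c = 1.5, phi = (0.6, 0.3), sigma = 1
x = simulate_arma(300, c=1.5, phi=[0.6, 0.3], theta=[], sigma=1.0, seed=1)
print(conditional_loglike(x, 1.5, [0.6, 0.3], [], 1.0))
print(log_model_posteriors([-3.0, 0.0, -1.0]))
```

Working with log-evidences and a log-sum-exp normalization keeps the computation stable even when evidences differ by many orders of magnitude across the $(p,d,q)$ grid.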