Kernel-, mean- and noise-marginalised Gaussian processes for exoplanet transits and $H_0$ inference

Namu Kroupa, David Yallup, Will Handley, Michael Hobson

TL;DR

The paper presents a fully Bayesian Gaussian Process regression framework that marginalises over kernel choice and hyperparameters, using transdimensional sampling and nested sampling to compute kernel evidences $Z_k$ and kernel posteriors $p_k$. This approach enables direct kernel comparison, principled marginalisation of kernel uncertainty, and robust inference for both exoplanet transit signals and cosmological $H_0$ measurements, including consideration of mean and noise models. Key findings show that kernel marginalisation recovers the true kernel in favorable (high-SNR) regimes, reduces bias in mean-function hyperparameters, and yields $H_0$ values consistent with external benchmarks within uncertainties, while avoiding the inductive bias of pre-selected kernels. The method is demonstrated on synthetic transit data and real cosmic chronometer plus BAO data, with kernel posteriors guiding interpretation and offering a path toward more reliable GP analyses in astrophysics and cosmology.

Abstract

Using a fully Bayesian approach, Gaussian Process regression is extended to include marginalisation over the kernel choice and kernel hyperparameters. In addition, Bayesian model comparison via the evidence enables direct kernel comparison. The calculation of the joint posterior was implemented with a transdimensional sampler which simultaneously samples over the discrete kernel choice and their hyperparameters by embedding these in a higher-dimensional space, from which samples are taken using nested sampling. Kernel recovery and mean function inference were explored on synthetic data from exoplanet transit light curve simulations. Subsequently, the method was extended to marginalisation over mean functions and noise models and applied to the inference of the present-day Hubble parameter, $H_0$, from real measurements of the Hubble parameter as a function of redshift, derived from the cosmologically model-independent cosmic chronometer and $\Lambda$CDM-dependent baryon acoustic oscillation observations. The inferred $H_0$ values from the cosmic chronometers, baryon acoustic oscillations and combined datasets are $H_0= 66 \pm 6\, \mathrm{km}\,\mathrm{s}^{-1}\,\mathrm{Mpc}^{-1}$, $H_0= 67 \pm 10\, \mathrm{km}\,\mathrm{s}^{-1}\,\mathrm{Mpc}^{-1}$ and $H_0= 69 \pm 6\, \mathrm{km}\,\mathrm{s}^{-1}\,\mathrm{Mpc}^{-1}$, respectively. The kernel posterior of the cosmic chronometers dataset prefers a non-stationary linear kernel. Finally, the datasets are shown to be not in tension with $\ln R=12.17\pm 0.02$.
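As a rough illustration of how kernel evidences $Z_k$ translate into kernel posteriors $p_k$, the sketch below applies Bayes' theorem over a discrete set of kernels, $p_k = Z_k \pi_k / \sum_j Z_j \pi_j$. It is not the paper's transdimensional sampler: the function name `kernel_posterior`, the uniform prior $\pi_k$ and the log-evidence values are all assumptions made for the example.

```python
import numpy as np

def kernel_posterior(log_evidences, log_prior=None):
    """Return p_k = Z_k * pi_k / sum_j (Z_j * pi_j) given ln Z_k.

    A uniform prior over kernels is assumed when log_prior is None.
    """
    log_z = np.asarray(log_evidences, dtype=float)
    if log_prior is None:
        log_prior = np.full_like(log_z, -np.log(log_z.size))
    log_w = log_z + np.asarray(log_prior, dtype=float)
    # Subtract the maximum before exponentiating to avoid underflow.
    log_w = log_w - log_w.max()
    w = np.exp(log_w)
    return w / w.sum()

# Hypothetical log-evidences for three kernels (illustrative numbers only).
kernel_names = ["M32", "M52", "SE"]
log_z = [-10.0, -11.0, -14.0]
p = kernel_posterior(log_z)

for name, p_k in zip(kernel_names, p):
    print(f"{name}: p_k = {p_k:.3f}")
```

Because only differences in $\ln Z_k$ enter, two kernels whose log-evidences differ by one have a posterior odds ratio of $e$ under a uniform prior.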

Paper Structure

This paper contains 27 sections, 28 equations, 11 figures and 5 tables.

Figures (11)

  • Figure 1: Example synthetic dataset for $N_\mathrm{data}=750$ and $\log_{10}(\mathrm{SNR})=1$. The green curve is the true mean function calculated from an exoplanet transit light curve simulation. The data points are obtained by adding noise from an M32 kernel to this mean function. The red and blue curves are the mean function predictive distributions, marginalised over the kernel posterior and conditioned on the M32 kernel, respectively. The shaded regions are one-sigma error bands.
  • Figure 2: Plot showing that the inference of the true kernel is correlated with the noise level and the number of data points. A measure of the sharpness of the kernel posterior at the M32 kernel, $S=p_\text{M32}-\langle p_k \rangle$, is plotted against the signal-to-noise ratio, $\log_{10}(\text{SNR})$, and the number of data points, $N_\text{data}$, of synthetic datasets created from an exoplanet simulation (Section \ref{sec:synthetic-data-method}) and correlated noise from an M32 kernel. Each coloured square corresponds to a dataset; the inscribed upper value is the plotted value of $S$, and the lower text is the maximum a posteriori (MAP) kernel. In the region within the red border, the true M32 kernel maximises the posterior. For $N_\text{data}=225$, $450$ and $675$, the M52 kernel has a similar or larger posterior probability than the M32 kernel even for $\log_{10}(\text{SNR})\approx 1$. The complete kernel posteriors for the datasets marked (a), (b) and (c) are shown in Figure 3.
  • Figure 3: Example kernel posteriors showing the inference of the true M32 kernel for the datasets marked (a), (b) and (c) in Figure 2. In each plot, the red data point indicates the true kernel. (a) This is in the low data point, high noise region. The kernel posterior shows that multiple kernels are equally probable within one sigma. (b) This is in the medium data point, low noise region. The posterior favours the M52 and M72 kernels compared to the M32 kernel. (c) This is in the high data point, low noise region. The M32 kernel is correctly inferred.
  • Figure 4: Top: Plot of the true kernel, which is an M32 kernel with ${A_\mathrm{M32}=0.002}$ and $\ell_\mathrm{M32}=0.02\,\mathrm{days}$, and M52 kernels with hyperparameters sampled from the posterior. The similarity metric $\Delta$ is calculated by summing the probability density of the M52 kernel samples along the curve of the true kernel. The settings for the plot are $N_\mathrm{data}=75$ and $\log_{10}(\mathrm{SNR})\approx 0.30$. Bottom: For each kernel, $\ln\Delta$ is shown. Larger $\Delta$ corresponds to higher similarity between the inferred and true kernel. It is seen that, for any $t_\mathrm{max}$, the true M32 kernel is not approximated well by the M32 kernel posterior compared to other kernels, leading to a flat kernel posterior $p_k$.
  • Figure 5: Plot of the sharpness, $S$, when multiple occultations are included in the dataset and the mean function and kernels are fit to the data simultaneously. As the number of occultations in the dataset is increased, the lower left triangular region in which the kernel cannot be inferred shrinks while the upper right region in which the true kernel can be inferred grows. This is consistent with the results for a fixed mean function (Figure 2) and shows that the method is robust for a free mean function when statistically independent realisations of an occultation are present in the dataset.
  • ...and 6 more figures
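
The captions above refer to correlated noise drawn from an M32 (Matérn-3/2) kernel and to the sharpness $S=p_\text{M32}-\langle p_k \rangle$. The following is a minimal sketch of both quantities, not the paper's code. It assumes the common parameterisation $k(\tau)=A^2\,(1+\sqrt{3}\,\tau/\ell)\,\exp(-\sqrt{3}\,\tau/\ell)$ (the paper's amplitude convention may differ), a one-day time span, and a hypothetical kernel posterior; the function names `matern32` and `sharpness` are illustrative. Only $A=0.002$, $\ell=0.02$ days and $N_\mathrm{data}=75$ are taken from the Figure 4 caption.

```python
import numpy as np

def matern32(t1, t2, amplitude, length_scale):
    """Matern-3/2 covariance matrix (assumed A^2 amplitude convention)."""
    tau = np.abs(t1[:, None] - t2[None, :])
    r = np.sqrt(3.0) * tau / length_scale
    return amplitude**2 * (1.0 + r) * np.exp(-r)

def sharpness(p_k, true_index):
    """S = p_true - <p_k>: how far the true kernel's posterior
    probability sits above the mean over all kernels."""
    p_k = np.asarray(p_k, dtype=float)
    return p_k[true_index] - p_k.mean()

rng = np.random.default_rng(0)

# Hyperparameters from the Figure 4 caption; the time span is an assumption.
t = np.linspace(0.0, 1.0, 75)            # days
K = matern32(t, t, amplitude=0.002, length_scale=0.02)

# Small diagonal jitter keeps the covariance numerically positive definite.
K_jittered = K + 1e-12 * np.eye(t.size)
noise = rng.multivariate_normal(np.zeros(t.size), K_jittered)

# Hypothetical kernel posterior over three kernels, true kernel at index 0.
S_peaked = sharpness([0.7, 0.2, 0.1], true_index=0)
S_flat = sharpness([1 / 3, 1 / 3, 1 / 3], true_index=0)
print(f"S (peaked) = {S_peaked:.3f}, S (flat) = {S_flat:.3f}")
```

For a normalised posterior over $N$ kernels, $\langle p_k \rangle = 1/N$, so $S$ is zero for a flat posterior and approaches $1-1/N$ when the true kernel takes all the probability.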