
The global structure of the time delay likelihood

Namu Kroupa, Will Handley

Abstract

We identify a fundamental pathology in the likelihood for time delay inference which challenges standard inference methods. By analysing the likelihood for time delay inference with Gaussian process light curve models, we show that it generically develops a boundary-driven "W"-shape with a global maximum at the true delay and gradual rises towards the edges of the observation window. This arises because time delay estimation is intrinsically extrapolative. In practice, global samplers such as nested sampling are steered towards spurious edge modes unless strict convergence criteria are adopted. We demonstrate this with simulations and show that the effect strengthens with higher data density over a fixed time span. To ensure convergence, we provide concrete guidance, notably increasing the number of live points. Further, we show that methods implicitly favouring small delays, for example optimisers and local MCMC, induce a bias towards larger $H_0$. Our results clarify failure modes and offer practical remedies for robust fully Bayesian time delay inference.
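The direction of the $H_0$ bias can be read off from the standard time-delay cosmography relation. This is a sketch under assumptions: it supposes the delays are used for strong-lensing cosmography with a fixed lens model, and the symbols $D_{\Delta t}$ (time-delay distance), $\Delta\phi$ (Fermat potential difference), $z_\mathrm{d}$ (lens redshift) and the angular diameter distances $D_\mathrm{d}$, $D_\mathrm{s}$, $D_\mathrm{ds}$ are introduced here for illustration and do not appear in the abstract.

$$
\Delta t = \frac{D_{\Delta t}}{c}\,\Delta\phi,
\qquad
D_{\Delta t} = (1+z_\mathrm{d})\,\frac{D_\mathrm{d} D_\mathrm{s}}{D_\mathrm{ds}} \propto \frac{1}{H_0}
\quad\Longrightarrow\quad
H_0 \propto \frac{\Delta\phi}{\Delta t}.
$$

At fixed $\Delta\phi$, an inferred delay that is smaller in magnitude than the true one therefore maps to a larger $H_0$.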

Paper Structure (29 sections, 44 equations, 8 figures)


Figures (8)

  • Figure 1: Top: Data-averaged log-likelihood $\mathbb{E}_\mathbf{y}\log L$ against the time delay $\Delta t$. The true time delay for this figure is $\Delta t_{\mathrm{true}}=10\,\mathrm{days}$. The log-likelihood is locally highly oscillatory and globally follows a "W"-shape with a global maximum at the true time delay. Middle: The standard deviation $\sigma_{\log L}=\sqrt{\mathbb{V}_\mathbf{y}\log L}$ is negligible compared to the expectation. Bottom: Data-averaged regularised log-likelihood $\mathbb{E}_\mathbf{y}\log L_{\mathrm{reg}}$, formed by subtracting the trend favouring large time delays. Only the maxima in the vicinity of the true time delay remain.
  • Figure 2: Illustration of the gradual increase in $\log L$ as the time delay $\Delta t$ increases. Two synthetic data sets, shown on the right, are sampled from the distribution defined in Equation \ref{eqn:time-delayed-GP-pair} with true time delay $\Delta t=10\,\mathrm{days}$. Subsequently, the log-likelihood $\log L$ of the model is evaluated as a function of the time delay, shown on the left. The evaluation of $\log L$ is reformulated as visually comparing the predictive distribution $p(\mathbf{y}_2|\mathbf{y}_1)$, conditioned on $\mathbf{y}_1$, with the data set $\mathbf{y}_2$. This is shown in the sequence of plots on the right for different time delays. At the true time delay, the predictive distribution evidently matches $\mathbf{y}_2$. As the time delay is increased, the mean of $p(\mathbf{y}_2|\mathbf{y}_1)$ no longer passes through the data $\mathbf{y}_2$ and $\log L$ decreases significantly. However, the GP increasingly treats the non-overlapping data as noise, which provides a good fit. As $\Delta t$ is increased further, more data are fit as noise and $\log L$ rises gradually, leading to the "W"-shape of $\log L$ (Figure \ref{fig:data-averaged-loglikelihood}).
  • Figure 3: Top: Data-averaged posterior for $\ell=10\,\mathrm{days}$. As $n_\mathrm{data}$ is increased, the central peak of the posterior becomes sharper around the true time delay, as expected. However, the posterior density also increases around the edges, $\pm t_\mathrm{range}$. Middle: Posterior time delays of each data set for $\ell=10\,\mathrm{days}$ and $n_\mathrm{data}=100$. The posterior is peaked at either the true time delay or $\pm t_\mathrm{range}$. Bottom: Standard deviation $\sigma_{\Delta t}$ of $\Delta t$ from the data-averaged posterior, for different values of $\ell$ and $n_\mathrm{data}$. The standard deviation is consistently of the same order of magnitude as $t_\mathrm{range}$, indicating that the inference of large time delays (as in the top subplot) is systematic across $\ell$ and $n_\mathrm{data}$. The error bars on each value indicate the variation of $\sigma_{\Delta t}$ between different data set realisations. Evidently, $\sigma_{\Delta t}$ can vary significantly between data sets.
  • Figure 4: Fraction of unconverged Nested Sampling (NS) and Sequential Monte Carlo (SMC) runs against number of live points and particles, respectively. As the number is increased, non-convergence becomes less probable since finding the central mode becomes more probable. Thus, the fraction decreases. Increasing the observation window $t_\mathrm{range}$ from $10^3\,\mathrm{days}$ to $10^4\,\mathrm{days}$ increases the probability of non-convergence as the central mode occupies a smaller fraction of parameter space and the gradual rise of the likelihood towards the edge modes drives the population of live points or particles further away from the central mode. Similar trends are seen for both NS and SMC. The plotted lines are only visual guides.
  • Figure 5: Data-averaged posteriors with reduced nested sampling convergence settings. Top: The data-averaged posterior of the time delay converges to a trimodal distribution as the density of data points increases: while the central peak at the true time delay is visible, two additional unphysical modes emerge at the edges of the observation time interval. Bottom: The unphysical modes are suppressed by increasing the effective sample size of the GP. This is achieved by decreasing the length scale by a factor of $10$. In both panels, the inference was performed on synthetic light curves generated with known time delay $\Delta t_\mathrm{true}$ and length scale $\ell$.
  • ...and 3 more figures
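
The log-likelihood scan described in the captions of Figures 1 and 2 can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' code. It assumes a unit-amplitude squared-exponential kernel, white noise of standard deviation `sigma`, a shared sampling grid for both light curves, and the sign convention that the second curve observes the latent signal at $t-\Delta t$. All function names and parameter values are hypothetical.

```python
import numpy as np


def sq_exp_kernel(ta, tb, ell):
    """Unit-amplitude squared-exponential covariance (assumed kernel)."""
    d = ta[:, None] - tb[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)


def joint_cov(t, dt, ell, sigma):
    """Covariance of (y1, y2), assuming y2 at time t sees the latent curve at t - dt."""
    s = np.concatenate([t, t - dt])
    return sq_exp_kernel(s, s, ell) + sigma**2 * np.eye(s.size)


def log_likelihood(dt, t, y1, y2, ell, sigma):
    """Zero-mean Gaussian log-density of the stacked data for a trial delay dt."""
    y = np.concatenate([y1, y2])
    chol = np.linalg.cholesky(joint_cov(t, dt, ell, sigma))
    z = np.linalg.solve(chol, y)  # z @ z equals y^T K^{-1} y
    log_det_half = np.log(np.diag(chol)).sum()  # equals 0.5 * log|K|
    return -0.5 * z @ z - log_det_half - 0.5 * y.size * np.log(2 * np.pi)


def simulate(t, dt_true, ell, sigma, rng):
    """Draw one noisy pair of light curves from the same joint Gaussian."""
    chol = np.linalg.cholesky(joint_cov(t, dt_true, ell, sigma))
    y = chol @ rng.standard_normal(2 * t.size)
    return y[: t.size], y[t.size :]


# Illustrative values only: 40 points over 100 days, ell = 10 days.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 100.0, 40)
ell, sigma, dt_true = 10.0, 0.1, 10.0
y1, y2 = simulate(t, dt_true, ell, sigma, rng)

# Scan trial delays across the full observation window.
dt_grid = np.linspace(-100.0, 100.0, 201)
log_l = np.array([log_likelihood(dt, t, y1, y2, ell, sigma) for dt in dt_grid])
print("delay with highest log-likelihood on the grid:", dt_grid[np.argmax(log_l)])
```

Plotting `log_l` against `dt_grid` is one way to inspect the global shape for a single realisation. Averaging over many calls to `simulate` would approximate the data-averaged curve, under the same assumptions.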