Conditional variational autoencoders for cosmological model discrimination and anomaly detection in cosmic microwave background power spectra

Tian-Yang Sun, Tian-Nuo Li, He Wang, Jing-Fei Zhang, Xin Zhang

TL;DR

A parameter-conditioned variational autoencoder that aligns a data-driven latent representation with cosmological parameters while remaining compatible with standard likelihood analyses, enabling anomaly detection beyond Lambda-CDM and pointing to physically meaningful directions for refinement.

Abstract

The cosmic microwave background power spectra are a primary window into the early universe. However, achieving interpretable, likelihood-compatible compression and fast inference under weak model assumptions remains challenging. We propose a parameter-conditioned variational autoencoder (CVAE) that aligns a data-driven latent representation with cosmological parameters while remaining compatible with standard likelihood analyses. The model achieves high-fidelity compression of the $D_\ell^{TT}$, $D_\ell^{EE}$, and $D_\ell^{TE}$ spectra into just 5 latent dimensions, with reconstruction accuracy exceeding $99.9\%$ within Planck uncertainties. It reliably reconstructs spectra for beyond-$\Lambda$CDM scenarios, even under parameter extrapolation, and enables rapid inference, reducing the computation time from $\sim$40 hours to $\sim$2 minutes while maintaining posterior consistency. The learned latent space demonstrates a physically meaningful structure, capturing a distributed representation that mirrors known cosmological parameters and their degeneracies. Moreover, it supports highly effective unsupervised discrimination among cosmological models, achieving performance competitive with supervised approaches. Overall, this physics-informed CVAE enables anomaly detection beyond $\Lambda$CDM and points to physically meaningful directions for refinement.

Paper Structure

This paper contains 11 sections, 7 equations, 13 figures, 3 tables.

Figures (13)

  • Figure 1: Architecture of the CVAE with two encoders. Encoder 1 compresses the CMB power spectra into a latent representation $\boldsymbol z_{1}$ of dimension $L$. Encoder 2 compresses the corresponding cosmological parameters into another latent representation $\boldsymbol z_{2}$ and employs the SKL distance to encourage their alignment. The decoder reconstructs the CMB power spectra from samples drawn in the shared latent space $\boldsymbol Z$.
  • Figure 2: Similarity between CVAE/VAE reconstructions and simulated power spectra for various flat $\Lambda$CDM configurations and parameter dimensionalities. The black vertical lines indicate the upper and lower bounds of the training dataset. All other parameters are set to their corresponding values in Dataset A.
  • Figure 3: Similarity between CVAE/VAE reconstructions and simulated power spectra for various $\Lambda$CDM extensions and parameter dimensionalities. The six $\Lambda$CDM parameters are fixed to the Planck best-fit values, while any additional extended parameters are set to their corresponding values in Dataset A.
  • Figure 4: Reconstructions of the CMB $D_\ell^{TT}$, $D_\ell^{EE}$, and $D_\ell^{TE}$ power spectra by the optimal CVAE under four cosmological models, together with the absolute error relative to CAMB, $|\Delta D_\ell|$. The panels show, from left to right: flat $\Lambda$CDM; the CPL model with the DESI DR2 best-fit values of ($w_0$, $w_a$) [DESI:2025zgx]; the Hu--Sawicki model with $f_{R0}=10^{-3}$, simulated with MGCAMB [Hu:2007nk, Zucca:2019xhg]; and the non-flat $\Lambda$CDM model ($\Omega_k=-0.03$). Gray curves indicate the Planck $3\sigma$ uncertainty range, and error bars denote the $3\sigma$ uncertainties of the CVAE's reconstructions.
  • Figure 5: Reconstruction of the Planck observed power spectra by the optimal CVAE and the deviations from the best-fit observational values. Gray error bars denote the $3\sigma$ ranges of the Planck data; error bars on the reconstructions denote their $3\sigma$ uncertainties.
  • ...and 8 more figures
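
The alignment term mentioned in the Figure 1 caption can be illustrated with a small sketch. It assumes that both encoders output diagonal Gaussian posteriors (a mean and a log-variance per latent dimension) and that the SKL distance is the unweighted sum of the two directed KL divergences; the paper's exact definition and weighting are not given here, so the function names `kl_diag_gauss` and `symmetric_kl` and the example values are hypothetical.

```python
import math


def kl_diag_gauss(mu_p, logvar_p, mu_q, logvar_q):
    """KL(p || q) between two diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mp, lp, mq, lq in zip(mu_p, logvar_p, mu_q, logvar_q):
        var_p, var_q = math.exp(lp), math.exp(lq)
        # Closed-form KL divergence for one-dimensional Gaussians.
        total += 0.5 * (lq - lp + (var_p + (mp - mq) ** 2) / var_q - 1.0)
    return total


def symmetric_kl(mu1, logvar1, mu2, logvar2):
    """Symmetric KL (assumed form): KL(z1 || z2) + KL(z2 || z1)."""
    return (kl_diag_gauss(mu1, logvar1, mu2, logvar2)
            + kl_diag_gauss(mu2, logvar2, mu1, logvar1))


# Hypothetical posteriors in an L = 5 dimensional latent space.
# z1 stands in for the spectrum encoder, z2 for the parameter encoder.
mu_z1, logvar_z1 = [0.0] * 5, [0.0] * 5
mu_z2, logvar_z2 = [1.0, 0.0, 0.0, 0.0, 0.0], [0.0] * 5

skl = symmetric_kl(mu_z1, logvar_z1, mu_z2, logvar_z2)
print(skl)  # 1.0: each directed KL contributes 0.5 from the unit mean offset
```

Unlike a single directed KL divergence, this quantity is unchanged when the two encoders are swapped and vanishes only when the two posteriors coincide, which is why a symmetric form is a natural choice for encouraging $\boldsymbol z_{1}$ and $\boldsymbol z_{2}$ to align.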