Extracting Cosmological Information from Lightcone Data: A Comparison of CNNs and Summary-Statistic-Based Approaches

Min Zhiwei, Xiao Xu, Jiang Zhujun, Xiao Liang, Yin Fenfen, Ding Jiacheng, Miao Haitao, Chen Shupei, Lin Qiufan, Wang Yang, Zhang Le, Li XiaoDong

Abstract

Lightcone observations are the natural data format of galaxy surveys, but their evolving geometry breaks the translational symmetry assumed by standard convolutional neural networks (CNNs). In particular, applying CNNs to 3D gridded lightcone data implicitly treats the line-of-sight direction as translationally invariant, even though that direction encodes cosmic time evolution. We propose a simple alternative (CNN+2D) that divides the lightcone into redshift slices, projects each onto a HEALPix sphere, and analyzes them with a 2D CNN. Using AbacusSummit halo lightcone mocks ($0.3<z<0.8$, $40^\circ\times40^\circ$), we compare this approach with fully connected (FC) networks applied to different summary statistics, including spherical harmonic coefficients ($a_{\ell m}$), wavelet scattering transform (WST) coefficients, and the angular two-point correlation function (2PCF), along with standard 2PCF likelihood and Fisher forecasts. We find that several approaches other than CNNs achieve competitive performance: FC networks combined with $a_{\ell m}$ and especially WST significantly outperform 2PCF-based methods, with FC+WST yielding the best overall parameter constraints across cosmologies. Meanwhile, for a fiducial cosmology with multiple realizations, the CNN+2D approach achieves the smallest statistical uncertainties. These results demonstrate that both learned features and carefully constructed summary statistics can effectively extract cosmological information from lightcone data, providing flexible and robust analysis strategies for upcoming surveys such as DESI, Euclid, and CSST.
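The slicing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: in place of a HEALPix projection it bins points onto a flat RA/DEC grid over the $40^\circ\times40^\circ$ patch, and the function names (`slice_lightcone`, `overdensity`), the grid size, and the redshift bin edges are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


def slice_lightcone(ra, dec, z, z_edges, n_pix=8, extent=40.0):
    """Count points per redshift slice on a flat n_pix x n_pix RA/DEC grid.

    Assumption: a flat grid stands in for the HEALPix projection, and
    z_edges is a sorted list of (hypothetical) slice boundaries.
    Returns maps[k][i][j], with k the slice, i the DEC row, j the RA column.
    """
    n_slices = len(z_edges) - 1
    maps = [[[0] * n_pix for _ in range(n_pix)] for _ in range(n_slices)]
    for a, d, zz in zip(ra, dec, z):
        # Skip points outside the redshift range or the angular patch.
        if not (z_edges[0] <= zz < z_edges[-1]):
            continue
        if not (0.0 <= a < extent and 0.0 <= d < extent):
            continue
        k = bisect_right(z_edges, zz) - 1            # redshift slice index
        i = min(int(d / extent * n_pix), n_pix - 1)  # DEC pixel
        j = min(int(a / extent * n_pix), n_pix - 1)  # RA pixel
        maps[k][i][j] += 1
    return maps


def overdensity(count_map):
    """Convert one count map to overdensity: delta = n / mean(n) - 1."""
    flat = [c for row in count_map for c in row]
    mean = sum(flat) / len(flat)
    if mean == 0:
        raise ValueError("empty map: mean count is zero")
    return [[c / mean - 1.0 for c in row] for row in count_map]


# Toy usage: four points, two redshift slices, a 4 x 4 grid.
ra = [5.0, 5.0, 35.0, 20.0]
dec = [5.0, 5.0, 35.0, 20.0]
z = [0.31, 0.35, 0.79, 0.90]  # the last point lies outside 0.3 <= z < 0.8
maps = slice_lightcone(ra, dec, z, z_edges=[0.3, 0.55, 0.8], n_pix=4)
delta = overdensity(maps[0])
```

Stacking the per-slice maps along a channel axis gives the kind of multi-channel 2D input that a 2D CNN can consume, so the line-of-sight direction enters as separate channels rather than as a translation-invariant spatial axis.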

Paper Structure

This paper contains 21 sections, 16 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: HEALPix maps and the corresponding projected 2D density fields for two cosmological models (M1 and M2) in the redshift bin $0.475 \le z \le 0.538$. Left: full-sky simulation covering RA and DEC from $0^\circ$ to $90^\circ$. Right: zoomed-in region with RA and DEC both from $0^\circ$ to $40^\circ$. The color bar indicates the overdensity.
  • Figure 2: Distribution of cosmological parameters for the 52 simulations from AbacusSummit base c130-181 ph000. The red square marks the fiducial base cosmology (c000). Blue points indicate the 32 training samples, and green points represent the 20 test samples.
  • Figure 3: 2D CNN architecture for HEALPix-projected density fields ($9 \times 512^2$, 9 redshift bins). The network contains four convolutional blocks (convolution + batch normalization + $2\times2$ max-pooling), each of which reduces the spatial resolution by a factor of two, followed by an adaptive pooling layer and an FC regressor for cosmological parameter estimation. For models using summary statistics (2PCF, $a_{\ell m}$, and WST), the convolutional feature-extraction stage is removed and only the same FC regressor is retained. The input dimension $n$ of the FC block is adjusted to match the dimensionality of each statistic.
  • Figure 4: Training and test loss curves for the four models. The blue line represents the training loss, and the orange line represents the test loss.
  • Figure 5: Comparison between the true and predicted values of the cosmological parameters $\Omega_b$, $\Omega_m$, $h$, $A_s$, $n_s$, and $\sigma_8$ obtained using four different methods. The black dashed line indicates the line of perfect agreement. For each panel, the corresponding $R^2$ and RMSE values are displayed in the upper-left corner.
  • ...and 5 more figures
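
The $R^2$ and RMSE values quoted in the Figure 5 caption can be illustrated with a short sketch. It assumes the standard definitions, $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_i (y_i - \hat{y}_i)^2}$ and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$; the paper's exact conventions are not given here, and the function names and toy values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted parameter values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (standard form)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Toy values only, not results from the paper.
truth = [1.0, 2.0, 3.0]
perfect = [1.0, 2.0, 3.0]   # predictions on the line of perfect agreement
constant = [2.0, 2.0, 2.0]  # always predicting the mean of the truth

perfect_scores = (rmse(truth, perfect), r2_score(truth, perfect))
constant_scores = (rmse(truth, constant), r2_score(truth, constant))
```

Under these definitions, predictions on the line of perfect agreement give RMSE $= 0$ and $R^2 = 1$, while a model that always predicts the mean of the true values gives $R^2 = 0$.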