How many simulations do we need for simulation-based inference in cosmology?

Anirban Bairagi, Benjamin Wandelt, Francisco Villaescusa-Navarro

TL;DR

The paper asks how many simulations neural networks need in order to reach information-optimal cosmological parameter inference from non-linear data. By grounding neural training in Fisher information and the Cramér-Rao bound, it shows that the widely used Quijote LH set (2,000 simulations) is insufficient to saturate the information in the non-linear matter power spectrum $P(k)$, and it finds an empirical power-law scaling law for the achievable information gain as a function of training set size. The authors validate this scaling with the large BSQ dataset (32,768 simulations) and demonstrate saturation of information near 4,000 training runs for $P(k)$, while also extending the analysis to neural posterior estimation using wavelet scattering transforms, which exhibit similar scaling behavior. The work provides practical guidance for planning simulation campaigns and highlights the need for novel training strategies or faster simulations to achieve near-optimal inference in cosmology’s non-linear regime. The central quantitative results are the Fisher information $F = \nabla_{\theta}\boldsymbol{\nu}^T C^{-1} \nabla_{\theta}\boldsymbol{\nu}$, the scaling law $\mathcal{L} = \mathcal{L}_{CR} + cN^{-\alpha}$ with $\alpha \approx 0.39$, and saturation near $N \approx 4{,}000$.
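
To make the Fisher expression above concrete, here is a minimal Python sketch of how such a matrix and the resulting Cramér-Rao variances could be computed. It is not the authors' code: it assumes a diagonal covariance $C$, uses central finite differences for the gradient of the mean summary, and replaces $P(k)$ with a toy two-parameter linear model. The names `fisher_matrix`, `cramer_rao_2x2`, `linear_model`, and `X` are hypothetical.

```python
def fisher_matrix(model, theta, variances, step=1e-4):
    """F_ij = sum_k (d nu_k / d theta_i) (d nu_k / d theta_j) / variances[k].

    Assumption: the covariance C is diagonal, with entries `variances`.
    Gradients of the mean summary nu(theta) use central finite differences.
    """
    n_par = len(theta)
    grads = []
    for i in range(n_par):
        up = list(theta)
        dn = list(theta)
        up[i] += step
        dn[i] -= step
        nu_up, nu_dn = model(up), model(dn)
        grads.append([(a - b) / (2.0 * step) for a, b in zip(nu_up, nu_dn)])
    return [
        [
            sum(gi * gj / v for gi, gj, v in zip(grads[i], grads[j], variances))
            for j in range(n_par)
        ]
        for i in range(n_par)
    ]


def cramer_rao_2x2(F):
    """Diagonal of F^{-1} for a 2x2 Fisher matrix: the minimum variances."""
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    return [F[1][1] / det, F[0][0] / det]


# Toy stand-in for a summary statistic: nu_k = theta_0 + theta_1 * x_k.
X = [0.0, 1.0, 2.0]  # hypothetical bin centres


def linear_model(theta):
    return [theta[0] + theta[1] * x for x in X]


F = fisher_matrix(linear_model, [1.0, 2.0], variances=[1.0, 1.0, 1.0])
bounds = cramer_rao_2x2(F)
print("Fisher matrix:", F)
print("Cramér-Rao variances:", bounds)
```

For a realistic summary such as $P(k)$ the covariance is generally not diagonal, so a full matrix inverse would replace the per-bin division used here.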

Abstract

How many simulations do we need to train machine learning methods to extract information available from summary statistics of the cosmological density field? Neural methods have shown the potential to extract non-linear information available from cosmological data. Success depends critically on having sufficient simulations for training the networks and appropriate network architectures. In the first detailed convergence study of neural network training for cosmological inference, we show that currently available simulation suites, such as the Quijote Latin Hypercube (LH) with 2,000 simulations, do not provide sufficient training data for a generic neural network to reach the optimal regime, even for the dark matter power spectrum, and in an idealized case. We discover an empirical neural scaling law that predicts how much information a neural network can extract from a highly informative summary statistic, the dark matter power spectrum, as a function of the number of simulations used to train the network, for a wide range of architectures and hyperparameters. We combine this result with the Cramér-Rao information bound to forecast the number of training simulations needed for near-optimal information extraction. To verify our method we created the largest publicly released simulation data set in cosmology, the Big Sobol Sequence (BSQ), consisting of 32,768 $\Lambda$CDM N-body simulations uniformly covering the $\Lambda$CDM parameter space. Our method enables efficient planning of simulation campaigns for machine learning applications in cosmology, while the BSQ dataset provides an unprecedented resource for studying the convergence behavior of neural networks in cosmological parameter inference. Our results suggest that new large simulation suites or new training approaches will be necessary to achieve information-optimal parameter inference from non-linear simulations.
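
The forecasting step described in the abstract can be illustrated with a short Python sketch. It is not the authors' pipeline. It assumes the test loss follows a power law with a constant offset, loss = loss_CR + c N^(-alpha), and that the Cramér-Rao floor is already known from a Fisher calculation. Under those assumptions, a straight-line fit in log-log space recovers `c` and `alpha`, and inverting the law gives the number of simulations needed to come within a chosen excess of the floor. The data are synthetic. The values `LOSS_CR = -20.0` and `C_TRUE = 30.0` and the tolerance `excess=1.0` are hypothetical; only the slope 0.39 comes from the text.

```python
import math


def fit_power_law(n_sims, losses, loss_cr):
    """Least-squares line through (log N, log(loss - loss_cr)).

    Returns (c, alpha) for the model loss = loss_cr + c * N**(-alpha).
    Assumption: loss_cr is known and every loss lies above it.
    """
    xs = [math.log(n) for n in n_sims]
    ys = [math.log(loss - loss_cr) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def sims_needed(c, alpha, excess):
    """Smallest N (as a float) with c * N**(-alpha) <= excess."""
    return (c / excess) ** (1.0 / alpha)


# Synthetic, noise-free training curve (all values below are illustrative).
LOSS_CR = -20.0   # hypothetical Cramér-Rao floor of the loss
C_TRUE = 30.0     # hypothetical amplitude
ALPHA_TRUE = 0.39 # slope quoted in the text

n_sims = [100, 200, 500, 1000, 1500]
losses = [LOSS_CR + C_TRUE * n ** (-ALPHA_TRUE) for n in n_sims]

c_fit, alpha_fit = fit_power_law(n_sims, losses, LOSS_CR)
n_needed = sims_needed(c_fit, alpha_fit, excess=1.0)
print(f"c = {c_fit:.3f}, alpha = {alpha_fit:.3f}, N needed = {n_needed:.0f}")
```

With noisy measurements the floor, amplitude, and slope would more naturally be fitted jointly with a non-linear least-squares routine. The log-log fit is used here only because it needs nothing beyond the standard library.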

Paper Structure

This paper contains 10 sections, 10 equations, 12 figures, 1 table.

Figures (12)

  • Figure 1: An illustration of the PowerSpectraNet architecture, which infers the cosmological parameters $\hat{\theta}:\{\Omega_m, \Omega_b, h, n_s, \sigma_8\}$ from the power spectrum $P(k)$. The dimension of each layer is given below the corresponding layer.
  • Figure 2: Loss (Eq. \ref{logloss}) on held-out test data as a function of the number of Quijote LH simulations. The loss approaches the Cramér-Rao (C-R) bound $\mathcal{L}_{CR}$ as a power-law decay $\propto N^{-0.39}$ (cf. Eq. \ref{scaling}). We mark on the abscissa the asymptotic regime, defined as where the test loss becomes $e$ times the Cramér-Rao bound. Based on our initial training set of 1500 simulations, we predict that the information extracted by the neural summary is not yet optimal and that several thousand simulations are needed to reach the asymptotic regime. This prediction is verified in Figure \ref{corner_BSQ}.
  • Figure 3: Fisher information in neural summaries vs. optimal $P(k)$. NNs trained on 500, 1250, and 1500 LH simulations show increasingly tight parameter constraints but do not saturate the information bound of an optimal estimator based on $P(k)$.
  • Figure 4: Test loss (Eq. \ref{logloss}) as a function of the number of BSQ simulations. The loss follows a simple power-law scaling across two orders of magnitude, with a slope similar to that of the LH simulations (shown for comparison). The combined test loss for all parameters does not reach the optimal Fisher information computed at the fiducial point, because the test loss is computed over the full prior range in parameter space rather than at a single point in the center of the training data. The loss for the BSQ simulations is slightly higher than for the LH simulations, but Figure \ref{corner_BSQ} shows that the NN achieves asymptotic optimality at the fiducial point at 4000 simulations.
  • Figure 5: Fisher information in neural summaries vs. optimal $P(k)$. Information starts saturating near 4000 BSQ simulations, and the corresponding neural summary is nearly as informative as the optimal $P(k)$.
  • ...and 7 more figures