Meta-learning for cosmological emulation: Rapid adaptation to new lensing kernels

Charlie MacMahon-Gellér, C. Danielle Leonard, Philip Bull, Markus Michael Rau

TL;DR

This work investigates the Model-Agnostic Meta-Learning algorithm (MAML) for training a cosmological emulator, and observes that within an MCMC analysis, the MAML emulator is able to better reproduce the fully-theoretical posterior.

Abstract

Theoretical computation of cosmological observables is an intensive process, restricting the speed at which cosmological data can be analysed and cosmological models constrained, and therefore limiting research access to those with high-performance computing infrastructure. Whilst the use of machine learning to emulate these computations has been studied, most existing emulators are specialised and not suitable for emulating a wide range of observables with changing physical models. Here, we investigate the Model-Agnostic Meta-Learning algorithm (MAML) for training a cosmological emulator. MAML attempts to train a set of network parameters for rapid fine-tuning to new tasks within some distribution of tasks. Specifically, we consider a simple case where the galaxy sample changes, resulting in a different redshift distribution and lensing kernel. Using MAML, we train a cosmic-shear angular power spectrum emulator for rapid adaptation to new redshift distributions with only $O(100)$ fine-tuning samples, whilst not requiring any parametrisation of the redshift distributions. We compare the performance of the MAML emulator to two standard emulators, one pre-trained on a single redshift distribution and the other with no pre-training, both in terms of accuracy on test data, and the constraints produced when using the emulators for cosmological inference. We observe that within an MCMC analysis, the MAML emulator is able to better reproduce the fully-theoretical posterior, achieving a Bhattacharyya distance from the fully-theoretical posterior in the $S_8$--$\Omega_m$ plane of 0.008, compared to 0.038 for the single-task pre-trained emulator and 0.243 for the emulator with no pre-training.
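
To make the posterior comparison metric concrete, the following is a minimal sketch of the Bhattacharyya distance between two 2D distributions. It assumes each posterior in the $S_8$--$\Omega_m$ plane is approximated by a Gaussian (mean and covariance estimated from MCMC samples); the abstract does not state how the distance was actually evaluated, so this closed form is an illustrative assumption, and the function name `bhattacharyya_gaussian_2d` is hypothetical.

```python
import math


def _det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def bhattacharyya_gaussian_2d(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two 2D Gaussians.

    D_B = (1/8) d^T S^{-1} d + (1/2) ln( det S / sqrt(det S1 * det S2) ),
    with d = mu1 - mu2 and S = (S1 + S2) / 2.
    Assumption: both posteriors are well described by Gaussians.
    """
    # Average of the two covariance matrices
    s = [[(cov1[i][j] + cov2[i][j]) / 2.0 for j in range(2)] for i in range(2)]
    det_s = _det2(s)
    dx = mu1[0] - mu2[0]
    dy = mu1[1] - mu2[1]
    # Quadratic form d^T S^{-1} d using the explicit 2x2 inverse
    quad = (s[1][1] * dx * dx
            - (s[0][1] + s[1][0]) * dx * dy
            + s[0][0] * dy * dy) / det_s
    log_term = 0.5 * math.log(det_s / math.sqrt(_det2(cov1) * _det2(cov2)))
    return quad / 8.0 + log_term


# Identical distributions are at distance zero; a shift or a change in
# covariance makes the distance positive.
identity = [[1.0, 0.0], [0.0, 1.0]]
same = bhattacharyya_gaussian_2d([0.8, 0.3], identity, [0.8, 0.3], identity)
shifted = bhattacharyya_gaussian_2d([0.0, 0.0], identity, [1.0, 0.0], identity)
```

Under this approximation, a smaller value indicates that the emulator-based posterior overlaps more closely with the fully-theoretical one, which is the sense in which 0.008 is better than 0.038 or 0.243.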

Paper Structure

This paper contains 22 sections, 7 equations, 9 figures, 3 tables, 2 algorithms.

Figures (9)

  • Figure 1: A flowchart of the MAML training process used in this work. Different redshift distributions ($N(z)$) constitute different tasks. At each step in the outer loop, a batch of $N(z)$ is sampled, with angular power spectra (APS, $C_\ell$) generated for a range of cosmological parameters for each $N(z)$. The $C_\ell$ are split into support ($\mathcal{D}_{sup}$) and query ($\mathcal{D}_{qry}$) sets. The inner loop trains the NN on $\mathcal{D}_{sup}$ to obtain task-specific parameters, $\theta_i$, which are then tested with $\mathcal{D}_{qry}$. Once this process has completed for each $N(z)$ in the batch, the query losses ($\mathcal{L}_i$) are averaged together and used to update the meta-parameters of the model, $\Phi$, which will go on to serve as the initialisation for $\theta_i$ in the next batch.
  • Figure 2: Diagram illustrating the architecture of the neural network used in this work. We take a hybrid approach, combining fully-connected linear layers with convolutional layers in order to capture correlated values in the power spectra data vector as spatial correlations in 2D. Increasing levels of dilation are applied in each layer of the CNN subnet to capture correlations across different scales. The final convolutional outputs are flattened and fed as inputs to a fully connected output layer, which produces the emulated data vector. Images used in the convolutional layers are real activations from the network.
  • Figure 3: Mean absolute percentage error (left) and failure rate (right) on the test task, using emulators trained with different numbers of tasks, shots and task batch sizes. The colour of each point is defined by the number of shots, while the marker style represents different numbers of tasks. The size of each marker scales with the total number of samples required (number of tasks multiplied by number of shots). Groups of points sharing the same task batch size are shown in separate shaded blocks, with the task batch size denoted on the horizontal axis. The point representing the parameters used in this paper is indicated by the red box.
  • Figure 4: Mean absolute percentage error (left) and failure rate (right) on the test task, for the single-task (orange) and MAML (blue) trained emulators. The solid lines represent the mean value of each metric over $20$ unique random seeds, and the shaded regions illustrate the $1\sigma$ standard deviation. The insets show a zoomed-in view of the results between 100 and 1000 samples. We can see that the MAML emulator achieves lower mean absolute percentage error for all numbers of fine-tuning samples, though both have similar failure rates. Importantly, however, the MAML emulator appears to show less deviation in performance with different random seeds, suggesting it is more robust to changing cuDNN optimisations and fine-tuning samples.
  • Figure 5: Ratio of mean absolute percentage error (left) and failure rate (right) on the test task for the fresh emulator with respect to the MAML emulator. The horizontal axis shows increasing numbers of training samples used to train the fresh emulator, while the number of samples given to fine-tune the MAML emulator remains fixed at $100$. We see that the fresh emulator starts to match or exceed the performance of the MAML emulator once more than about $8{,}000$ samples are provided for training. The solid lines show the average ratio for $20$ different selections of training and test data, while the shaded region indicates the $1\sigma$ deviation in the ratios across these $20$ selections.
  • ...and 4 more figures
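
The inner/outer loop structure described in the Figure 1 caption can be sketched on a toy problem. This is not the paper's implementation: a one-parameter linear model `y = w * x` stands in for the neural network, a randomly drawn slope stands in for a sampled $N(z)$, and all function names and hyperparameter values are hypothetical. Because the toy model has a single parameter, the second-order MAML meta-gradient through one inner step can be written out exactly by hand.

```python
import random


def mse(w, xs, ys):
    """Mean squared error of the toy model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return 2.0 * sum(x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def mse_hess(xs):
    """Second derivative of the loss with respect to w (independent of w)."""
    return 2.0 * sum(x * x for x in xs) / len(xs)


def sample_task(rng, n_shots):
    """Toy task: a random slope plays the role of a sampled N(z)."""
    slope = rng.uniform(1.0, 3.0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(2 * n_shots)]
    ys = [slope * x for x in xs]
    # Split the generated data into support and query sets
    return (xs[:n_shots], ys[:n_shots]), (xs[n_shots:], ys[n_shots:])


def adapt(phi, xs, ys, inner_lr=0.1, n_steps=1):
    """Inner loop: fine-tune from the meta-parameter phi on support data."""
    theta = phi
    for _ in range(n_steps):
        theta -= inner_lr * mse_grad(theta, xs, ys)
    return theta


def maml_train(n_outer=500, task_batch=5, n_shots=10,
               inner_lr=0.1, outer_lr=0.05, seed=0):
    """Outer loop: update phi using query losses averaged over a task batch."""
    rng = random.Random(seed)
    phi = 0.0
    for _ in range(n_outer):
        meta_grad = 0.0
        for _ in range(task_batch):
            (xs_s, ys_s), (xs_q, ys_q) = sample_task(rng, n_shots)
            theta = adapt(phi, xs_s, ys_s, inner_lr)  # one inner step
            # Chain rule through the inner step:
            # d(theta)/d(phi) = 1 - inner_lr * Hessian(support loss)
            meta_grad += (mse_grad(theta, xs_q, ys_q)
                          * (1.0 - inner_lr * mse_hess(xs_s)))
        phi -= outer_lr * meta_grad / task_batch
    return phi


phi = maml_train()
```

After training, `adapt(phi, xs, ys, n_steps=5)` plays the role of fine-tuning on a new task. For a real network the hand-written Hessian factor would be replaced by automatic differentiation through the inner update.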