Large Language Models -- the Future of Fundamental Physics?

Caroline Heneka, Florian Nieser, Ayodele Ore, Tilman Plehn, Daniel Schiller

TL;DR

This work shows how the Qwen2.5 LLM can be used to analyze and generate SKA data, specifically 3D maps of the cosmological large-scale structure for a large part of the observable Universe.

Abstract

For many fundamental physics applications, transformers, as the state of the art in learning complex correlations, benefit from pretraining on quasi-out-of-domain data. The obvious question is whether we can exploit Large Language Models, which requires proper out-of-domain transfer learning. We show how the Qwen2.5 LLM can be used to analyze and generate SKA data, specifically 3D maps of the cosmological large-scale structure for a large part of the observable Universe. We combine the LLM with connector networks and show, for cosmological parameter regression and lightcone generation, that this Lightcone LLM (L3M) with Qwen2.5 weights outperforms standard initialization and compares favorably with dedicated networks of matching size.

Paper Structure

This paper contains 32 sections, 36 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: Qwen2.5 architecture, separating the embedding layers from the LLM backbone.
  • Figure 2: L3M setup connecting numerical tokens with the LLM backbone transformer.
  • Figure 3: Global brightness temperature signal for 10 different lightcones and their corresponding downsampled distributions.
  • Figure 4: Mean validation loss with a $1\sigma$-band determined from 8 runs, grouped by prompting template (upper) and backbone initialization (lower). For the reference networks, only the best validation loss is shown.
  • Figure 5: Mean validation loss with a $1\sigma$-band for randomly initialized backbone weights and the minimalist chat template. The number of hidden layers of the backbone network is varied. The right panel shows the best validation losses, including the small reference network.
  • ...and 5 more figures