Vision Transformers for Cosmological Fields: Application to Weak Lensing Mass Maps

Jash Kakadia; Shubh Agrawal; Kunhao Zhong; Bhuvnesh Jain


TL;DR

This work assesses whether attention-based vision models can extract non-Gaussian information from weak-lensing mass maps to constrain $Ω_m$ and $S_8$ using simulation-based inference (SBI). It compares Vision Transformers (ViTs) and Swin Transformers against CNN baselines on convergence maps from DarkGridV1, incorporating tomographic channels and pre-training on synthetic data. The Swin Transformer generally outperforms the vanilla ViT, particularly when training data are limited, and pre-training substantially boosts transformer performance. Under realistic shape noise, however, the cosmological Figure of Merit achieved by transformers remains comparable to that of CNNs. The results suggest that transformers offer interpretability advantages and could improve further with more data or better pre-training, but they do not yet surpass CNNs for cosmological parameter inference in this realistic setting.
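To make the tomographic-channel setup concrete, the sketch below shows one standard way a ViT can turn a stack of convergence maps into tokens: each tomographic redshift bin is treated as an input channel, and the map is cut into non-overlapping square patches that are flattened into vectors. The map size (64×64 pixels), the number of bins (4), the patch size (8), and the `patchify` helper are hypothetical choices for illustration, not the configuration used with DarkGridV1 in this work. A full ViT would follow this step with a learned linear projection and position embeddings, while a Swin transformer would additionally restrict attention to local, shifted windows of such patches.

```python
import numpy as np

def patchify(maps: np.ndarray, patch: int) -> np.ndarray:
    """Split a (C, H, W) stack of tomographic convergence maps into ViT-style tokens.

    Each token is one non-overlapping patch x patch tile flattened across all
    C tomographic channels, so every token has length C * patch * patch.
    """
    c, h, w = maps.shape
    if h % patch or w % patch:
        raise ValueError("map height and width must be divisible by the patch size")
    # (C, H, W) -> (C, H/p, p, W/p, p): split each spatial axis into (tile index, offset)
    tiles = maps.reshape(c, h // patch, patch, w // patch, patch)
    # -> (H/p, W/p, C, p, p): group all channels and pixels of one tile together
    tiles = tiles.transpose(1, 3, 0, 2, 4)
    # -> (num_tokens, C*p*p): one row per tile, tiles ordered row by row
    return tiles.reshape(-1, c * patch * patch)

# Hypothetical input: 4 tomographic bins of a 64x64 Gaussian-random stand-in map.
rng = np.random.default_rng(0)
kappa = rng.normal(size=(4, 64, 64))
tokens = patchify(kappa, patch=8)
print(tokens.shape)  # (64, 256): an 8x8 grid of tiles, each 4 * 8 * 8 values long
```

Stacking the bins as channels lets every token carry information from all redshift slices at the same sky position, so attention between tokens can relate structures across both angle and redshift.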

Abstract

Weak gravitational lensing is a powerful probe of the universe's growth history. While traditional two-point statistics capture only the Gaussian features of the convergence field, deep learning methods such as convolutional neural networks (CNNs) have shown promise in extracting non-Gaussian information from small-scale, nonlinear structures. In this work, we evaluate the effectiveness of attention-based architectures, including variants of vision transformers (ViTs) and shifted window (Swin) transformers, in constraining the cosmological parameters $Ω_m$ and $S_8$ from weak lensing mass maps. Using a simulation-based inference (SBI) framework, we compare transformer-based methods to CNNs. We also examine how performance scales with the number of available $N$-body simulations, highlighting the importance of pre-training for transformer architectures. We find that the Swin transformer performs significantly better than vanilla ViTs, especially with limited training data. Despite the higher representational capacity of transformers, the cosmological Figure of Merit they achieve is comparable to that of CNNs under realistic noise conditions.
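The comparison above is phrased in terms of a Figure of Merit (FoM). A common convention for two parameters is $\mathrm{FoM} = 1/\sqrt{\det C}$, where $C$ is the 2×2 posterior covariance of ($Ω_m$, $S_8$), so tighter or less degenerate posteriors give larger values. The sketch below assumes that convention (the exact definition used in this work is not restated here) and estimates the FoM from posterior samples, such as those an SBI posterior estimator might return. The `figure_of_merit` helper and the synthetic Gaussian "posteriors" are hypothetical illustrations, not results from this work.

```python
import numpy as np

def figure_of_merit(samples: np.ndarray) -> float:
    """Return 1/sqrt(det C) for posterior samples of shape (n_samples, 2).

    Assumed convention: C is the sample covariance of (Omega_m, S_8);
    a larger FoM means a smaller posterior area in that plane.
    """
    if samples.ndim != 2 or samples.shape[1] != 2:
        raise ValueError("expected samples with shape (n_samples, 2)")
    cov = np.cov(samples, rowvar=False)  # 2x2 covariance, columns are parameters
    return float(1.0 / np.sqrt(np.linalg.det(cov)))

# Hypothetical synthetic posteriors centred on illustrative values (0.3, 0.8).
rng = np.random.default_rng(42)
mean = np.array([0.3, 0.8])
broad = rng.multivariate_normal(mean, np.diag([0.02**2, 0.01**2]), size=200_000)
tight = rng.multivariate_normal(mean, np.diag([0.01**2, 0.01**2]), size=200_000)

print(round(figure_of_merit(broad)))  # close to 1 / (0.02 * 0.01) = 5000
print(round(figure_of_merit(tight)))  # close to 1 / (0.01 * 0.01) = 10000
```

Because the FoM scales inversely with the posterior area, halving the $Ω_m$ uncertainty in this example roughly doubles it; two methods with comparable FoM therefore constrain the ($Ω_m$, $S_8$) plane about equally tightly.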
