Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution
Achmad Ardani Prasha, Clavino Ourizqi Rachmadi, Muhamad Fauzan Ibnu Syahlan, Naufal Rahfi Anugerah, Nanda Garin Raditya, Putri Amelia, Sabrina Laila Mutiara, Hilman Syachr Ramadhan
TL;DR
This work introduces a masked autoencoder (MAE) pretraining framework for strong-lensing images to learn a single Vision Transformer (ViT) encoder that can be fine-tuned for both three-way dark-matter model classification and lensed-image super-resolution. By exploring mask ratios up to 90%, the authors show that MAE-pretrained encoders can match or exceed scratch-trained baselines for classification (AUC up to 0.968, accuracy 88.65%), while offering modest super-resolution gains (SSIM ≈ 0.961, PSNR ≈ 33 dB). A mask-ratio ablation reveals a trade-off: higher masking improves semantic classification at the expense of reconstruction fidelity. The results demonstrate the potential of physics-rich, self-supervised representations as reusable encoders for multi-task strong-lensing analysis and point to future work on pretraining across all dark-matter classes and domain adaptation to real data.
Abstract
Strong gravitational lensing can reveal the influence of dark-matter substructure in galaxies, but analyzing these effects from noisy, low-resolution images poses a significant challenge. In this work, we propose a masked autoencoder (MAE) pretraining strategy on simulated strong-lensing images from the DeepLense ML4SCI benchmark to learn generalizable representations for two downstream tasks: (i) classifying the underlying dark-matter model (cold dark matter, axion-like, or no substructure) and (ii) enhancing low-resolution lensed images via super-resolution. We pretrain a Vision Transformer (ViT) encoder using a masked image modeling objective, then fine-tune the encoder separately for each task. Our results show that MAE pretraining, when combined with appropriate mask-ratio tuning, yields a shared encoder that matches or exceeds a ViT trained from scratch. Specifically, at a 90% mask ratio, the fine-tuned classifier achieves a macro AUC of 0.968 and an accuracy of 88.65%, compared to the scratch baseline (AUC 0.957, accuracy 82.46%). For super-resolution (16×16 to 64×64), the MAE-pretrained model reconstructs images with PSNR ≈ 33 dB and SSIM 0.961, modestly improving over scratch training. We ablate the MAE mask ratio, revealing a consistent trade-off: higher mask ratios improve classification but slightly degrade reconstruction fidelity. Our findings demonstrate that MAE pretraining on physics-rich simulations provides a flexible, reusable encoder for multiple strong-lensing analysis tasks.
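
To make the masked image modeling objective concrete, the following is a minimal NumPy sketch of the random patch masking step at a 90% mask ratio, together with a reconstruction loss restricted to the masked patches, as is common in MAE-style training. It is an illustration under stated assumptions, not the authors' implementation: the 64×64 input size, the patch size of 8, and the helper names `patchify`, `random_mask`, and `masked_mse` are all hypothetical choices made for this example, and a random array stands in for a lensing image.

```python
import numpy as np


def patchify(image, patch_size):
    """Split an (H, W) image into flattened, non-overlapping square patches."""
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p) -> (gh, gw, p, p) -> (gh * gw, p * p)
    patches = image.reshape(gh, patch_size, gw, patch_size).swapaxes(1, 2)
    return patches.reshape(gh * gw, patch_size * patch_size)


def random_mask(num_patches, mask_ratio, rng):
    """Return sorted (visible_idx, masked_idx) for a uniformly random mask."""
    num_masked = int(round(num_patches * mask_ratio))
    perm = rng.permutation(num_patches)
    masked_idx = np.sort(perm[:num_masked])
    visible_idx = np.sort(perm[num_masked:])
    return visible_idx, masked_idx


def masked_mse(pred, target, masked_idx):
    """Mean squared error computed only over the masked patches."""
    return float(np.mean((pred[masked_idx] - target[masked_idx]) ** 2))


rng = np.random.default_rng(0)

# Assumption: a random 64x64 array stands in for a simulated lensing image.
image = rng.random((64, 64))

# Assumption: patch size 8, giving an 8x8 grid of 64 patches.
patches = patchify(image, patch_size=8)

# Hide 90% of the patches; only the visible ones would be fed to the encoder.
visible_idx, masked_idx = random_mask(len(patches), mask_ratio=0.9, rng=rng)
encoder_input = patches[visible_idx]

# A decoder would predict all patches; here a zero prediction is a placeholder.
placeholder_pred = np.zeros_like(patches)
loss = masked_mse(placeholder_pred, patches, masked_idx)
```

With these assumed sizes, the encoder sees only 6 of the 64 patches at a 90% mask ratio. This illustrates why high mask ratios push the encoder toward global, semantic features, and why reconstruction from so little context becomes harder.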
