Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution

Achmad Ardani Prasha, Clavino Ourizqi Rachmadi, Muhamad Fauzan Ibnu Syahlan, Naufal Rahfi Anugerah, Nanda Garin Raditya, Putri Amelia, Sabrina Laila Mutiara, Hilman Syachr Ramadhan

TL;DR

This work introduces a masked autoencoder pretraining framework for strong-lensing images to learn a single Vision Transformer encoder that can be fine-tuned for both three-way dark-matter model classification and lensed-image super-resolution. By exploring mask ratios up to 90%, the authors show that MAE-pretrained encoders can match or exceed scratch-trained baselines for classification (AUC up to 0.968, accuracy 88.65%), while offering modest SR gains (SSIM ≈ 0.961, PSNR ≈ 33 dB). A mask-ratio ablation reveals a trade-off: higher masking improves semantic classification at the expense of reconstruction fidelity. The results demonstrate the potential of physics-rich, self-supervised representations as reusable encoders for multi-task strong-lensing analysis and point to future work on pretraining across all DM classes and domain adaptation to real data.

Abstract

Strong gravitational lensing can reveal the influence of dark-matter substructure in galaxies, but analyzing these effects from noisy, low-resolution images poses a significant challenge. In this work, we propose a masked autoencoder (MAE) pretraining strategy on simulated strong-lensing images from the DeepLense ML4SCI benchmark to learn generalizable representations for two downstream tasks: (i) classifying the underlying dark-matter model (cold dark matter, axion-like, or no substructure) and (ii) enhancing low-resolution lensed images via super-resolution. We pretrain a Vision Transformer encoder using a masked image modeling objective, then fine-tune the encoder separately for each task. Our results show that MAE pretraining, when combined with appropriate mask-ratio tuning, yields a shared encoder that matches or exceeds a ViT trained from scratch. Specifically, at a 90% mask ratio, the fine-tuned classifier achieves macro AUC of 0.968 and accuracy of 88.65%, compared to the scratch baseline (AUC 0.957, accuracy 82.46%). For super-resolution ($16\times16$ to $64\times64$), the MAE-pretrained model reconstructs images with PSNR ~33 dB and SSIM 0.961, modestly improving over scratch training. We ablate the MAE mask ratio, revealing a consistent trade-off: higher mask ratios improve classification but slightly degrade reconstruction fidelity. Our findings demonstrate that MAE pretraining on physics-rich simulations provides a flexible, reusable encoder for multiple strong-lensing analysis tasks.
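
The random patch masking behind the masked image modeling objective can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `random_patch_mask`, the 8-pixel patch size, and the single-channel 64x64 input are assumptions made for the example, since the text above gives only the mask ratios.

```python
import numpy as np


def random_patch_mask(image, patch_size, mask_ratio, rng):
    """Split a 2D image into square patches and hide a random subset.

    Returns the visible patches (flattened, in raster order) and a boolean
    mask over all patches where True marks a hidden patch.
    """
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    gh, gw = h // patch_size, w // patch_size

    # (gh, ps, gw, ps) -> (gh, gw, ps, ps) -> (num_patches, ps * ps)
    patches = (
        image.reshape(gh, patch_size, gw, patch_size)
        .transpose(0, 2, 1, 3)
        .reshape(gh * gw, patch_size * patch_size)
    )

    num_patches = gh * gw
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))

    # Shuffle patch indices and keep the first num_keep of them.
    keep_idx = np.sort(rng.permutation(num_patches)[:num_keep])

    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False
    return patches[keep_idx], mask


# Hypothetical usage: a 64x64 image, 8x8 patches (64 patches in total).
rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))
visible, mask = random_patch_mask(image, patch_size=8, mask_ratio=0.75, rng=rng)
print(visible.shape)    # (16, 64): 16 visible patches of 64 pixels each
print(int(mask.sum()))  # 48 hidden patches
```

In an MAE setup, only the visible patches would be passed to the encoder, and the reconstruction loss would be evaluated on the hidden ones. Under these assumed sizes, a 90% mask ratio leaves 6 of the 64 patches visible, which gives a sense of how little context the encoder sees at the highest ratio reported.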

Paper Structure

This paper contains 20 sections, 3 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview of our masked autoencoder (MAE) pretraining and task-specific fine-tuning framework. Left: During MAE pretraining, we mask a fraction (e.g., 75%) of the input image patches and train a Vision Transformer encoder plus a lightweight decoder to reconstruct the missing patches. Right: We then fine-tune the pretrained encoder separately for two tasks: (a) classification of dark-matter models using a linear head on the CLS token, and (b) super-resolution using a convolutional decoder on patch tokens.
  • Figure 2: Confusion matrix for 3-class dark-matter model classification using the MAE-pretrained ViT (baseline configuration: fine-tuned, 75% mask ratio). The classifier achieves good separation between no_sub and the two substructure classes, with some confusion between cdm and axion due to their visually similar substructure signatures.
  • Figure 3: Receiver operating characteristic (ROC) curves for 3-class dark-matter model classification using one-vs-rest evaluation. Each curve shows the true positive rate versus false positive rate for distinguishing one class from the other two. Per-class AUC values are indicated in the legend.
  • Figure 4: Reliability diagram (calibration curve) for the MAE-pretrained classifier (baseline configuration). A perfectly calibrated model would have all points along the diagonal dashed line; deviations above (below) the diagonal indicate under-confidence (over-confidence) in predictions.
  • Figure 5: Super-resolution comparison on strong-lensing images from the no_sub scenario used in Task VI B. Each row displays three panels: (left) low-resolution input ($16\times16$, bicubic-upsampled for visualization), (middle) super-resolved prediction from the MAE-pretrained model ($64\times64$), and (right) ground-truth high-resolution image ($64\times64$). The MAE-pretrained model preserves fine arc details and structural features more faithfully than the scratch baseline.
  • ...and 2 more figures
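
The reliability diagram described in the Figure 4 caption can be computed along the following lines. This is a hedged sketch rather than the authors' evaluation code: the function name `reliability_bins`, the choice of 10 equal-width bins, and the toy inputs are assumptions for illustration. Confidence here means the predicted probability of the chosen class.

```python
import numpy as np


def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns one tuple per non-empty bin:
    (mean confidence, empirical accuracy, accuracy - confidence, count).
    A positive gap lies above the diagonal (under-confidence);
    a negative gap lies below it (over-confidence).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges give bin indices 0 .. n_bins - 1, right-inclusive.
    bin_idx = np.digitize(confidences, edges[1:-1], right=True)

    rows = []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if not in_bin.any():
            continue  # empty bins contribute no point to the diagram
        mean_conf = float(confidences[in_bin].mean())
        accuracy = float(correct[in_bin].mean())
        rows.append((mean_conf, accuracy, accuracy - mean_conf, int(in_bin.sum())))
    return rows


# Toy data (not from the paper): four predictions at 0.95 confidence with
# three correct, and two predictions at 0.55 confidence with both correct.
conf = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
hit = [1, 1, 1, 0, 1, 1]
for mean_conf, accuracy, gap, count in reliability_bins(conf, hit):
    side = "under-confident" if gap > 0 else "over-confident"
    print(f"conf={mean_conf:.2f} acc={accuracy:.2f} n={count} ({side})")
```

Plotting accuracy against mean confidence for each bin, together with the diagonal, yields the diagram. In the toy data, the 0.55 bin lands above the diagonal and the 0.95 bin below it, matching the reading given in the caption.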