Preliminary Report on Mantis Shrimp: a Multi-Survey Computer Vision Photometric Redshift Model
Andrew Engel, Gautham Narayan, Nell Byler
TL;DR
This work tackles photometric redshift estimation in the era of large, multi-survey astronomical data by proposing Mantis Shrimp, a multi-modal CNN that ingests nine-band images spanning GALEX UV, Pan-STARRS optical, and UnWISE IR data. The authors cast redshift estimation as a classification problem with $C=200$ classes of width $\delta c=0.005$ over $z\in[0,1]$, augment the network with line-of-sight dust extinction, and train an adapted ResNet50 for 85 epochs on a representative subset of $N_{\text{eff}}=2.5\times10^5$ from a total sample of $N=4.2\times10^6$, using luptitude-like scaling and a single fused input tensor. They apply SHAP-based interpretability to quantify each band’s contribution (MM-SHAP) and assess calibration with PIT, linking modality importance to physical features in the spectral energy distribution. While the model currently falls short of the best optical+IR benchmarks, it demonstrates the feasibility of multi-survey image fusion for photo-z, offers physics-grounded insights into band prioritization across redshift, and paves the way for data-efficient training and future fusion strategies. The work thus provides a practical pathway to leveraging heterogeneous astronomical data for scalable, interpretable photometric redshift estimation, with potential impact on upcoming survey analyses.
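To make the classification framing concrete, the following is a minimal sketch of how a redshift in $[0,1]$ could be mapped to one of $C=200$ classes of width $0.005$, and how a point estimate could be recovered from a predicted class-probability vector. The function names, the use of bin centers, and the probability-weighted mean as the point estimate are illustrative assumptions, not details confirmed by the report.

```python
import numpy as np

# Values taken from the summary above: C = 200 classes over z in [0, 1].
Z_MIN, Z_MAX = 0.0, 1.0
N_CLASSES = 200
BIN_WIDTH = (Z_MAX - Z_MIN) / N_CLASSES  # 0.005


def redshift_to_class(z):
    """Map redshift(s) to integer class labels (hypothetical helper).

    Values at or beyond the upper edge are clipped into the last bin.
    """
    z = np.asarray(z, dtype=float)
    idx = np.floor((z - Z_MIN) / BIN_WIDTH).astype(int)
    return np.clip(idx, 0, N_CLASSES - 1)


def class_centers():
    """Redshift at the center of each class bin."""
    return Z_MIN + (np.arange(N_CLASSES) + 0.5) * BIN_WIDTH


def point_estimate(probs):
    """Probability-weighted mean redshift.

    This is one common choice of point estimate and is an assumption here;
    probs has shape (..., N_CLASSES) and sums to 1 along the last axis.
    """
    probs = np.asarray(probs, dtype=float)
    return probs @ class_centers()


if __name__ == "__main__":
    labels = redshift_to_class([0.012, 0.3333, 1.0])
    print(labels)

    # A one-hot prediction returns the center of the predicted bin.
    one_hot = np.zeros(N_CLASSES)
    one_hot[labels[0]] = 1.0
    print(point_estimate(one_hot))
```

With a softmax output, the same `point_estimate` call would average over all bins, so a broad or bimodal prediction yields a value between the modes. Other summaries, such as the mode or median of the predicted distribution, are equally plausible choices.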
Abstract
The availability of large, public, multi-modal astronomical datasets presents an opportunity to pursue novel research that straddles the line between the science of AI and the science of astronomy. Photometric redshift estimation is a well-established subfield of astronomy. Prior works show that computer vision models typically outperform catalog-based models, but these models face additional complexities when incorporating images from more than one instrument or sensor. In this report, we detail our progress in creating Mantis Shrimp, a multi-survey computer vision model for photometric redshift estimation that fuses ultraviolet (GALEX), optical (Pan-STARRS), and infrared (UnWISE) imagery. We use deep learning interpretability diagnostics to measure how the model leverages information from the different inputs. We reason about the behavior of the CNNs from the interpretability metrics, specifically framing the results in terms of physically grounded knowledge of galaxy properties.
