OJALÁ: Optimizing J-PAS Astronomy for Large-scale Analysis. A foundation model for the SED of galaxies, QSOs and stars

G. Martínez-Solaeche, R. M. González Delgado, R. García-Benito, A. Hernán-Caballero, I. Pérez-Ràfols, L. A. Díaz-García, L. Raul Abramo, J. E. Rodríguez-Martín, A. M. Conrado, I. Breda, H. Domínguez Sánchez, I. Márquez, M. Pieri, D. López-Cano, V. M. Placco, L. Nakazono, A. del Pino, V. Marra, J. Alcaniz, N. Benitez, S. Bonoli, S. Carneiro, A. J. Cenarro, D. Cristóbal-Hornillos, S. Daflon, R. A. Dupke, A. Ederoclite, C. Hernández-Monteagudo, J. Liu, C. López-Sanjuan, A. Marín-Franch, C. Mendes de Oliveira, M. Moles, F. Roig, L. Sodré, K. Taylor, J. Varela, H. Vázquez Ramió, J. M. Vílchez, J. Zaragoza-Cardiel

Abstract

The advent of large-scale surveys requires efficient ML techniques to exploit the information in massive datasets. We present OJALA, a transformer-based autoregressive foundation model designed to simultaneously classify astronomical objects and infer their physical parameters using 54 narrow bands from J-PAS, combined with broad bands from the DESI Legacy Imaging Surveys and WISE. The model is trained on $\sim 20$ million synthetic SEDs generated from DESI DR1 spectra. We validate OJALA using a cross-matched sample of $\sim 121{,}000$ objects between J-PAS and DESI. The model achieves a weighted F1-score of approximately 0.9 for spectral classification (stars, galaxies, and QSOs) at $i < 21$. For galaxies, we recover photo-z with a precision of $\sigma_{\rm NMAD} < 0.01$, while for QSOs, the precision improves significantly at $z > 1.5$, reaching $\sigma_{\rm NMAD} \approx 0.006$ at $z \approx 3.5$. We demonstrate robust estimation of physical properties for galaxies, recovering stellar masses and SFR with a scatter of approximately 0.11 dex and 0.22 dex, respectively. Furthermore, the model accurately predicts EWs for major optical emission lines, allowing for the derivation of extinction-corrected H$\alpha$ luminosities with a scatter of 0.29 dex. OJALA successfully reproduces the BPT and WHAN diagnostic diagrams, classifying SF, AGN, and passive galaxies with F1-scores typically ranging from 70% to 90% depending on the diagnostic class. For stars, the model reliably infers effective temperature and metallicity, though surface gravity remains challenging. Finally, we show the modularity of the architecture by fine-tuning the pre-trained embeddings to predict BH masses, a property not included in the primary training, recovering spectroscopic virial estimates with a precision of approximately 0.5 dex. We release the code, model weights, and a comprehensive VAC for the J-PAS EDR.
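For readers unfamiliar with the weighted F1-score quoted for the star/galaxy/QSO classification, the following is a minimal sketch of how such a number is commonly computed: per-class purity (precision) and completeness (recall) are combined into a per-class F1, and the per-class values are averaged with weights proportional to the number of true objects in each class. The function names, class labels, and toy data are illustrative assumptions and are not taken from the released OJALA code; the paper may apply a different weighting.

```python
import numpy as np


def per_class_scores(y_true, y_pred, classes):
    """Return {class: (purity, completeness, f1, support)} for each class.

    Hypothetical helper for illustration; not part of the OJALA release.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    scores = {}
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))  # correctly assigned to c
        fp = np.sum((y_pred == c) & (y_true != c))  # contaminants in c
        fn = np.sum((y_pred != c) & (y_true == c))  # members of c that were missed
        purity = tp / (tp + fp) if tp + fp > 0 else 0.0
        completeness = tp / (tp + fn) if tp + fn > 0 else 0.0
        denom = purity + completeness
        f1 = 2 * purity * completeness / denom if denom > 0 else 0.0
        scores[c] = (float(purity), float(completeness), float(f1), int(tp + fn))
    return scores


def weighted_f1(y_true, y_pred, classes):
    """Support-weighted mean of the per-class F1-scores (assumed weighting)."""
    scores = per_class_scores(y_true, y_pred, classes)
    total = sum(s[3] for s in scores.values())
    return sum(s[2] * s[3] for s in scores.values()) / total


# Toy labels, invented purely for illustration.
classes = ["GALAXY", "STAR", "QSO"]
y_true = ["GALAXY"] * 4 + ["STAR"] * 2 + ["QSO"] * 2
y_pred = ["GALAXY", "GALAXY", "GALAXY", "QSO", "STAR", "STAR", "QSO", "GALAXY"]
print(weighted_f1(y_true, y_pred, classes))
```

Weighting by class support means the global score is dominated by the most populous class, which is why per-class purity and completeness are usually reported alongside it.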

Paper Structure

This paper contains 30 sections, 16 equations, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Examples of synthetic J-PAS photometric fluxes from DESI objects (white dots) and their most similar counterparts in the J-PAS dataset (colour dots), identified through a similarity search in the embedding space.
  • Figure 2: Distribution of the $r$-band magnitude and $g-r$ color obtained from the DESI Legacy Imaging Survey for the datasets used in this work. The top panels show the distributions for the DESI DR1 sample separated into galaxies, stars, and QSOs (solid lines), together with the corresponding populations for the J-PAS--DESI cross-match (dashed lines). The bottom panels compare the overall distributions of the DESI DR1 and the J-PAS IDR202406 samples.
  • Figure 3: Classification performance metrics as a function of the $i$-band magnitude. The top panels and the bottom-left panel compare the results obtained from J-PAS data versus DESI mocks for the three main classes: galaxies (red), stars (blue), and QSOs (green). We show Purity (top-left), Completeness (top-right), and F1-Score (bottom-left). The bottom-right panel displays the global weighted F1-score for different input contexts (see Sect. \ref{subsec:spectral_classification}) applied to the real dataset. Metrics are computed in sliding magnitude bins centered every 0.3 mag, using a half-width of 0.3 mag, and are only shown when the corresponding class contains at least 100 objects.
  • Figure 4: Photometric redshift performance for galaxies (red) and QSOs (green) as a function of $i$-band magnitude (left column) and spectroscopic redshift (right column). The panels show, from top to bottom, the Bias, the normalized median absolute deviation ($\sigma_{\text{NMAD}}$), and the Outlier Fraction (defined as $|\Delta z| > 0.15$). Solid lines represent results on real J-PAS data (Test-Real), while dotted lines correspond to the synthetic test set (Test-Synth). For galaxies, we also show the performance of the template-fitting code LePHARE (grey dashed lines).
  • Figure 5: Comparison between DESI spectroscopic parameters and OJALA predictions for stars in the Test-Real with $i < 19$. The panels display: Effective Temperature ($T_{\text{eff}}$), Surface Gravity ($\log g$), Metallicity ([Fe/H]), and Alpha-enhancement ([$\alpha$/Fe]). The points are color-coded by density. The text insets indicate the number of objects ($N$), the Bias, and the scatter ($\sigma_{\text{NMAD}}$) for the real data. Values in parentheses correspond to the performance on Test-Synth.
  • ...and 9 more figures
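
The three photometric-redshift statistics named in the Figure 4 caption (Bias, $\sigma_{\text{NMAD}}$, and Outlier Fraction) can be sketched as follows. Only the outlier threshold $|\Delta z| > 0.15$ comes from the caption. The rest is an assumption based on common photo-z conventions: $\Delta z$ is taken to be the normalized residual $(z_{\rm phot} - z_{\rm spec})/(1 + z_{\rm spec})$, the bias is taken to be its median, and $\sigma_{\text{NMAD}} = 1.4826 \times \mathrm{median}(|\Delta z - \mathrm{median}(\Delta z)|)$. The paper's exact definitions may differ, and the function name is hypothetical.

```python
import numpy as np


def photoz_metrics(z_phot, z_spec, outlier_threshold=0.15):
    """Bias, sigma_NMAD and outlier fraction of photometric redshifts.

    Assumed conventions (not confirmed by the source text):
      dz         = (z_phot - z_spec) / (1 + z_spec)
      bias       = median(dz)
      sigma_NMAD = 1.4826 * median(|dz - median(dz)|)
    The outlier threshold of 0.15 follows the Figure 4 caption.
    """
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)  # normalized residual (assumption)
    bias = np.median(dz)
    # 1.4826 rescales the median absolute deviation so that it matches the
    # standard deviation for Gaussian-distributed residuals.
    sigma_nmad = 1.4826 * np.median(np.abs(dz - bias))
    outlier_fraction = np.mean(np.abs(dz) > outlier_threshold)
    return {
        "bias": float(bias),
        "sigma_nmad": float(sigma_nmad),
        "outlier_fraction": float(outlier_fraction),
    }


# Toy redshifts, invented for illustration; the last object is a catastrophic outlier.
z_spec = [0.0, 1.0, 1.0, 3.0]
z_phot = [0.0, 1.02, 0.98, 5.0]
print(photoz_metrics(z_phot, z_spec))
```

Because both the bias and $\sigma_{\text{NMAD}}$ are built on medians, a single catastrophic failure such as the last toy object barely moves them, which is why the outlier fraction is reported as a separate statistic.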