
Euclid Quick Data Release (Q1). Active galactic nuclei identification using diffusion-based inpainting of Euclid VIS images

Euclid Collaboration, G. Stevens, S. Fotopoulou, M. N. Bremer, T. Matamoro Zatarain, K. Jahnke, B. Margalef-Bentabol, M. Huertas-Company, M. J. Smith, M. Walmsley, M. Salvato, M. Mezcua, A. Paulino-Afonso, M. Siudek, M. Talia, F. Ricci, W. Roster, N. Aghanim, B. Altieri, S. Andreon, H. Aussel, C. Baccigalupi, M. Baldi, S. Bardelli, P. Battaglia, A. Biviano, A. Bonchi, E. Branchini, M. Brescia, J. Brinchmann, S. Camera, G. Cañas-Herrera, V. Capobianco, C. Carbone, J. Carretero, M. Castellano, G. Castignani, S. Cavuoti, K. C. Chambers, A. Cimatti, C. Colodro-Conde, G. Congedo, C. J. Conselice, L. Conversi, Y. Copin, A. Costille, F. Courbin, H. M. Courtois, M. Cropper, A. Da Silva, H. Degaudenzi, G. De Lucia, C. Dolding, H. Dole, M. Douspis, F. Dubath, X. Dupac, S. Dusini, S. Escoffier, M. Farina, S. Ferriol, K. George, C. Giocoli, B. R. Granett, A. Grazian, F. Grupp, S. V. H. Haugan, I. M. Hook, F. Hormuth, A. Hornstrup, P. Hudelot, M. Jhabvala, E. Keihänen, S. Kermiche, A. Kiessling, M. Kilbinger, B. Kubik, M. Kümmel, H. Kurki-Suonio, Q. Le Boulc'h, A. M. C. Le Brun, D. Le Mignant, P. B. Lilje, V. Lindholm, I. Lloro, G. Mainetti, D. Maino, E. Maiorano, O. Marggraf, M. Martinelli, N. Martinet, F. Marulli, R. Massey, S. Maurogordato, H. J. McCracken, E. Medinaceli, S. Mei, M. Melchior, M. Meneghetti, E. Merlin, G. Meylan, A. Mora, M. Moresco, L. Moscardini, R. Nakajima, C. Neissner, S. -M. Niemi, C. Padilla, S. Paltani, F. Pasian, K. Pedersen, W. J. Percival, V. Pettorino, G. Polenta, M. Poncet, L. A. Popa, L. Pozzetti, F. Raison, R. Rebolo, A. Renzi, J. Rhodes, G. Riccio, E. Romelli, M. Roncarelli, R. Saglia, A. G. Sánchez, D. Sapone, J. A. Schewtschenko, M. Schirmer, P. Schneider, T. Schrabback, A. Secroun, S. Serrano, P. Simon, C. Sirignano, G. Sirri, J. Skottfelt, L. Stanco, J. Steinwagner, P. Tallada-Crespí, A. N. Taylor, I. Tereno, S. Toft, R. Toledo-Moreo, F. Torradeflot, I. Tutusaus, L. Valenziano, J. Valiviita, T. Vassallo, G. Verdoes Kleijn, A. Veropalumbo, Y. Wang, J. Weller, A. Zacchei, G. Zamorani, F. M. Zerbi, I. A. Zinchenko, E. Zucca, V. Allevato, M. Ballardini, M. Bolzonella, E. Bozzo, C. Burigana, R. Cabanac, A. Cappi, J. A. Escartin Vigo, L. Gabarra, W. G. Hartley, J. Martín-Fleitas, S. Matthew, R. B. Metcalf, A. Pezzotta, M. Pöntinen, I. Risso, V. Scottez, M. Sereno, M. Tenti, M. Wiesmann, Y. Akrami, S. Alvi, I. T. Andika, S. Anselmi, M. Archidiacono, F. Atrio-Barandela, D. Bertacca, M. Bethermin, L. Bisigello, A. Blanchard, L. Blot, S. Borgani, M. L. Brown, S. Bruton, A. Calabro, F. Caro, T. Castro, F. Cogato, S. Davini, G. Desprez, A. Díaz-Sánchez, J. J. Diaz, S. Di Domizio, J. M. Diego, P. -A. Duc, A. Enia, Y. Fang, A. G. Ferrari, A. Finoguenov, A. Fontana, A. Franco, J. García-Bellido, T. Gasparetto, V. Gautard, E. Gaztanaga, F. Giacomini, F. Gianotti, M. Guidi, C. M. Gutierrez, A. Hall, S. Hemmati, H. Hildebrandt, J. Hjorth, J. J. E. Kajava, Y. Kang, V. Kansal, D. Karagiannis, C. C. Kirkpatrick, S. Kruk, L. Legrand, M. Lembo, F. Lepori, G. Leroy, J. Lesgourgues, L. Leuzzi, T. I. Liaudat, J. Macias-Perez, M. Magliocchetti, F. Mannucci, R. Maoli, C. J. A. P. Martins, L. Maurin, M. Miluzio, P. Monaco, G. Morgante, K. Naidoo, A. Navarro-Alsina, F. Passalacqua, K. Paterson, L. Patrizii, A. Pisani, D. Potter, S. Quai, M. Radovich, P. -F. Rocci, G. Rodighiero, S. Sacquegna, M. Sahlén, D. B. Sanders, E. Sarpa, A. Schneider, M. Schultheis, D. Sciotti, E. Sellentin, F. Shankar, L. C. Smith, K. Tanidis, G. Testera, R. Teyssier, S. Tosi, A. Troja, M. Tucci, C. Valieri, D. Vergani, G. Verza, N. A. Walton

TL;DR

This work presents a diffusion-based inpainting approach to identify AGN and QSOs from single-band Euclid VIS images. By masking the central pixels of galaxy cutouts and inpainting under a learned galaxy prior, reconstruction errors serve as an anomaly score for AGN candidacy, enabling high-recall identification without multi-wavelength data. The method leverages Repaint conditioning, a cosine-beta diffusion schedule, and a normalised hybrid loss to handle the large dynamic range of astronomical images. Across the Euclid Q1 dataset, the diffusion-based classifier demonstrates competitive recall relative to traditional colour and flux-based selectors and can adapt to diverse morphologies, with implications for scalable, survey-wide AGN discovery. The study also discusses training/inference costs, data scaling considerations, and avenues for future improvements, including adaptive schedulers and potential decomposition of AGN components from host galaxies.
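
To make the cosine-beta schedule mentioned above concrete, the following is a minimal sketch comparing it with a linear schedule. It assumes the commonly used forms: a linear schedule with betas from 1e-4 to 0.02, and a cosine schedule with offset s = 0.008 and betas clipped at 0.999. The number of steps T = 1000 and all function names are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def linear_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; alpha_bar_t is the fraction of signal
    # variance remaining after t noising steps.
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def cosine_alpha_bar(T, s=0.008, max_beta=0.999):
    # Assumed cosine schedule: alpha_bar(t) = f(t) / f(0), with
    # f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2).
    steps = np.arange(T + 1) / T
    f = np.cos((steps + s) / (1.0 + s) * np.pi / 2.0) ** 2
    alpha_bar = f / f[0]
    # Convert to per-step betas and clip to avoid a singular final step.
    betas = np.clip(1.0 - alpha_bar[1:] / alpha_bar[:-1], 0.0, max_beta)
    return np.cumprod(1.0 - betas)

T = 1000  # illustrative number of diffusion steps (assumption)
linear = linear_alpha_bar(T)
cosine = cosine_alpha_bar(T)

# Halfway through, the cosine schedule retains far more of the signal.
print(f"alpha_bar at T/2: linear={linear[T // 2]:.3f}, cosine={cosine[T // 2]:.3f}")
```

Under these assumptions the linear schedule has destroyed most of the signal by the midpoint, while the cosine schedule still retains roughly half of it, which is the behaviour the Figure 2 caption attributes to the switch.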

Abstract

Light emission from galaxies exhibits diverse brightness profiles, influenced by factors such as galaxy type, structural features and interactions with other galaxies. Elliptical galaxies feature more uniform light distributions, while spiral and irregular galaxies have complex, varied light profiles due to their structural heterogeneity and star-forming activity. In addition, galaxies with an active galactic nucleus (AGN) feature intense, concentrated emission from gas accretion around supermassive black holes, superimposed on regular galactic light, while quasi-stellar objects (QSO) are the extreme case of the AGN emission dominating the galaxy. The challenge of identifying AGN and QSO has been discussed many times in the literature, often requiring multi-wavelength observations. This paper introduces a novel approach to identify AGN and QSO from a single image. Diffusion models have been recently developed in the machine-learning literature to generate realistic-looking images of everyday objects. Utilising the spatial resolving power of the Euclid VIS images, we created a diffusion model trained on one million sources, without using any source pre-selection or labels. The model learns to reconstruct light distributions of normal galaxies, since the population is dominated by them. We condition the prediction of the central light distribution by masking the central few pixels of each source and reconstructing the light according to the diffusion model. We further use this prediction to identify sources that deviate from this profile by examining the reconstruction error of the few central pixels regenerated in each source's core. Our approach, solely using VIS imaging, features high completeness compared to traditional methods of AGN and QSO selection, including optical, near-infrared, mid-infrared, and X-ray methods.
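
The masking and scoring step described in the abstract can be sketched as below. This is a toy illustration under stated assumptions: the mask size, the use of mean squared error as the reconstruction error, and the function names `central_mask` and `core_reconstruction_error` are all hypothetical, and the "inpainted" image is a smooth synthetic profile standing in for the diffusion model's output, not an actual model prediction.

```python
import numpy as np

def central_mask(shape, half_width=2):
    # Boolean mask selecting the central (2 * half_width + 1)^2 pixels.
    # The mask size is an assumption; the source only says "few pixels".
    h, w = shape
    cy, cx = h // 2, w // 2
    mask = np.zeros(shape, dtype=bool)
    mask[cy - half_width:cy + half_width + 1,
         cx - half_width:cx + half_width + 1] = True
    return mask

def core_reconstruction_error(original, inpainted, mask):
    # Mean squared error over the masked core only (assumed metric).
    diff = original[mask] - inpainted[mask]
    return float(np.mean(diff ** 2))

# Synthetic 33x33 cutouts: a smooth host and a compact nuclear source.
yy, xx = np.mgrid[:33, :33]
r2 = (yy - 16) ** 2 + (xx - 16) ** 2
host = np.exp(-r2 / (2 * 5.0 ** 2))
nucleus = 3.0 * np.exp(-r2 / (2 * 1.0 ** 2))

mask = central_mask(host.shape, half_width=2)
inpainted = host  # stand-in for a galaxy-like reconstruction (assumption)

score_normal = core_reconstruction_error(host, inpainted, mask)
score_agn_like = core_reconstruction_error(host + nucleus, inpainted, mask)
print(score_normal, score_agn_like)
```

In this toy setup the source with the added compact nucleus receives the larger score, because a reconstruction that resembles a normal galaxy core cannot account for the extra central flux.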

Paper Structure

This paper contains 46 sections, 14 equations, 26 figures, 5 tables.

Figures (26)

  • Figure 1: Diffusion pipeline (top), which progressively adds noise to images and trains the model to predict the noise added since the previous step. Once trained, the model takes pure Gaussian noise as input at inference and iteratively removes the noise until a realistic galaxy image remains. Repeated inference runs each produce a different galaxy, distinct from those the model was trained on. The Repaint pipeline (bottom) takes the trained diffusion model and adds conditioning, so that parts of an existing image can be preserved by masking. At each denoising step, noise levels in the preserved pixels are adjusted to ensure they integrate correctly with the newly generated sections. After $T$ iterations, the output includes the retained pixels and newly generated areas, creating a different yet plausible final image.
  • Figure 2: Linear schedule from the original diffusion implementation, which causes the parameters to converge early in the timesteps, resulting in training images becoming pure noise too soon and leading to suboptimal performance. The switch to the cosine-beta schedule adds noise at a much slower rate, prioritising smaller updates in the early stages, leading to more unique noised images throughout training.
  • Figure 3: Initial results for pixel value differences across various selections. Comparing the ratios of the centre's brightest pixel with the means of the surrounding 1- and 2-pixel-wide regions shows a clear distribution difference between galaxy and non-galaxy classes. The grey histogram shows the distribution of the whole dataset, indicating how images not captured by these selections compare. The median value for each magnitude bin is marked by the corresponding vertical line.
  • Figure 4: Distribution differences of the ROC value over each class. The narrow peaks from the non-galaxy class are significantly widened after applying the asinh transformation, causing a large overlap between classes.
  • Figure 5: Comparison between the original masked pixels and the corresponding pixels of the generated output. The model's predictions consistently follow the $y = x$ line (dashed), with a bias towards underpredicting the true value. A secondary cluster follows the same gradient, but with pixel values reduced by a factor of 10. This implies an inherent difference in the input images that causes the model to behave differently. This collection of sources could be prime AGN candidates using our model.
  • ...and 21 more figures
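
The statistic described in the Figure 3 caption, the ratio of the brightest central pixel to the mean of the surrounding 1- and 2-pixel-wide regions, might be computed along the following lines. This is one possible reading of the caption: the square (Chebyshev-distance) rings, the search box for the brightest pixel, and the name `centre_to_ring_ratios` are assumptions made for illustration.

```python
import numpy as np

def centre_to_ring_ratios(image, search_half_width=2):
    # Locate the brightest pixel within a small box around the cutout centre.
    h, w = image.shape
    cy, cx = h // 2, w // 2
    k = search_half_width
    box = image[cy - k:cy + k + 1, cx - k:cx + k + 1]
    dy, dx = np.unravel_index(np.argmax(box), box.shape)
    py, px = cy - k + dy, cx - k + dx
    peak = image[py, px]

    # Chebyshev distance of every pixel from the peak defines square rings.
    yy, xx = np.indices(image.shape)
    cheb = np.maximum(np.abs(yy - py), np.abs(xx - px))

    ratios = {}
    for width in (1, 2):
        # Region of the given width around the peak, excluding the peak itself.
        ring = (cheb >= 1) & (cheb <= width)
        ratios[width] = float(peak / image[ring].mean())
    return ratios

# Synthetic examples: an extended profile and a compact, point-like one.
yy, xx = np.mgrid[:33, :33]
r2 = (yy - 16) ** 2 + (xx - 16) ** 2
extended = centre_to_ring_ratios(np.exp(-r2 / (2 * 5.0 ** 2)))
compact = centre_to_ring_ratios(np.exp(-r2 / (2 * 1.0 ** 2)))
print(extended, compact)
```

A compact, point-like source gives much larger ratios than an extended one, consistent with the separation between classes that the caption reports. The sketch assumes positive ring means; real cutouts with background-subtracted noise would need a guard against division by values near zero.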