The DAWES review 10: The impact of deep learning for the analysis of galaxy surveys

Marc Huertas-Company, François Lanusse

TL;DR

The DAWES review 10 assesses how deep learning has reshaped galaxy surveys by mapping four broad application families: computer vision tasks, inference of physical galaxy properties, discovery, and cosmology. It finds that CNNs dominate morphology and lensing classifications and that DL models act as fast emulators for photo-z, structure, and SFH inference, often trained on simulations. Key contributions include demonstrations of substantial speedups, identification of persistent issues (uncertainty quantification, domain shift, and interpretability), and a call for standardized benchmarks. The work underscores both the transformative potential of DL for real-time analysis of next-generation surveys and the practical barriers to deployment in scientifically rigorous pipelines. Overall, DL is increasingly integral but requires careful handling of biases, uncertainties, and domain gaps to realize its full impact in cosmology and galaxy formation.

Abstract

The amount and complexity of data delivered by modern galaxy surveys have been steadily increasing over the past years. Extracting coherent scientific information from these large and multi-modal data sets remains an open issue, and data-driven approaches such as deep learning have rapidly emerged as a potentially powerful solution to some long-standing challenges. This enthusiasm is reflected in an unprecedented exponential growth of publications using neural networks. Half a decade after the first published work in astronomy mentioning deep learning, we believe it is timely to review the real impact of this new technology in the field and its potential to solve key challenges raised by the size and complexity of the new datasets. In this review we first aim to summarize the main applications of deep learning for galaxy surveys that have emerged so far. We then extract the major achievements and lessons learned and highlight key open questions and limitations. Overall, state-of-the-art deep learning methods are being rapidly adopted by the astronomical community, reflecting a democratization of these methods. We show that the majority of works using deep learning to date are oriented toward computer vision tasks. This is also the domain of application where deep learning has brought the most important breakthroughs so far. We report that the applications are becoming more diverse and that deep learning is used for estimating galaxy properties, identifying outliers, or constraining the cosmological model. Most of these works remain at the exploratory level. Some common challenges will most likely need to be addressed before moving to the next phase of deployment of deep learning in the processing of future surveys, e.g. uncertainty quantification, interpretability, data labeling, and domain shift issues from training with simulations, which is common practice in astronomy.

Paper Structure (69 sections, 48 figures, 2 tables)

Figures (48)

  • Figure 1: Relative change in the number of papers on arXiv:astro-ph with different keywords in the abstract as a function of time. The number of works mentioning neural networks in the abstract has experienced an unprecedented growth in the last $\sim6$ years, significantly steeper than for other topics in astrophysics. Source: ArXivSorter
  • Figure 2: Example of level of agreement (red circles) and model confidence (blue squares) versus classification accuracy. Each panel shows a different question in the Galaxy Zoo classification tree (smoothness, edge-on, bar). The authors quote an unprecedented accuracy of $>90\%$. This is the first work that uses CNNs in astrophysics. The figure is adapted from dieleman2015
  • Figure 3: Schematic view of a simple vanilla Convolutional Neural Network, the most common approach for binary or multi-class classification and regression in extragalactic imaging. The input, which is typically an image, is fed to a series of convolutional layers. The resulting embedding is used as input to a Multi-Layer Perceptron, which outputs a float or an array of floats. For classification, the standard loss function is the cross-entropy ($H$), while for regression the quadratic loss ($L_2$) is usually employed.
  • Figure 4: Example of two different simulated samples of strong lenses used for training a CNN. These simulations were used to detect strong lenses in the CFHTLS survey. Figure adapted from jacobs2017
  • Figure 5: Comparison of Capsule Networks and CNNs applied to classify morphologies of radio galaxies. The ROC curves show the different performances. Capsule Networks do not offer a significant gain in this context. Figure adapted from Lukic2019
  • ...and 43 more figures
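
The pipeline described in the Figure 3 caption (image, convolutional layers, embedding, Multi-Layer Perceptron, then a cross-entropy or quadratic loss) can be illustrated with a minimal forward-pass sketch. This is not the architecture of any work cited in the review: it assumes a single convolutional layer instead of a series, random untrained weights, and an 8x8 random array standing in for a galaxy image. All function and parameter names (`conv2d`, `cnn_forward`, `init_params`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np

def conv2d(image, kernels):
    """Valid 2D cross-correlation of an (H, W) image with (K, kh, kw) kernels."""
    k, kh, kw = kernels.shape
    h, w = image.shape
    out = np.zeros((k, h - kh + 1, w - kw + 1))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = image[i:i + kh, j:j + kw]
            # Broadcast the patch against all K kernels at once.
            out[:, i, j] = np.sum(kernels * patch, axis=(1, 2))
    return out

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cnn_forward(image, params):
    """Image -> convolution -> flattened embedding -> MLP -> output vector."""
    features = relu(conv2d(image, params["kernels"]))
    embedding = features.reshape(-1)
    hidden = relu(params["w1"] @ embedding + params["b1"])
    return params["w2"] @ hidden + params["b2"]

def cross_entropy(logits, label):
    """Classification loss H: negative log-probability of the true class."""
    return -np.log(softmax(logits)[label])

def l2_loss(prediction, target):
    """Regression loss L_2: sum of squared residuals."""
    return np.sum((prediction - target) ** 2)

def init_params(rng, image_size=8, n_kernels=4, kernel_size=3, n_hidden=16, n_out=3):
    """Random, untrained weights (illustrative sizes only)."""
    n_feat = n_kernels * (image_size - kernel_size + 1) ** 2
    return {
        "kernels": rng.normal(0.0, 0.1, (n_kernels, kernel_size, kernel_size)),
        "w1": rng.normal(0.0, 0.1, (n_hidden, n_feat)),
        "b1": np.zeros(n_hidden),
        "w2": rng.normal(0.0, 0.1, (n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }

rng = np.random.default_rng(0)
params = init_params(rng)
image = rng.normal(size=(8, 8))   # stand-in for a galaxy cutout
logits = cnn_forward(image, params)

class_loss = cross_entropy(logits, label=1)   # if the task is classification
reg_loss = l2_loss(logits, np.zeros(3))       # if the task is regression
```

The same network body serves both cases; only the loss applied to the output vector changes, which is the point the caption makes. In practice one would use a deep learning framework with automatic differentiation and several stacked convolutional layers rather than explicit loops.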