Identifying lopsidedness in spiral galaxies using a Deep Convolutional Neural Network

Biju Saha, Suman Sarkar, Arunima Banerjee

TL;DR

The paper addresses the prevalence and origin of large-scale lopsidedness in spiral galaxies by building an automated classifier through transfer learning on Zoobot with a ConvNeXT_nano backbone, fine-tuned on a manually labeled set of SDSS DR18 spirals. The labeled sample of 934 galaxies is drawn from a parent sample of 7,042 nearly face-on spirals, and the classifier reaches a mean ROC-AUC of $0.94\pm0.02$ (best case $0.96$). Applying the model to the remaining 6,108 galaxies yields a high-confidence subset of 2,658 lopsided and 1,455 symmetric objects. The authors find that lopsided galaxies tend to be more actively star-forming, bluer, less concentrated, and lower in stellar mass than their symmetric counterparts, and they provide a usable catalog for probing formation scenarios. They also examine how redshift and spiral-arm morphology affect classification, discuss biases from resolution and environment, and release the dataset and model publicly to enable future studies of disk asymmetry.

Abstract

About 30% of disk galaxies show lopsidedness in their stellar disk. Although such a large-scale asymmetry in the disk can be primarily regarded as a long-lived mode ($m=1$), the physical origin of the lopsidedness in the disk continues to be a puzzle. In this work, we employ a transfer-learning approach for the automated identification of lopsided galaxies using SDSS DR18 imaging by fine-tuning a Zoobot model, a deep convolutional neural network package pre-trained on the Galaxy Zoo dataset. We obtain 7,042 well-resolved, nearly face-on spiral galaxies from SDSS DR18 over the redshift range $0.01 \leq z \leq 0.1$, with extinction-corrected g-band model magnitude $< 16$ and Petrosian radius (enclosing 90% of the flux) $\geq 3$ arcsec. Out of these, we visually identify 490 lopsided and 444 symmetric galaxies suitable for training. The trained model achieves a testing accuracy of $(87 \pm 0.02)$%, averaged over 10 independent trials. Using the best-performing model, we identify 3,679 lopsided and 2,429 symmetric galaxies from the remaining sample. Of these, 2,658 lopsided and 1,455 symmetric galaxies are predicted with high prediction probability, $P_{pred} \geq 0.85$. Lopsided galaxies in our predicted samples are relatively high star-forming, bluer, low-concentration (late-type), low-mass galaxies compared to the symmetric galaxies. Our study produces a usable catalogue of lopsided and symmetric galaxies, which will offer new insights into the formation of lopsidedness in disk galaxies. The dataset and the best-performing model are made publicly available through GitHub at https://github.com/bijusaha-astro/CNN_lopsided
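
The selection cuts and the high-confidence split quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Galaxy` fields, the function names, the 0.5 decision boundary, and the reading of $P_{pred}$ as the probability of the predicted class are all assumptions made for illustration, and the face-on criterion is omitted because the abstract does not quantify it.

```python
from dataclasses import dataclass


@dataclass
class Galaxy:
    """Hypothetical record; field names are illustrative, not SDSS column names."""
    z: float            # redshift
    model_mag_g: float  # extinction-corrected g-band model magnitude
    petro_r90: float    # Petrosian radius enclosing 90% of the flux, in arcsec


def passes_cuts(g: Galaxy) -> bool:
    """Apply the three numerical cuts stated in the abstract."""
    return 0.01 <= g.z <= 0.1 and g.model_mag_g < 16 and g.petro_r90 >= 3.0


def classify(p_lopsided: float, threshold: float = 0.85):
    """Turn a classifier output into (label, P_pred, is_high_confidence).

    Assumptions: the model outputs the probability of the 'lopsided' class,
    the label is assigned at 0.5, and P_pred is the predicted-class probability.
    """
    label = "lopsided" if p_lopsided >= 0.5 else "symmetric"
    p_pred = max(p_lopsided, 1.0 - p_lopsided)
    return label, p_pred, p_pred >= threshold


# Bookkeeping check on the sample sizes quoted in the abstract.
n_parent, n_lop_train, n_sym_train = 7042, 490, 444
n_remaining = n_parent - (n_lop_train + n_sym_train)
assert n_remaining == 3679 + 2429  # predicted lopsided + predicted symmetric
```

Under these assumptions, `classify(0.6)` is labeled lopsided but falls outside the high-confidence subset, while `classify(0.1)` is a high-confidence symmetric prediction.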

Paper Structure

This paper contains 11 sections, 1 equation, 15 figures, 4 tables.

Figures (15)

  • Figure 1: Clockwise from top left: The distributions of redshift, Petrosian radius (enclosing 90 % of the flux) in the g-band, extinction-corrected g-band model magnitude modelMag_g, and $A_1$ for the samples of lopsided and symmetric spiral galaxies in the training set. A few galaxies were excluded from the Petrosian radius and $A_1$ plots to avoid excessive scaling.
  • Figure 2: A subset of the lopsided (left) and symmetric (right) galaxies from the SDSS DR18 that are used for the training.
  • Figure 3: The loss and accuracy of the model as a function of epochs. The solid line represents the mean over 10 independent trials, while the shaded band indicates the $\pm1\sigma$ interval.
  • Figure 4: The Receiver Operating Characteristic (ROC) curve. The solid line shows the mean ROC curve for the 10 independent trials with the shaded region showing the $\pm1\sigma$ interval. The black dashed line indicates a random-guessing classifier with an AUC of 0.5.
  • Figure 5: The confusion matrix showing the correctly and incorrectly predicted samples in the test set, evaluated using the best-performing model (the one with the highest testing AUC score).
  • ...and 10 more figures
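
The ROC-AUC statistic behind Figure 4, and the mean and $\pm1\sigma$ summary over independent trials, can be sketched in plain Python as follows. This is a hedged illustration rather than the authors' evaluation code: the function names are hypothetical, the AUC is computed through its rank (Mann-Whitney) interpretation, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """Area under the ROC curve as the probability that a randomly chosen
    positive (label 1) scores higher than a randomly chosen negative (label 0).
    Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_trials(aucs):
    """Mean and sample standard deviation of per-trial AUC values."""
    return mean(aucs), stdev(aucs)
```

A classifier that assigns every galaxy the same score gives `roc_auc(...) == 0.5`, which corresponds to the random-guessing dashed line in Figure 4.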