An automated activity classification tool for optical galaxy spectra

C. Daoutis, A. Zezas, E. Kyritsis, K. Kouroumpatzakis, P. Bonfini

TL;DR

An automated diagnostic tool is developed that distinguishes between star-forming, active galactic nuclei (AGN), low-ionization nuclear emission-line region (LINER), composite, and passive galaxies. It is based on a support vector machine trained on data from optical emission-line ratios and color selection criteria.

Abstract

Reliable, versatile galaxy activity diagnostics are essential for understanding galaxy evolution. Traditional methods frequently require extensive preprocessing, such as starlight subtraction and emission-line deblending (e.g., Hα and [N II]), which can introduce substantial biases and uncertainties because these steps are model dependent. In this work we develop an automated diagnostic tool capable of distinguishing between star-forming (SF), active galactic nuclei (AGN), low-ionization nuclear emission-line region (LINER), composite, and passive galaxies. The tool is based on a support vector machine trained on data from optical emission-line ratios and color selection criteria. From literature studies and by exploring combinations of discriminating features, we identified the equivalent widths of Hβ, [O III]λ5007, and Hα+[N II]λ6548,84 as the key diagnostic features. Additionally, galaxies classified as AGN can be separated into broad- and narrow-line AGN by measuring the full width at half maximum of the Hα and [N II] complex. The resulting diagnostic covers all galaxy activity classes while achieving high performance scores for each of them. It achieves an overall accuracy of 83% and a recall of 79% for SF, 94% for AGN, 85% for LINER, 77% for composite, and 96% for passive galaxies. It significantly improves upon existing diagnostics because it eliminates the need for preprocessing (i.e., starlight subtraction or flux calibration) and spectral line fitting, includes all activity classes under one scheme, and distinguishes the two main AGN types. In addition, omitting starlight subtraction does not significantly reduce performance. Furthermore, its narrow wavelength requirement enables use across a wide redshift range, making it ideal for high-z studies.
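The redshift reach implied by the narrow wavelength requirement can be illustrated with a short calculation. The sketch below assumes the diagnostic only needs the span from Hβ to [N II]λ6584 and uses the standard air rest wavelengths of those lines. The spectrograph coverage limits (3800 to 9200 Å) and the function name `redshift_window` are illustrative assumptions, not values from the paper, and the margin needed to measure the continuum around each line is ignored.

```python
# Rest-frame air wavelengths in Angstrom for the features named in the abstract.
REST_WAVELENGTHS = {
    "Hbeta": 4861.33,
    "[O III]5007": 5006.84,
    "[N II]6548": 6548.05,
    "Halpha": 6562.82,
    "[N II]6584": 6583.45,
}


def redshift_window(coverage_min, coverage_max, rest=REST_WAVELENGTHS):
    """Return (z_min, z_max) for which every feature falls inside the coverage.

    A line with rest wavelength lam is observed at lam * (1 + z).
    Returns None if no redshift places all features in range.
    """
    bluest = min(rest.values())
    reddest = max(rest.values())
    # The bluest line must not fall below the blue limit of the spectrograph.
    z_min = max(0.0, coverage_min / bluest - 1.0)
    # The reddest line must not move beyond the red limit of the spectrograph.
    z_max = coverage_max / reddest - 1.0
    if z_max < z_min:
        return None
    return z_min, z_max


# Hypothetical optical spectrograph covering 3800-9200 Angstrom (assumed values).
window = redshift_window(3800.0, 9200.0)
print(window)
```

Passing the coverage limits of a redder instrument to the same function shifts the window to higher redshift, because only the span between the bluest and reddest feature has to fit inside the coverage.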

Paper Structure (28 sections, 2 equations, 10 figures, 7 tables)

Figures (10)

  • Figure 1: Distributions of the EWs of the three spectral features, [O III] (top), the [N II] doublet plus Hα (middle), and Hβ (bottom), used as discriminating features in the development of our new diagnostic for the activity classes of star-forming (SF), AGN, LINER, composite, and passive galaxies. All measurements were performed on non-starlight-subtracted SDSS spectra. We adhere to the same convention as the SDSS, wherein negative EWs correspond to emission.
  • Figure 2: Calibration curves for the predicted probabilities of each activity class. This plot illustrates the relationship between the predicted probabilities (derived from our diagnostic) and the actual frequency of an activity class appearing among the remaining classes in the feature space. The dashed line represents an idealized classifier with perfect calibration. We observe that for star-forming (SF) and AGN galaxies, the predicted probabilities closely align with the observed frequencies. Passive galaxies exhibit a greater deviation from the dashed line compared to the previous two classes. Notably, LINER galaxies and passive galaxies demonstrate more pronounced deviations, which is consistent with their intricate nature.
  • Figure 3: Confusion matrix summarizing the performance of our new diagnostic tool on the test set (Sect. \ref{implemantation}) of our final sample (Sect. \ref{final_sample}). Almost all objects lie on the main diagonal, which is indicative of a high-performing classifier. A few objects fall in the off-diagonal elements (misclassifications), indicating mild mixing between the composite and star-forming (SF) classes and between the composite and LINER classes. This is expected, as these classes share common characteristics.
  • Figure 4: Two examples of the classification output generated by our diagnostic, demonstrating a confident classification (top row) and a less confident classification (bottom row). Both objects have the same standard deviation (measurement error) in their EWs across all features. The left (blue) histograms show the classifications obtained by Monte Carlo sampling the EWs of the spectral features within their uncertainties, while the right (green) histograms show the corresponding probabilities for the different classes. The error bars represent the standard deviation of the predicted probabilities for each class. The result for the object in the top row indicates a reliable classification, whereas the result for the object in the bottom row is ambiguous.
  • Figure 5: Distribution of activity classes derived from the implementation of our diagnostic on the HECATE catalog galaxies using SDSS DR17 spectra. Any galaxy classified as an AGN by our diagnostic is subsequently characterized as a broad-line (BL) or narrow-line (NL) AGN.
  • ...and 5 more figures
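
The Monte Carlo procedure described in the Figure 4 caption can be sketched as follows. This is a minimal illustration and not the authors' implementation: the trained support vector machine is replaced by a hypothetical nearest-centroid rule (`toy_classifier`), the centroid values in `TOY_CENTROIDS` are invented, and only three of the five classes are included. It reproduces only the vote histogram (left panels), not the class probabilities with error bars (right panels).

```python
import random
from collections import Counter

# Invented class centroids in (EW_Hbeta, EW_[O III], EW_Halpha+[N II]) space, in Angstrom.
# Negative EW means emission (SDSS convention). NOT values from the paper.
TOY_CENTROIDS = {
    "SF": (-8.0, -5.0, -40.0),
    "AGN": (-3.0, -20.0, -30.0),
    "passive": (2.0, 0.0, 1.0),
}


def toy_classifier(ew):
    """Hypothetical stand-in for the trained SVM: label of the nearest toy centroid."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(ew, centroid))
    return min(TOY_CENTROIDS, key=lambda label: dist2(TOY_CENTROIDS[label]))


def monte_carlo_votes(ew, ew_err, n_draws=1000, seed=0):
    """Resample the EWs within Gaussian errors and return the vote fraction per class."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_draws):
        # Draw each EW from a normal distribution centered on the measurement.
        draw = tuple(rng.gauss(mean, sigma) for mean, sigma in zip(ew, ew_err))
        votes[toy_classifier(draw)] += 1
    return {label: votes[label] / n_draws for label in TOY_CENTROIDS}


# Two objects with identical measurement errors, as in the caption.
errors = (0.5, 0.5, 0.5)
confident = monte_carlo_votes((-8.0, -5.0, -40.0), errors)   # sits on a centroid
ambiguous = monte_carlo_votes((-5.5, -12.5, -35.0), errors)  # midway between SF and AGN
print(confident)
print(ambiguous)
```

An object far from any decision boundary keeps the same label in nearly every draw, while an object near a boundary splits its votes between the neighboring classes even though the measurement errors are the same.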