The NEMESIS Catalogue of Young Stellar Objects for the Orion Star Formation Complex. I. General description of data curation

J. Roquette, M. Audard, D. Hernandez, I. Gezer, G. Marton, C. Mas, M. Madarász, O. Dionatos

TL;DR

The paper presents the NEMESIS Catalogue of YSOs for the Orion Star Formation Complex and details a data-curation framework that combines a historical literature compilation with modern photometric and spectroscopic surveys to build a large, panchromatic YSO catalogue. It ingests data from over 200 publications, assembles harmonized SEDs, derives homogeneous infrared classifications for ~93% of the sources, and systematically assesses multiplicity and contamination. The work delivers a 27,879-source database with rich observables (SEDs, IR classes, lithium and gravity indicators, X-ray data, variability, and kinematics) and establishes methods to identify and quantify contaminants (giants, extragalactic sources, and main-sequence stars). It further demonstrates the catalogue’s utility for validating and training youth-diagnosis methods and uses the behavior of the Gaia RUWE (renormalised unit weight error) in YSOs as a case study for astrometric diagnostics. The dataset is already being used by collaborators for morphology studies, deep-learning YSO identification, and SED fitting, and it provides a scalable resource for future star-formation studies in Orion and beyond.

Abstract

The past decade has seen a rise in the use of machine learning methods in the study of young stellar evolution. This trend has created a growing need for a comprehensive database of young stellar objects (YSOs) that goes beyond survey-specific biases and can be employed for training, validation, and refining the physical interpretation of machine learning outcomes. We reviewed the literature focused on the Orion Star Formation Complex (OSFC) to compile a thorough catalogue of previously identified YSO candidates in the region, including a curation of observables relevant for probing their youth. Starting from the NASA/ADS database, we assembled YSO candidates from more than 200 peer-reviewed publications. We collated data products relevant to the study of YSOs into a dedicated catalogue, which we complemented with data from large photometric and spectroscopic surveys and from the Strasbourg astronomical Data Center (CDS). We also added significant value to the catalogue by homogeneously deriving YSO infrared classification labels and through a comprehensive curation of labels concerning the sources' multiplicity. Finally, we used a panchromatic approach to derive the probabilities that the sources in our catalogue are extragalactic or giant-star contaminants. We present the NEMESIS catalogue of YSOs for the OSFC, which includes data collated for 27879 sources covering the whole mass spectrum and the various stages of pre-main-sequence evolution, from protostars to diskless young stars. The catalogue includes a collection of panchromatic photometric data processed into spectral energy distributions, stellar parameters, infrared classes, equivalent widths of emission lines related to YSO accretion and star-disk interaction and of absorption lines such as lithium and gravity-sensitive lines, X-ray emission observables, photometric variability observables, and multiplicity labels.
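The abstract mentions photometry being processed into spectral energy distributions. A minimal sketch of one common way to do this, assuming Vega-system magnitudes: convert each magnitude m into a flux density F_ν = F_0 · 10^(−0.4 m) and then into νF_ν (which equals λF_λ). The band names, effective wavelengths, and zero points in `ZERO_POINTS` are illustrative values commonly quoted for 2MASS, Spitzer/IRAC, and MIPS 24 μm, not necessarily those adopted for the NEMESIS catalogue, and the helper names are hypothetical.

```python
import math

# Speed of light in micrometres per second, so nu = C_UM_S / lambda_um is in Hz.
C_UM_S = 2.99792458e14
JY_TO_CGS = 1e-23  # 1 Jy = 1e-23 erg s^-1 cm^-2 Hz^-1

# Illustrative Vega zero points: band -> (effective wavelength [um], F0 [Jy]).
# Assumed values for this sketch, not the zero points used by NEMESIS.
ZERO_POINTS = {
    "2MASS_J": (1.235, 1594.0),
    "2MASS_H": (1.662, 1024.0),
    "2MASS_Ks": (2.159, 666.7),
    "IRAC_3.6": (3.550, 280.9),
    "IRAC_4.5": (4.493, 179.7),
    "IRAC_5.8": (5.731, 115.0),
    "IRAC_8.0": (7.872, 64.13),
    "MIPS_24": (23.68, 7.17),
}


def mag_to_nu_fnu(band, mag):
    """Convert a Vega magnitude to (lambda [um], nu*F_nu [erg s^-1 cm^-2])."""
    lam_um, f0_jy = ZERO_POINTS[band]
    f_nu_jy = f0_jy * 10.0 ** (-0.4 * mag)  # flux density in Jy
    nu_hz = C_UM_S / lam_um
    return lam_um, nu_hz * f_nu_jy * JY_TO_CGS


def build_sed(photometry):
    """Turn {band: magnitude} into a wavelength-sorted list of SED points.

    Missing (None) or NaN magnitudes are skipped; upper limits are not handled.
    """
    points = [
        mag_to_nu_fnu(band, m)
        for band, m in photometry.items()
        if m is not None and not math.isnan(m)
    ]
    return sorted(points)  # tuples sort by wavelength first
```

For example, `build_sed({"2MASS_Ks": 10.0, "IRAC_3.6": 9.5, "MIPS_24": 6.0})` returns three (λ, νF_ν) tuples ordered by wavelength. A real pipeline would also need to propagate uncertainties and treat upper limits separately rather than dropping them.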

Paper Structure

This paper contains 74 sections, 1 equation, 18 figures, 16 tables.

Figures (18)

  • Figure 1: Density distribution of sources included in our data compilation throughout the Orion Star-Formation Complex. The location of the Monoceros R2 region, which is excluded from our compilation, is marked with a black 'X'.
  • Figure 2: Schematic workflow of the data-curation process described in the paper's data section.
  • Figure 3: Examples of SEDs from our database for different types of YSOs. From top to bottom: a Class 0, a Class I, a Flat-Spectrum, a Class II, a Class III, and a Herbig AeBe star. See the discussion of YSO classes in the section on infrared emission. Photometric data collated by us are plotted as coloured symbols, while data points retrieved by processing data from the VizieR-SED tool are plotted with black outlines. The grey area shows the wavelength range used for deriving the $\alpha_{IR}$ indices (see the section on spectral indices).
  • Figure 4: Top: Number of photometric data points in the SEDs collated into the NEMESIS YSO catalogue for the OSFC. Bottom: Percentage of SEDs containing data points in a given wavelength range.
  • Figure 5: Distribution of the IR spectral indices, $\alpha_{IR}$, estimated for sources in the NEMESIS Catalogue of YSOs in the OSFC using all photometric data available in the 2–24 $\mu$m wavelength range. Dotted lines show the limits between classes (see the table of YSO class definitions).
  • ...and 13 more figures
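Figure 5 refers to infrared spectral indices estimated from all photometry in the 2–24 μm range. The index is conventionally defined as α_IR = d log(λF_λ) / d log λ. The sketch below is a hypothetical illustration, not the authors' code: it fits an unweighted least-squares line in log–log space and assigns a class using the Greene et al. (1994) thresholds. The NEMESIS class limits (given in the paper's YSO classes table) and any flux weighting may differ, and α_IR alone cannot separate Class 0 from Class I sources.

```python
import math


def alpha_ir(sed, lam_min=2.0, lam_max=24.0):
    """Least-squares slope of log10(lambda*F_lambda) versus log10(lambda).

    sed: iterable of (lambda_um, lambda_F_lambda) pairs.
    Only points with lam_min <= lambda <= lam_max and positive flux are used.
    Returns None when the slope is undefined (fewer than two usable points
    or all points at a single wavelength).
    """
    xs, ys = [], []
    for lam, lfl in sed:
        if lam_min <= lam <= lam_max and lfl > 0:
            xs.append(math.log10(lam))
            ys.append(math.log10(lfl))
    n = len(xs)
    if n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def ir_class(alpha):
    """Classify with Greene et al. (1994)-style limits (assumed here)."""
    if alpha is None:
        return "undetermined"
    if alpha >= 0.3:
        return "Class I"  # includes Class 0 candidates
    if alpha >= -0.3:
        return "Flat"
    if alpha >= -1.6:
        return "Class II"
    return "Class III"
```

As a check, a synthetic SED with λF_λ ∝ λ^−1 sampled at 2.2, 3.6, 4.5, 8.0, and 24 μm gives α_IR ≈ −1, which falls in the Class II range; points outside 2–24 μm do not affect the fit.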