Wide Area VISTA Extragalactic Survey (WAVES): Selection of targets for the Wide survey using decision-tree classification

G. Kaur; M. Bilicki; S. Bellstedt; E. Tempel; W. A. Hellwing; I. Baldry; B. Bandi; S. Barsanti; S. Driver; N. Guerra-Varas; B. Holwerda; C. Lagos; J. Loveday; A. Robotham

TL;DR

This work presents a supervised learning approach to WAVES-Wide target selection that bypasses explicit photometric redshift estimation by predicting the probability that a galaxy lies below $z=0.2$ using a binary classifier trained on spectroscopic labels. Using 9-band KiDS+VISTA photometry and 36 color combinations (45 features in total), an XGBoost classifier is trained on a spec-z catalog and evaluated with 5-fold cross-validation, achieving about 95% purity and completeness for the fiducial cuts $Z<21.1$ and $z<0.2$, with the probability threshold $P>0.5$ providing a balanced trade-off. The classifier's performance degrades near the selection limits and depends on S/N and color; SHAP and permutation-importance analyses identify $g-r$, $u-g$, $g$, and $J-K_s$ as highly informative features. When applied to the full photo-catalog, the method yields ~2.6 million targets at $P>0.5$, illustrating the need to tune probability thresholds to control sample size and contamination, and highlighting extrapolation risks due to incomplete spectroscopic coverage. The approach is designed to complement other WAVES target-selection methods in a joint framework, offering a scalable path to constructing a flux- and redshift-limited spectroscopic sample for the 4MOST instrument, with room for improvement as more labeled data become available.
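The feature count quoted above (9 magnitudes plus 36 colors, 45 in total) follows from taking every pairwise difference of the nine bands. A minimal sketch of that bookkeeping is given below. The band list and its ordering, the function name `build_features`, and the example magnitudes are assumptions made for illustration; they are not taken from the paper's pipeline.

```python
from itertools import combinations

# Assumed band order, bluest to reddest. The source states only that the
# nine magnitudes run from u to K_s; the intermediate bands are an assumption.
BANDS = ["u", "g", "r", "i", "Z", "Y", "J", "H", "Ks"]


def build_features(mags):
    """Return the 9 magnitudes plus all 36 pairwise colors (hypothetical helper)."""
    missing = [b for b in BANDS if b not in mags]
    if missing:
        raise ValueError(f"missing bands: {missing}")
    # Start with the magnitudes themselves.
    features = {b: mags[b] for b in BANDS}
    # Add every color "bluer band minus redder band": C(9, 2) = 36 of them.
    for blue, red in combinations(BANDS, 2):
        features[f"{blue}-{red}"] = mags[blue] - mags[red]
    return features


# Made-up magnitudes for a single object, for illustration only.
example = {"u": 22.9, "g": 21.8, "r": 21.0, "i": 20.6, "Z": 20.4,
           "Y": 20.3, "J": 20.1, "H": 19.9, "Ks": 19.8}
feats = build_features(example)
print(len(feats))  # 45
```

In practice one such feature vector per object would be passed to the classifier; only objects with all nine bands measured can be treated this way, which matches the restriction stated in the abstract.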

Abstract

The Wide-Area VISTA Extragalactic Survey (WAVES) on the 4-metre Multi-Object Spectroscopic Telescope (4MOST) includes two flux-limited subsurveys with very high (95\%) completeness requirements: Wide over $\sim\!1200$ deg$^2$ and Deep over $\sim\!65$ deg$^2$. Both are $Z$-band selected, as $Z<21.1$ and $Z<21.25$ mag respectively, and additionally redshift-limited, although the true redshifts are not known a priori and will only be measured by 4MOST. Here, we present a classification-based method to select the targets for WAVES-Wide. Rather than estimating individual redshifts for the input photometric objects, we assign each object a probability of lying below $z=0.2$, the redshift limit of the subsurvey. This is done with the supervised machine learning approach of eXtreme Gradient Boosting (XGB), trained on a comprehensive spectroscopic sample overlapping with WAVES fields. Our feature space is composed of nine VST+VISTA magnitudes from $u$ to $K_s$ and all possible colors, but the most relevant for the classification are the $g$-band magnitude and the $u-g$, $g-r$ and $J-K_s$ colors. We check the performance of our classifier both for the fiducial WAVES-Wide limits and for a range of neighboring redshift and magnitude thresholds, consistently finding purity and completeness at the level of 94-95\%. We note, however, that this performance deteriorates for sources close to the selection limits, due to deficiencies of the current spectroscopic training sample and the decreasing signal-to-noise of the photometry. We apply the classifier trained on the full spectroscopic sample to 14 million photometric galaxies from the WAVES input catalog that have all nine bands measured. Our work demonstrates that a machine-learning classifier could be used to select a flux- and redshift-limited sample from deep photometric data.
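To make the purity and completeness figures concrete, the sketch below evaluates a probability cut on a labeled sample. It assumes the standard definitions (purity as the fraction of selected objects that truly lie below the redshift limit, completeness as the fraction of true low-redshift objects that are selected); the function name, the toy probabilities, and the toy labels are invented for illustration and are not results from the paper.

```python
def purity_completeness(probs, is_low_z, threshold=0.5):
    """Purity and completeness of the selection P > threshold.

    probs    : classifier probabilities that each object has z below the limit
    is_low_z : spectroscopic truth labels (True if z is below the limit)
    Assumes the standard definitions TP/(TP+FP) and TP/(TP+FN).
    """
    selected = [p > threshold for p in probs]
    tp = sum(1 for s, t in zip(selected, is_low_z) if s and t)
    fp = sum(1 for s, t in zip(selected, is_low_z) if s and not t)
    fn = sum(1 for s, t in zip(selected, is_low_z) if not s and t)
    purity = tp / (tp + fp) if (tp + fp) else float("nan")
    completeness = tp / (tp + fn) if (tp + fn) else float("nan")
    return purity, completeness


# Toy data for illustration only (not from the paper).
probs = [0.95, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10, 0.05]
truth = [True, True, True, False, True, False, False, False]

for cut in (0.2, 0.5, 0.7):
    p, c = purity_completeness(probs, truth, threshold=cut)
    print(f"P > {cut}: purity = {p:.2f}, completeness = {c:.2f}")
```

On this toy sample, raising the cut from 0.2 to 0.7 increases purity while lowering completeness, which is the trade-off that tuning the probability threshold controls.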
