A self-regulated convolutional neural network for classifying variable stars
Francisco Pérez-Galarce, Jorge Martínez-Palomera, Karim Pichara, Pablo Huijse, Márcio Catelan
TL;DR
The paper tackles data-shift and class-imbalance challenges in variable-star classification by coupling a self-regulated CNN with a physics-enhanced latent space variational autoencoder (PELS-VAE) that generates synthetic light curves conditioned on Gaia DR3 physical parameters. The classifier uses dual masks to learn from real and synthetic data; synthetic samples are injected according to epoch-specific policies, with their physical parameters drawn by Bayesian Gaussian mixture model (BGMM) sampling. Empirical results on OGLE and Gaia DR3-derived biases show improved robustness across loss functions, policies, signal-to-noise ratios, and sequence lengths, and the framework supports a more reliable hyperparameter search. The approach improves generalisation to unseen, underrepresented regions of the physical-parameter space, with practical implications for time-domain surveys and online classification pipelines.
Abstract
Over the last two decades, machine learning models have been widely applied and have proven effective in classifying variable stars, particularly with the adoption of deep learning architectures such as convolutional neural networks, recurrent neural networks, and transformer models. While these models have achieved high accuracy, they require high-quality, representative data and a large number of labelled samples for each star type to generalise well, which can be difficult to obtain in time-domain surveys. As a result, models often learn and reinforce biases inherent in the training data, an issue that is not easily detected when validation is performed on subsamples from the same catalogue. The problem of biases in variable star data has been largely overlooked, and a definitive solution has yet to be established. In this paper, we propose a new approach to improve the reliability of variable star classifiers by introducing a self-regulated training process. This process uses synthetic samples generated by a physics-enhanced latent space variational autoencoder, incorporating six physical parameters from Gaia Data Release 3. Our method features a dynamic interaction between a classifier and a generative model, in which the generative model produces ad hoc synthetic light curves to reduce confusion during classifier training and to populate underrepresented regions of the physical parameter space. Experiments conducted under various scenarios demonstrate that our self-regulated training approach outperforms traditional training methods for classifying variable stars on biased datasets, with statistically significant improvements.
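
The interaction between classifier and generative model can be illustrated with a minimal sketch. It is not the policy used in the paper: the function `synthetic_budget`, the proportional allocation rule, the class labels, and the confusion matrix values are all hypothetical, chosen only to show how classifier confusion could steer the number of synthetic light curves requested for each class.

```python
from typing import Dict, List


def synthetic_budget(
    confusion: List[List[int]], labels: List[str], total: int
) -> Dict[str, int]:
    """Split `total` synthetic samples across classes in proportion to
    each class's misclassification rate.

    Hypothetical allocation rule for illustration only; rows of
    `confusion` are true classes, columns are predicted classes.
    """
    error_rates = []
    for i, row in enumerate(confusion):
        n = sum(row)
        # Fraction of class i that the classifier currently gets wrong.
        error_rates.append((n - row[i]) / n if n else 0.0)

    norm = sum(error_rates)
    if norm == 0:
        # No confusion at all: request no synthetic samples.
        return {label: 0 for label in labels}
    return {
        label: round(total * rate / norm)
        for label, rate in zip(labels, error_rates)
    }


# Illustrative (made-up) validation confusion matrix for three classes.
labels = ["RRLYR", "CEP", "ECL"]
confusion = [
    [90, 10, 0],
    [20, 70, 10],
    [5, 5, 40],
]
budget = synthetic_budget(confusion, labels, total=600)
print(budget)
```

In this sketch the most confused class receives the largest share of the synthetic budget. In a self-regulated loop of the kind described above, such a budget would be recomputed as training proceeds and passed to the generative model together with sampled physical parameters.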
