A Deep Neural Network Based Reverse Radio Spectrogram Search Algorithm
Peter Xiangyuan Ma, Steve Croft, Chris Lintott, Andrew P. V. Siemion
TL;DR
The paper addresses the challenge of vetting signals in radio spectrograms under heavy RFI by introducing a modular reverse search framework that uses a $\beta$-VAE encoder to extract robust morphology features from ~$715$ Hz windows and augments them with a frequency embedding inspired by transformer positional encodings. The method ranks lookalike signals via cosine similarity between the encoded SOI and candidate encodings, enabling fast, scalable retrieval suitable for technosignature vetting. Quantitative results show superior clustering, disentanglement, and retrieval quality for the $\beta$-VAE with frequency embedding compared to traditional feature extractors and baselines, with substantial gains in interpretability and efficiency on Breakthrough Listen data. The approach promises broad applicability to large astronomical datasets and could underpin automated RFI databases and template searches across spectrogram-type data.
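The cosine-similarity ranking step can be illustrated with a minimal sketch. It assumes the encoder has already mapped the signal of interest and each candidate window to a fixed-length feature vector; the function names `cosine_similarity` and `rank_lookalikes` and the toy three-dimensional vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)

def rank_lookalikes(query, candidates, top_k=3):
    """Return (candidate_index, score) pairs, most similar first."""
    scored = [(cosine_similarity(query, c), i) for i, c in enumerate(candidates)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:top_k]]

# Toy encodings standing in for encoder outputs (assumed values).
query = [1.0, 0.0, 1.0]
candidates = [
    [2.0, 0.0, 2.0],  # same direction as the query, different magnitude
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.5, 1.0],  # close to the query, but not identical in direction
]
ranking = rank_lookalikes(query, candidates, top_k=2)
```

Because cosine similarity ignores vector magnitude, the first candidate scores 1.0 despite being twice as long as the query, so the ranking reflects the direction of the encoding rather than its scale.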
Abstract
Modern radio astronomy instruments generate vast amounts of data, and the increasingly challenging radio frequency interference (RFI) environment necessitates ever-more sophisticated RFI rejection algorithms. The "needle in a haystack" nature of searches for transients and technosignatures requires us to develop methods that can determine whether a signal of interest has unique properties, or is part of some larger set of pernicious RFI. In the past, this vetting has required onerous manual inspection of very large numbers of signals. In this paper we present a fast and modular deep learning algorithm to search for lookalike signals of interest in radio spectrogram data. First, we trained a $\beta$-Variational Autoencoder on signals returned by an energy detection algorithm. We then adapted a positional embedding layer from the classical Transformer architecture to embed additional metadata, which we demonstrate using a frequency-based embedding. Next, we used the encoder component of the $\beta$-Variational Autoencoder to extract features from small (~715 Hz, with a resolution of 2.79 Hz per frequency bin) windows in the radio spectrogram. We used our algorithm to conduct a search for a given query (encoded signal of interest) on a set of signals (encoded features of searched items) to produce the top candidates with similar features. We successfully demonstrate that the algorithm retrieves signals with similar appearance, given only the original radio spectrogram data. This algorithm can be used to improve the efficiency of vetting signals of interest in technosignature searches, but could also be applied to a wider variety of searches for "lookalike" signals in large astronomical datasets.
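As a rough illustration of the frequency-based embedding idea, the sketch below applies the standard sinusoidal positional encoding from the Transformer literature to a window index and adds it to a latent vector. The abstract does not specify how frequency is scaled or how the embedding is combined with the encoder output, so the additive combination, the use of a window index as the position, the base of 10000, and the names `sinusoidal_embedding` and `embed_with_frequency` are all assumptions. The bin count is simple arithmetic on the quoted window width and resolution.

```python
import math

# ~715 Hz window at 2.79 Hz per bin gives roughly 256 frequency bins.
BINS_PER_WINDOW = round(715 / 2.79)

def sinusoidal_embedding(position, dim, base=10000.0):
    """Standard Transformer-style encoding: sin on even slots, cos on odd slots."""
    embedding = []
    for k in range(dim):
        pair_index = k // 2
        angle = position / (base ** (2 * pair_index / dim))
        embedding.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return embedding

def embed_with_frequency(latent, window_index):
    """Add a frequency embedding to a latent vector (assumed additive scheme)."""
    freq = sinusoidal_embedding(window_index, len(latent))
    return [z + f for z, f in zip(latent, freq)]

# Hypothetical 4-dimensional latent vector for the window at index 0.
latent = [0.5, -0.2, 0.1, 0.3]
tagged = embed_with_frequency(latent, window_index=0)
```

With this scheme, two windows with identical morphology but different positions in the band receive different final encodings, which is one way a search could separate lookalike signals by frequency as well as by shape.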
