ASTROFLOW: A Real-Time End-to-End Pipeline for Radio Single-Pulse Searches

Guanhong Lin, Dejia Zhou, Jianli Zhang, Jialang Ding, Fei Liu, Xiaoyun Ma, Yuan Liang, Ruan Duan, Liaoyuan Liu, Xuanyu Wang, Xiaohui Yan, Yingrou Zhan, Yuting Chu, Jing Qiao, Wei Wang, Jie Zhang, Zerui Wang, Meng Liu, Chenchen Miao, Menquan Liu, Meng Guo, Di Li, Pei Wang

TL;DR

Astroflow targets real-time single-pulse searches in high-data-rate radio surveys with an end-to-end, GPU-accelerated pipeline that unifies RFI mitigation, subband dedispersion, DM–time image generation, and object-detection-based candidate identification. The system pairs a CUDA-accelerated backend with a YOLOv11n detector that processes DM–time images and delivers candidate outputs promptly. It is validated on FAST-FREX and QUEST data and shows substantial speedups over CPU baselines. Key contributions include a two-stage subband dedispersion algorithm, robust RFI filtering, and an efficient 512×512 DM–time visualization pipeline that enables real-time detection with high recall and few false positives. The work demonstrates practical scalability for next-generation facilities and offers a deployable framework that can be refined with additional data and models for large-scale transient discovery.

Abstract

Fast radio bursts (FRBs) are extremely bright, millisecond-duration cosmic transients of unknown origin. The growing number of wide-field and high-time-resolution radio surveys, particularly with next-generation facilities such as the SKA and MeerKAT, will dramatically increase FRB discovery rates, but will also produce data volumes that overwhelm conventional search pipelines. Real-time detection thus demands software that is both algorithmically robust and computationally efficient. We present Astroflow, an end-to-end, GPU-accelerated pipeline for single-pulse detection in radio time-frequency data. Built on a unified C++/CUDA core with a Python interface, Astroflow integrates RFI excision, incoherent dedispersion, dynamic-spectrum tiling, and a YOLO-based deep detector. Through vectorized memory access, shared-memory tiling, and OpenMP parallelism, it achieves 10× faster-than-real-time processing on consumer GPUs for a typical 150 s, 2048-channel observation, while preserving high sensitivity across a wide range of pulse widths and dispersion measures. These results establish the feasibility of a fully integrated, GPU-accelerated single-pulse search stack capable of scaling to the data volumes expected from upcoming large-scale surveys. Astroflow offers a reusable and deployable solution for real-time transient discovery and provides a framework that can be continuously refined with new data and models.
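
The incoherent dedispersion step named in the abstract can be illustrated with a minimal NumPy sketch. This is not Astroflow's C++/CUDA implementation and does not reproduce its two-stage subband algorithm; it is a brute-force reference version under stated assumptions: the standard cold-plasma delay law with dispersion constant 4.148808e3 s MHz^2 pc^-1 cm^3, delays referenced to the highest-frequency channel, and wrap-around shifting via `np.roll`. The function names (`channel_shifts`, `dedisperse`, `dm_time_map`, `make_dispersed_pulse`) and all test parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard cold-plasma value).
K_DM = 4.148808e3


def channel_shifts(dm, freqs_mhz, tsamp):
    """Delay of each channel relative to the highest frequency, in samples."""
    ref = freqs_mhz.max()
    delay_s = K_DM * dm * (freqs_mhz ** -2.0 - ref ** -2.0)
    return np.round(delay_s / tsamp).astype(int)


def dedisperse(data, freqs_mhz, dm, tsamp):
    """Sum channels of data (nchan, nsamp) after undoing the delay at one DM.

    np.roll wraps samples around the end of the block; a real pipeline
    would instead read overlapping blocks (an assumption of this sketch).
    """
    shifts = channel_shifts(dm, freqs_mhz, tsamp)
    series = np.zeros(data.shape[1])
    for channel, shift in zip(data, shifts):
        series += np.roll(channel, -shift)
    return series


def dm_time_map(data, freqs_mhz, dms, tsamp):
    """Stack one dedispersed time series per trial DM into a DM-time image."""
    return np.stack([dedisperse(data, freqs_mhz, dm, tsamp) for dm in dms])


def make_dispersed_pulse(freqs_mhz, dm, tsamp, nsamp, t0):
    """Noise-free test data: one unit sample per channel along the DM sweep."""
    data = np.zeros((freqs_mhz.size, nsamp))
    shifts = channel_shifts(dm, freqs_mhz, tsamp)
    data[np.arange(freqs_mhz.size), t0 + shifts] = 1.0
    return data


# Hypothetical test setup: 64 channels over 1000-1500 MHz, 1 ms sampling.
freqs = np.linspace(1000.0, 1500.0, 64)
data = make_dispersed_pulse(freqs, dm=100.0, tsamp=1e-3, nsamp=2048, t0=500)
dms = np.arange(0.0, 201.0, 10.0)
image = dm_time_map(data, freqs, dms, tsamp=1e-3)
best_dm_index, best_sample = np.unravel_index(image.argmax(), image.shape)
print(dms[best_dm_index], best_sample)
```

In this toy setup the brightest pixel of the DM–time image lies at the injected DM and arrival sample, and the response falls off at neighbouring trial DMs, which is the origin of the "bow-tie" shape the detector is trained to find. The cost grows with the product of trial DMs, channels, and samples, which is why production pipelines move this loop to the GPU.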

Paper Structure

This paper contains 27 sections, 15 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Block diagram of the Astroflow pipeline. Python-side ingestion (Filterbank, PSRFITS) bridges via pybind11 to a C++/CUDA backend for RFI excision, subbanding, optional downsampling, and incoherent dedispersion; products feed a YOLOv11n detector and a plotter/producer. Arrows indicate data flow; gray denotes Python modules, blue denotes C++/CUDA kernels, and orange denotes the neural-network component.
  • Figure 2: Comparison of dynamic spectra before and after RFI mitigation.
  • Figure 3: Examples of ${\rm DM}$–time tiles produced by the gridding and rendering stage. All eight panels are $M\times M$ pseudo–colour images generated from the dedispersed data and rendered with a perceptually uniform colourmap. The first three columns show slices dominated by background and radio–frequency interference (RFI), whereas the two panels in the rightmost column exhibit the characteristic “bow–tie” single–pulse morphology, i.e., the specific targets of the downstream detection model.
  • Figure 4: Example candidate and panel layout. The header lists the file identifier and post–processing measurements at the best–fit dispersion measure and time of arrival ($\mathrm{DM}=418.994~\mathrm{pc\,cm^{-3}}$, $\mathrm{TOA}=4.738~\mathrm{s}$), along with the integrated signal–to–noise ratio ($\mathrm{S/N}=4608.66$) and pulse width ($6.11~\mathrm{ms}$). (a) Time series obtained by summing the $\mathrm{DM}$–time map in panel (b) over $\mathrm{DM}$ within the detection window. (b) $\mathrm{DM}$–time diagnostic after re–dedispersion; the dashed ellipse marks the detection window centered on the best $\mathrm{DM}/\mathrm{TOA}$. (c) Bandpass (integrated power as a function of radio frequency), annotated with the channelization parameters. (d) Dedispersed time series at the best $\mathrm{DM}$; the measured $\mathrm{TOA}$ is shown by the vertical dashed line, and the $\mathrm{S/N}$ and width are indicated in the legend. (e) Dedispersed frequency–time dynamic spectrum centered on the candidate; the burst appears as a near–vertical enhancement across the band. (f) Spectrum integrated over time (“frequency–integrated power”).
  • Figure 5: Dedispersion runtime versus frequency–channel count. Left axis: kernel execution time; right axis: real–time factor $R=t_{\rm obs}/t_c$. The benchmark uses a synthetic 8-bit dataset with duration $T=22\,\mathrm{s}$, sampling interval $t_s=40\,\mu\mathrm{s}$, center frequency $f_c=1250\,\mathrm{MHz}$, total bandwidth $500\,\mathrm{MHz}$, DM range $0$–$1024$, and $N_{\rm DM}=1500$ trials, executed on an RTX 4090. The green curve shows Heimdall (dedisp); the black curve shows Astroflow. The purple dash–dot line (triangles) and the gray dotted line (diamonds) give $R$ for Astroflow and Heimdall, respectively. As the number of channels increases from $512$ to $8192$, the Heimdall runtime grows from $\sim\!1.03$ s to $\sim\!8.72$ s ($R\!\approx\!22\rightarrow2$), while Astroflow remains markedly lower, from $\sim\!0.11$ s to $\sim\!0.96$ s ($R\!\approx\!190\rightarrow25$), maintaining faster–than–real–time performance across all tested channelizations.
  • ...and 9 more figures
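
The real-time factor defined in the Figure 5 caption, $R=t_{\rm obs}/t_c$, can be recomputed from the runtimes quoted there. The sketch below is illustrative only: the function name `real_time_factor` and the dictionary layout are hypothetical, and the inputs are the rounded runtimes from the caption, so the recomputed values agree only approximately with the $R$ values the caption reads off the plot.

```python
def real_time_factor(t_obs_s, t_compute_s):
    """R = t_obs / t_c; R > 1 means faster than real time."""
    if t_compute_s <= 0:
        raise ValueError("compute time must be positive")
    return t_obs_s / t_compute_s


# Observation length of the synthetic benchmark in the Figure 5 caption.
T_OBS_S = 22.0

# Approximate kernel runtimes (seconds) quoted in the caption,
# keyed by (pipeline, number of frequency channels).
quoted_runtimes_s = {
    ("Heimdall", 512): 1.03,
    ("Heimdall", 8192): 8.72,
    ("Astroflow", 512): 0.11,
    ("Astroflow", 8192): 0.96,
}

factors = {
    key: real_time_factor(T_OBS_S, runtime)
    for key, runtime in quoted_runtimes_s.items()
}

for (pipeline, nchan), factor in factors.items():
    print(f"{pipeline:9s} {nchan:5d} channels: R = {factor:.1f}")
```

Both pipelines stay above $R=1$ for every quoted configuration, and Astroflow's margin is roughly an order of magnitude larger, consistent with the trend the caption describes.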