
Machine Phenomenology: A Simple Equation Classifying Fast Radio Bursts

Yang Liu, Yuhao Lu, Rahim Moradi, Bo Yang, Bing Zhang, Wenbin Lin, Yu Wang

TL;DR

This work asks whether fast radio bursts (FRBs) originate from two distinct physical classes, and addresses the question by deriving simple, interpretable equations from CHIME FRB observables. It couples human-guided feature selection and Buckingham-π dimensionless grouping with symbolic regression, yielding two classification approaches: a power-law multiplier model (the Power-Law Model) and Neural Dimensionless Regression (NDR). The Power-Law Model achieves high accuracy on Catalog 1 but shows limited generalizability, whereas NDR produces a stable, dimensionally consistent equation that partitions FRBs into two Gaussian populations and generalizes well to Catalog 2. The results suggest two underlying FRB processes with distinct spectral-temporal traits, and they demonstrate a principled framework for physics-informed, interpretable discovery in astrophysical data.
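The Buckingham-π step can be illustrated with a minimal sketch. It assumes a hypothetical subset of the observables (spectral index α, sub-burst width Δt, bandwidth Δν, peak frequency ν_p) and writes each one's dimensions as exponents of mass, length, and time; the dimensionless groups are then the integer vectors in the nullspace of this dimension matrix. The helpers `nullspace` and `describe` are illustrative names, not code from the paper, and the authors combined this analysis with correlation analysis and human judgment.

```python
from fractions import Fraction
from math import lcm

# Hypothetical variable set: exponents of (mass, length, time) for each observable.
DIMENSIONS = {
    "alpha": (0, 0, 0),   # spectral index, already dimensionless
    "dt": (0, 0, 1),      # sub-burst width [s]
    "dnu": (0, 0, -1),    # frequency bandwidth [Hz]
    "nu_p": (0, 0, -1),   # peak frequency [Hz]
}


def nullspace(matrix):
    """Integer basis of the nullspace of `matrix` (a list of rows), via exact row reduction."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0])
    pivot_cols, r = [], 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it becomes a free variable
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [v / pv for v in rows[r]]  # normalise the pivot to 1
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(c)
        r += 1
        if r == len(rows):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivot_cols):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_idx, pc in enumerate(pivot_cols):
            vec[pc] = -rows[row_idx][free]
        scale = lcm(*(v.denominator for v in vec))  # clear fractions to integer exponents
        basis.append([int(v * scale) for v in vec])
    return basis


def describe(vec, names):
    """Render an exponent vector as a product such as 'dt * dnu^-1'."""
    return " * ".join(n if e == 1 else f"{n}^{e}" for n, e in zip(names, vec) if e != 0)


names = list(DIMENSIONS)
matrix = [[DIMENSIONS[n][k] for n in names] for k in range(3)]  # one row per base dimension
groups = nullspace(matrix)
for g in groups:
    print(describe(g, names))
# Expected output, one group per line: alpha, dt * dnu, dt * nu_p
```

The basis is not unique: Δν/ν_p, for example, is the ratio of the last two groups, so choosing which combination to keep is where physical reasoning comes in.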

Abstract

This work shows how human physical reasoning can guide machine-driven symbolic regression toward discovering empirical laws from observations. As an example, we derive a simple equation that classifies fast radio bursts (FRBs) into two distinct Gaussian distributions, indicating the existence of two physical classes. This human-AI workflow integrates feature selection, dimensional analysis, and symbolic regression: deep learning first analyzes CHIME Catalog 1 and identifies six independent parameters that collectively provide a complete description of FRBs; guided by Buckingham-$\pi$ analysis and correlation analysis, humans then construct dimensionless groups; finally, symbolic regression performed by the machine discovers the governing equation. When applied to the newer CHIME Catalog 2, the equation produces consistent results, demonstrating that it captures the underlying physics. This framework is applicable to a broad range of scientific domains.
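One standard way to check whether a scalar score splits a sample into two Gaussian distributions is to fit a two-component Gaussian mixture by expectation-maximization (EM). The sketch below assumes a hypothetical per-burst score and synthetic data; it does not reproduce the paper's NDR equation and is not necessarily the procedure the authors used. The function names `fit_two_gaussians` and `classify` are illustrative.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(xs, n_iter=100, min_sigma=1e-6):
    """Fit weights, means, and std devs of a 1D two-component Gaussian mixture via EM."""
    xs = sorted(xs)
    n, half = len(xs), len(xs) // 2
    # Initialise component 0 from the lower half of the data, component 1 from the upper half.
    mu = [statistics.fmean(xs[:half]), statistics.fmean(xs[half:])]
    sigma = [max(statistics.pstdev(xs[:half]), min_sigma),
             max(statistics.pstdev(xs[half:]), min_sigma)]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to component 0.
        resp = []
        for x in xs:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            resp.append(p0 / (p0 + p1))
        # M-step: re-estimate parameters from the responsibility-weighted points.
        r0 = sum(resp)
        r1 = n - r0
        weight = [r0 / n, r1 / n]
        mu = [sum(r * x for r, x in zip(resp, xs)) / r0,
              sum((1 - r) * x for r, x in zip(resp, xs)) / r1]
        var0 = sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, xs)) / r0
        var1 = sum((1 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, xs)) / r1
        sigma = [max(math.sqrt(var0), min_sigma), max(math.sqrt(var1), min_sigma)]
    return weight, mu, sigma


def classify(x, weight, mu, sigma):
    """Assign x to the component with the larger posterior probability (0 or 1)."""
    p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
    p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
    return 0 if p0 >= p1 else 1


# Synthetic stand-in for per-burst scores: two well-separated populations.
rng = random.Random(0)
scores = [rng.gauss(-2.0, 0.5) for _ in range(200)] + [rng.gauss(2.0, 0.5) for _ in range(200)]
weight, mu, sigma = fit_two_gaussians(scores)
print(weight, mu, sigma)
```

With real data, comparing this fit against a single Gaussian (for example with an information criterion) would indicate whether the two-population description is actually preferred.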


Paper Structure

This paper contains 21 sections, 23 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Visualization of the feature selection. The plot shows the frequency with which each feature was ranked among the top six most important features across 100 neural network models. The green rectangles highlight the six selected features. Parameters include: $\alpha$: spectral index; $\Delta t$: sub-burst width; $DM$: excess DM using the YMW16 model; $\Delta t$ (Boxcar): entire burst width from the boxcar method; $f$: flux density; $\Delta \nu$: frequency bandwidth; $\nu_p$: peak frequency; $DM$ (NE2001): excess DM using the NE2001 model; $\chi^2$: chi-square statistic; $\nu_{low}$: lower frequency bound; $\nu_{high}$: upper frequency bound.
  • Figure 2: Distributions of the input features for repeating and non-repeating FRBs. The peaks at the edges of the $\nu_p$ and $\Delta_\nu$ distributions originate from the limited bandwidth of CHIME, while the peak in the $DM$ distribution arises from multiple bursts produced by the same repeating sources.
  • Figure 3: Parameter relationships for repeating and non-repeating FRBs, with numerical values annotated in each cell. The top row shows power-law indices and the bottom row shows correlation coefficients among the six parameters. Left panels show results for all repeating FRBs, middle panels for a random sample of non-repeaters matched to the repeater sample size, and right panels for all non-repeating FRBs.
  • Figure 4: Relationship between peak frequency ($\nu_p$) and frequency width ($\Delta_\nu$) for repeating (brown dots) and non-repeating (green dots) FRBs. A power-law fit for repeaters (brown line) shows the scaling relationship $\Delta_\nu \propto \nu_p^2$, with the shaded band showing the 1-$\sigma$ uncertainty. A non-parametric Gaussian Process fit (grey line) with its 1-$\sigma$ uncertainty band is also shown, capturing potential nonlinear trends in the relationship. Data points that hit the band boundaries, marked by dots with white centers, result from the limited bandwidth of the CHIME telescope.
  • Figure 5: Left: Interaction matrix of $\Delta g_{x_i}$ (diagonal elements) and $\Delta g_{x_i, x_j}$ (off-diagonal elements), indicating the importance of the parameters. Right: Interaction matrix of $\eta_{x_i, x_j}$, indicating the nonlinear relationships.
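The Figure 1 caption describes counting how often each feature is ranked among the top six across 100 neural network models. A minimal sketch of that tally follows, assuming each model has already produced a per-feature importance score (how those scores are computed is not reproduced here). The feature names `x1` to `x11` and all importance values are synthetic placeholders, not the paper's results, and `top_k_frequency` is an illustrative name.

```python
import random
from collections import Counter


def top_k_frequency(importance_runs, k=6):
    """Count how often each feature lands in the top k of each run's importance ranking."""
    counts = Counter()
    for scores in importance_runs:
        ranked = sorted(scores, key=scores.get, reverse=True)
        counts.update(ranked[:k])
    return counts


# Synthetic importances for 100 hypothetical models: x1 has the highest base value,
# and the per-model noise (+/-0.4) is smaller than the gap (1.0) between base values.
features = [f"x{i}" for i in range(1, 12)]
rng = random.Random(42)
runs = [
    {name: (len(features) - i) + rng.uniform(-0.4, 0.4) for i, name in enumerate(features)}
    for _ in range(100)
]
counts = top_k_frequency(runs, k=6)
selected = sorted(name for name, _ in counts.most_common(6))
print(selected)  # ['x1', 'x2', 'x3', 'x4', 'x5', 'x6']
```

With real, noisier importances the counts spread across more features, and a visible gap in the tally is what singles out the features to keep.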