Pulsar Detection with Deep Learning

Manideep Pendyala

TL;DR

This work tackles automated pulsar candidate classification in large radio surveys with a multi-modal deep learning pipeline that fuses array-derived features with image-based diagnostics. Starting from GMRT data, the pipeline converts raw observations to filterbanks, de-disperses and folds them across trial dispersion measures, and generates ~32,000 candidates, each described by four diagnostics that feed a hierarchy of models. A baseline stacked model achieves 68% accuracy; an enhanced CNN raises this to 87%, and a GAN-augmented CNN reaches 94% with balanced precision and recall on a held-out test set. The results indicate that combining array and image channels improves separability over image-only approaches and that targeted generative augmentation boosts recall for the minority (pulsar) class. The methods are designed to be survey-agnostic and scalable to future high-throughput facilities such as the SKA, and the pipeline is lightweight enough for near-real-time triage.

Abstract

Pulsar surveys generate millions of candidates per run, overwhelming manual inspection. This thesis builds a deep learning pipeline for radio pulsar candidate selection that fuses array-derived features with image diagnostics. From approximately 500 GB of Giant Metrewave Radio Telescope (GMRT) data, raw voltages are converted to filterbanks (SIGPROC), then de-dispersed and folded across trial dispersion measures (PRESTO) to produce approximately 32,000 candidates. Each candidate yields four diagnostics (summed profile, time vs. phase, subbands vs. phase, and DM curve), represented as arrays and images. A baseline stacked model (ANNs for arrays + CNNs for images with logistic-regression fusion) reaches 68% accuracy. We then refine the CNN architecture and training (regularization, learning-rate scheduling, max-norm constraints) and mitigate class imbalance via targeted augmentation, including a GAN-based generator for the minority class. The enhanced CNN attains 87% accuracy; the final GAN+CNN system achieves 94% accuracy with balanced precision and recall on a held-out test set, while remaining lightweight enough for near-real-time triage. The results show that combining array and image channels improves separability over image-only approaches, and that modest generative augmentation substantially boosts minority (pulsar) recall. The methods are survey-agnostic and extensible to forthcoming high-throughput facilities.
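
The de-dispersion and folding step named in the abstract can be illustrated with a small, self-contained sketch. It is not the SIGPROC or PRESTO code path used in the thesis: it is a pure-Python toy that assumes noiseless data, integer-sample shifts, and a circular shift at the array edges. All function and variable names (`dispersion_delay`, `fold`, `dedisperse_and_fold`, `make_toy_channels`, and the toy parameters) are hypothetical and chosen only for illustration. The one physical input is the standard cold-plasma dispersion constant.

```python
# Standard cold-plasma dispersion constant, in s MHz^2 pc^-1 cm^3.
K_DM = 4.148808e3


def dispersion_delay(dm, freq_mhz, ref_freq_mhz):
    """Arrival delay (s) at freq_mhz relative to ref_freq_mhz for a given DM."""
    return K_DM * dm * (freq_mhz ** -2 - ref_freq_mhz ** -2)


def fold(samples, dt, period, nbins):
    """Average a time series into nbins rotational-phase bins at a trial period."""
    sums = [0.0] * nbins
    counts = [0] * nbins
    for i, value in enumerate(samples):
        phase = (i * dt / period) % 1.0
        b = min(int(phase * nbins), nbins - 1)
        sums[b] += value
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def dedisperse_and_fold(channels, freqs_mhz, dt, dm, period, nbins):
    """Undo each channel's DM delay, sum the channels, then fold the sum."""
    ref = max(freqs_mhz)
    n = len(channels[0])
    summed = [0.0] * n
    for chan, f in zip(channels, freqs_mhz):
        shift = int(round(dispersion_delay(dm, f, ref) / dt))
        for i in range(n):
            # Circular shift: a simplification that keeps the toy short.
            summed[i] += chan[(i + shift) % n]
    return fold(summed, dt, period, nbins)


def make_toy_channels(freqs_mhz, dt, dm, period, n, pulse_index):
    """Hypothetical noiseless data: one unit spike per period in each channel,
    delayed in the lower-frequency channels according to the given DM."""
    ref = max(freqs_mhz)
    per = int(round(period / dt))
    channels = []
    for f in freqs_mhz:
        shift = int(round(dispersion_delay(dm, f, ref) / dt))
        channels.append([1.0 if (i - shift) % per == pulse_index else 0.0
                         for i in range(n)])
    return channels


# Assumed toy setup: two channels, 1 ms sampling, 100 ms period, DM = 2.
FREQS = [300.0, 400.0]
DT, PERIOD, TRUE_DM = 0.001, 0.1, 2.0
toy = make_toy_channels(FREQS, DT, TRUE_DM, PERIOD, 1000, 27)
profile_at_true_dm = dedisperse_and_fold(toy, FREQS, DT, TRUE_DM, PERIOD, 20)
profile_at_zero_dm = dedisperse_and_fold(toy, FREQS, DT, 0.0, PERIOD, 20)
```

In this toy, folding at the true DM aligns the pulse across both channels, so the summed profile has a single, higher peak than the profile folded at DM = 0, where the pulse is split between two phase bins. Repeating the fold over a grid of trial DMs and recording the peak strength gives a rough analogue of the DM-curve diagnostic.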

Paper Structure

This paper contains 41 sections, 4 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: A schematic view of a pulsar, from zhou2022
  • Figure 2: Dispersion measure vs. smearing plot generated by PRESTO
  • Figure 3: Simple illustration of the folding process, from lynch2022
  • Figure 4: Artificial neural network architecture, from bre2017prediction
  • Figure 5: An example of a CNN architecture, from hidaka2017consecutive
  • ...and 7 more figures