Pulsar Detection with Deep Learning
Manideep Pendyala
TL;DR
This work addresses automated pulsar candidate classification in large radio surveys with a multi-modal deep learning pipeline that fuses array-derived features with image-based diagnostics. Starting from GMRT data, the pipeline converts raw observations to filterbanks, de-disperses and folds them across trial dispersion measures, and generates ~32,000 candidates, each represented by four diagnostics. Three successively refined models classify these candidates: a stacked baseline reaches 68% accuracy, an enhanced CNN reaches 87%, and a GAN-augmented CNN reaches 94% with balanced precision and recall on a held-out test set. The results indicate that combining array and image channels improves separability over image-only approaches and that targeted generative augmentation boosts recall for the minority (pulsar) class. The methods are designed to be survey-agnostic and scalable to future facilities such as the SKA, and are lightweight enough for near-real-time triage.
Abstract
Pulsar surveys generate millions of candidates per run, overwhelming manual inspection. This thesis builds a deep learning pipeline for radio pulsar candidate selection that fuses array-derived features with image diagnostics. From approximately 500 GB of Giant Metrewave Radio Telescope (GMRT) data, raw voltages are converted to filterbanks (SIGPROC), then de-dispersed and folded across trial dispersion measures (PRESTO) to produce approximately 32,000 candidates. Each candidate yields four diagnostics (summed profile, time vs. phase, subbands vs. phase, and DM curve), represented as arrays and images. A baseline stacked model (ANNs for arrays + CNNs for images with logistic-regression fusion) reaches 68% accuracy. We then refine the CNN architecture and training (regularization, learning-rate scheduling, max-norm constraints) and mitigate class imbalance via targeted augmentation, including a GAN-based generator for the minority class. The enhanced CNN attains 87% accuracy; the final GAN+CNN system achieves 94% accuracy with balanced precision and recall on a held-out test set, while remaining lightweight enough for near-real-time triage. The results show that combining array and image channels improves separability over image-only approaches, and that modest generative augmentation substantially boosts minority (pulsar) recall. The methods are survey-agnostic and extensible to forthcoming high-throughput facilities.
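The logistic-regression fusion step of the baseline stacked model can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes eight base models (one ANN and one CNN per diagnostic), each emitting a pulsar probability, and replaces their real outputs with synthetic scores. The names `fit_fusion` and `predict_fusion`, the noise level, and the 30% pulsar fraction are all hypothetical, and a hand-written gradient-descent logistic regression stands in for whichever library the thesis used.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_fusion(scores, labels, lr=1.0, epochs=2000, l2=1e-3):
    """Fit a logistic-regression meta-learner on base-model scores.

    scores: (n_candidates, n_base_models) probabilities in [0, 1]
    labels: (n_candidates,) with 1.0 = pulsar, 0.0 = non-pulsar
    """
    x = scores - 0.5                 # centre scores so the bias is easy to learn
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(x @ w + b) - labels          # gradient of log-loss w.r.t. logit
        w -= lr * (x.T @ err / len(labels) + l2 * w)
        b -= lr * err.mean()
    return w, b

def predict_fusion(scores, w, b):
    """Return 0/1 class predictions from the fused score."""
    return (sigmoid((scores - 0.5) @ w + b) >= 0.5).astype(int)

# Synthetic stand-in for base-model outputs (assumption: 8 weak, noisy scorers).
rng = np.random.default_rng(0)
n_candidates, n_base = 400, 8
labels = (rng.random(n_candidates) < 0.3).astype(float)   # imbalanced classes
scores = np.clip(
    0.35 + 0.3 * labels[:, None] + rng.normal(0.0, 0.25, (n_candidates, n_base)),
    0.0, 1.0,
)

# Fit the fusion layer on the first 300 candidates, evaluate on the rest.
train, test = slice(0, 300), slice(300, None)
w, b = fit_fusion(scores[train], labels[train])

fused_acc = (predict_fusion(scores[test], w, b) == labels[test]).mean()
# Accuracy of each base model alone, thresholding its score at 0.5.
single_acc = ((scores[test] >= 0.5) == labels[test][:, None].astype(bool)).mean(axis=0)

print("fused accuracy:", fused_acc)
print("mean single-model accuracy:", single_acc.mean())
```

On this toy data the fused score is more accurate than a typical single scorer because the meta-learner averages out independent noise across base models. Real base-model outputs are correlated, so the gain in practice would be smaller, and nothing here reproduces the accuracies reported in the abstract.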
