astromorph: Self-supervised machine learning pipeline for astronomical morphology analysis
Per Bjerkeli, Jouni Kainulainen, Maria Carmen Toribio, Leon Boschman, Otoniel Maya Lucas
TL;DR
astromorph addresses the need to organize and interpret large, imaging-rich astronomical datasets without labeled data by implementing a self-supervised, BYOL-based pipeline tailored to astronomy. It wraps the BYOL framework in a user-friendly package that supports data of varying dimensions, including single-channel FITS images and multi-channel spectral cubes, and provides a lightweight CNN so that training remains accessible. The paper demonstrates two science cases, ALMA observations of protoplanetary disks and Spitzer/Herschel observations of infrared dark clouds, and shows that the learned embeddings capture morphology and enable clustering, similarity search, and exploratory analysis. The approach offers a practical, scalable tool for discovery in observational astronomy and is well placed to be extended to JWST data and 3D data cubes.
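The BYOL objective that the pipeline builds on can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not astromorph's PyTorch implementation: the toy linear "networks", the `augment` and `forward` helpers, and all other names are hypothetical, the EMA rate `tau = 0.996` is an assumed default, and the gradient step on the online weights is omitted. It shows the two core ingredients of BYOL: a normalized regression loss between the online prediction for one augmented view and the target projection of another view, and an exponential-moving-average (EMA) update of the target weights.

```python
import numpy as np


def byol_loss(prediction: np.ndarray, target_projection: np.ndarray) -> np.ndarray:
    """Per-sample BYOL loss: squared L2 distance between L2-normalized
    vectors, which equals 2 - 2 * cosine similarity (range [0, 4])."""
    p = prediction / np.linalg.norm(prediction, axis=-1, keepdims=True)
    z = target_projection / np.linalg.norm(target_projection, axis=-1, keepdims=True)
    return 2.0 - 2.0 * np.sum(p * z, axis=-1)


def ema_update(target: dict, online: dict, tau: float = 0.996) -> dict:
    """Target weights track the online weights: xi <- tau * xi + (1 - tau) * theta."""
    return {name: tau * target[name] + (1.0 - tau) * online[name] for name in target}


def augment(images: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical augmentation: random 90-degree rotation and optional flip of a batch (N, H, W)."""
    out = np.rot90(images, k=int(rng.integers(4)), axes=(1, 2))
    if rng.random() < 0.5:
        out = np.flip(out, axis=2)
    return out


def forward(params: dict, images: np.ndarray, with_predictor: bool) -> np.ndarray:
    """Toy linear encoder + projector; only the online branch applies the predictor."""
    h = images.reshape(len(images), -1) @ params["encoder"]
    z = h @ params["projector"]
    return z @ params["predictor"] if with_predictor else z


rng = np.random.default_rng(0)
images = rng.normal(size=(8, 16, 16))  # stand-in for a batch of single-channel cutouts
online = {
    "encoder": rng.normal(size=(256, 32)) / 16.0,
    "projector": rng.normal(size=(32, 8)),
    "predictor": rng.normal(size=(8, 8)),
}
# The target starts as a copy of the online encoder and projector (it has no predictor).
target = {k: online[k].copy() for k in ("encoder", "projector")}

view_a, view_b = augment(images, rng), augment(images, rng)
# Symmetrized loss: each view's online prediction regresses the other view's target projection.
loss = (byol_loss(forward(online, view_a, True), forward(target, view_b, False))
        + byol_loss(forward(online, view_b, True), forward(target, view_a, False))).mean()
# A real training loop would take a gradient step on the online weights here.
target = ema_update(target, online)
```

The asymmetry between the two branches (a predictor only on the online side, and a target updated by EMA rather than by gradients) is what lets BYOL learn representations without negative pairs.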
Abstract
Modern telescopes generate increasingly large and diverse datasets, often consisting of complex and morphologically rich structures. Exploring such data efficiently requires automated methods that can extract and organize physically meaningful information, ideally without extensive manual interaction. We aim to provide a user-friendly implementation of a self-supervised machine learning framework, based on the BYOL (Bootstrap Your Own Latent) method, for exploring the morphological properties of large datasets. By generating meaningful image embeddings without manually labelled data, the framework enables key tasks such as clustering, anomaly detection, and similarity-based exploration. In contrast to existing BYOL implementations, astromorph accommodates data of varying dimensions and resolutions, including both single-channel FITS images and multi-channel spectral cubes. The package is built with usability in mind, offering streamlined pipeline scripts for ease of use as well as deeper customization options via PyTorch-based classes. To demonstrate the utility of astromorph, we apply it to two contrasting science cases from different astronomical domains: images of protoplanetary disks observed with ALMA, and infrared dark clouds observed with Spitzer and Herschel. In both cases, we show that astromorph produces scientifically meaningful embeddings that capture morphological differences and similarities across large samples. astromorph gives users a robust, label-free approach for uncovering morphological patterns in astronomical datasets. The successful application to two markedly different datasets suggests that the pipeline is broadly applicable across a wide range of imaging-rich astronomical contexts, providing a user-friendly tool for advancing discovery in observational astronomy.
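Once embeddings are available, similarity-based exploration reduces to a nearest-neighbour search in embedding space. The following is a minimal sketch, assuming the embeddings have already been exported as an (N, D) NumPy array; the function name `most_similar` is hypothetical, and cosine similarity is one common choice of metric, not necessarily the one astromorph uses.

```python
import numpy as np


def most_similar(embeddings: np.ndarray, query_index: int, k: int = 5) -> np.ndarray:
    """Return the indices of the k embeddings most cosine-similar to the query,
    ordered from most to least similar and excluding the query itself."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarities = unit @ unit[query_index]
    similarities[query_index] = -np.inf  # never return the query as its own neighbour
    return np.argsort(-similarities)[:k]


# Toy 2-D embeddings standing in for the output of a trained encoder.
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
neighbours = most_similar(embeddings, query_index=0, k=2)
print(neighbours)  # → [1 2]
```

The same L2-normalized embeddings can also be passed to a standard clustering algorithm (for example, scikit-learn's `KMeans`) to group objects by morphology.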
