
A review of unsupervised learning in astronomy

Sotiria Fotopoulou

TL;DR

This review synthesises how unsupervised learning has evolved in astronomy, highlighting core methods such as PCA/SVD, ICA, NMF, Isomap, LLE, t-SNE, and UMAP, as well as clustering techniques like k‑means, GMMs, and DBSCAN/HDBSCAN. It emphasizes the shift from purely linear, dimensionality‑reduction approaches to nonlinear manifolds, neural network–based representations, and modern self‑supervised and domain‑adaptation strategies. The authors stress practical workflow considerations, data peculiarities (missing data, heterogeneity), and the importance of robust validation, benchmarks, and interpretability in data‑driven discovery. Overall, the paper argues for thoughtful integration of ML with domain knowledge to enable scalable, generalizable insights while avoiding overinterpretation in the face of complex, high‑dimensional astronomical data.
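As a concrete illustration of the kind of pipeline the review surveys (linear dimensionality reduction chained with density-based clustering), here is a minimal sketch on synthetic data. The use of scikit-learn, the toy catalogue, and the choice of DBSCAN in place of HDBSCAN are assumptions for illustration, not details taken from the paper:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Toy "catalogue": two populations of objects in a 10-dimensional feature
# space (e.g. photometric colours), each scattered around its own centre.
a = rng.normal(loc=0.0, scale=0.3, size=(100, 10))
b = rng.normal(loc=3.0, scale=0.3, size=(100, 10))
X = np.vstack([a, b])

# Standardise, reduce to 2 principal components, then cluster by density.
X_std = StandardScaler().fit_transform(X)
Z = PCA(n_components=2).fit_transform(X_std)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(Z)
# Two dense groups should be recovered; a label of -1 would mark noise.
```

No cluster count is specified in advance: DBSCAN discovers the number of dense groups from the data, which is the property that makes its hierarchical variant HDBSCAN attractive for exploratory astronomical catalogues.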

Abstract

This review summarizes popular unsupervised learning methods and gives an overview of their past, current, and future uses in astronomy. Unsupervised learning aims to organise the information content of a dataset in such a way that knowledge can be extracted. Traditionally this has been achieved through dimensionality reduction techniques that aid the ranking of a dataset, for example through principal component analysis or by using auto-encoders, or through simpler visualisation of a high-dimensional space, for example with a self-organising map. Other desirable properties of unsupervised learning include the identification of clusters, i.e. groups of similar objects, which has traditionally been achieved by the k-means algorithm and more recently through density-based clustering such as HDBSCAN. More recently, complex frameworks have emerged that chain together dimensionality reduction and clustering methods. However, no dataset is fully unknown. Thus, nowadays a lot of research has been directed towards self-supervised and semi-supervised methods that stand to gain from both supervised and unsupervised learning.
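The k-means algorithm mentioned in the abstract partitions a dataset into a preset number of groups by repeatedly assigning points to the nearest centroid and recomputing the centroids. A minimal sketch on synthetic data (scikit-learn and the toy feature vectors are illustrative assumptions, not material from the paper):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Synthetic 4-feature "objects" drawn around three centres at 0, 2, and 4.
X = np.vstack([rng.normal(c, 0.2, size=(50, 4)) for c in (0.0, 2.0, 4.0)])

# Unlike density-based methods, k-means needs the cluster count up front.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
centres = np.sort(km.cluster_centers_[:, 0])
print(centres)  # recovered centres should lie near 0, 2, and 4
```

The need to fix `n_clusters` in advance is exactly the limitation that motivates the density-based alternatives (DBSCAN/HDBSCAN) the review discusses, which instead infer the number of groups from the data.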

Paper Structure

This paper contains 54 sections, 5 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Graphical overview of unsupervised learning in astronomy.
  • Figure 2: Dimensionality reduction with (a) Isomap, which finds geodesic distances between neighbouring points, and (b) Locally-Linear Embedding, which finds neighbourhoods where a linear approximation holds. Both methods are showcased on the 'Swiss roll' dataset, where the curvature of the manifold can cause a 'short-circuit' if too large a neighbourhood is chosen. Fig. (a) adapted from Tenenbaum2000-ISOMAP; fig. (b) adapted from Roweis2000-locally-linear-embedding.
  • Figure 3: UMAP and t-SNE projections of a 10% subsample (red) and the full (blue) flow cytometry dataset. It is visually clear that projecting new data onto a clustered space based on either of these projections leads to gross misalignment, here quantified using Procrustes-based alignment. Figure adapted from McInnes2018_UMAP.
  • Figure 4: Autoencoder network, figure adapted from kramer1991_ae.
  • Figure 5: Complex frameworks are emerging in modern ML applications. Here we show one example of modelling the autoencoder latent space with a self-organising map, subsequently clustered with k-means. Figure adapted from Ralph2019PASP..131j8011R.
  • ...and 1 more figure
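The Isomap/LLE comparison in Figure 2 can be reproduced in outline with scikit-learn's built-in Swiss roll generator. The neighbourhood size of 10 below is an illustrative choice, not a value taken from the paper; too large a neighbourhood would produce the 'short-circuit' across the roll that the caption warns about:

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# 'Swiss roll' dataset as in Figure 2: a 2-D sheet rolled up in 3-D.
X, t = make_swiss_roll(n_samples=1000, random_state=0)

# Small neighbourhoods keep distances on-manifold when unrolling to 2-D.
iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                             random_state=0).fit_transform(X)
# Both embeddings map the 3-D roll onto a flat 2-D sheet.
```

Isomap preserves global geodesic distances along the manifold, while LLE only preserves the local linear structure of each neighbourhood; this global-versus-local trade-off is the contrast the figure illustrates.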