Spherinator and HiPSter: Representation Learning for Unbiased Knowledge Discovery from Simulations
Kai L. Polsterer, Bernd Doser, Andreas Fehlner, Sebastian Trujillo-Gomez
TL;DR
The paper addresses the challenge of interpreting petabyte-scale, high-dimensional simulation outputs by introducing Spherinator, a hyperspherical variational autoencoder that maps data to a spherical latent space. Each embedding is described by a mean direction $\mu$ (unit norm) and a concentration $\kappa$, the parameters of a power spherical distribution $p_{X}(x; \mu, \kappa)$ in the latent space, and the model is optimized via $L = L_{recon} + \lambda L_{KL}$, where $\lambda$ weights the KL term against the reconstruction term. The paper then proposes HiPSter, which stores and visualizes these embeddings as a multi-resolution HiPS tiling on the sphere, enabling scalable, interactive exploration with Aladin Lite. The prototype, trained on IllustrisTNG galaxies, yields a natural Hubble tuning fork-like similarity space and supports findability, sample selection, and outlier detection; the same approach also applies to observed data in the Exascale era. Together, Spherinator and HiPSter provide a scalable, unbiased workflow for discovery from simulations and broader data ecosystems.
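As a rough illustration of the quantities named above, the following sketch evaluates the log-density of a power spherical distribution, taken here in its standard form $p_{X}(x; \mu, \kappa) \propto (1 + \mu^{T} x)^{\kappa}$ on the unit sphere, and combines two loss terms as $L = L_{recon} + \lambda L_{KL}$. The function names are hypothetical and are not taken from the Spherinator code base; the reconstruction and KL values are treated as given inputs because the summary does not specify how they are computed.

```python
import math


def power_spherical_log_norm(kappa, d):
    """Log of the normalizing constant for a power spherical density
    on the unit sphere embedded in d dimensions (assumed standard form)."""
    alpha = (d - 1) / 2 + kappa
    beta = (d - 1) / 2
    return (
        (alpha + beta) * math.log(2.0)
        + beta * math.log(math.pi)
        + math.lgamma(alpha)
        - math.lgamma(alpha + beta)
    )


def power_spherical_log_pdf(x, mu, kappa):
    """Log-density at unit vector x for mean direction mu and concentration kappa."""
    d = len(mu)
    dot = sum(a * b for a, b in zip(x, mu))
    if kappa == 0:
        # Zero concentration reduces to the uniform density on the sphere.
        return -power_spherical_log_norm(0.0, d)
    if dot <= -1.0:
        # The density vanishes at the point opposite to mu when kappa > 0.
        return float("-inf")
    return kappa * math.log1p(dot) - power_spherical_log_norm(kappa, d)


def total_loss(l_recon, l_kl, lam):
    """Combine the two terms as L = L_recon + lambda * L_KL."""
    return l_recon + lam * l_kl
```

With $\kappa = 0$ the density reduces to the uniform distribution on the sphere, and larger $\kappa$ concentrates probability mass around $\mu$.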
Abstract
Simulations are the best approximation to experimental laboratories in astrophysics and cosmology. However, the complexity, richness, and large size of their outputs severely limit the interpretability of their predictions. We describe a new, unbiased, machine-learning-based approach to obtaining useful scientific insights from a broad range of simulations. The method can be used on today's largest simulations and will be essential for solving the extreme data exploration and analysis challenges posed by the Exascale era. Furthermore, the concept is flexible enough that it will also enable explorative access to observed data. It is based on applying nonlinear dimensionality reduction to learn compact representations of the data in a low-dimensional space. The simulation data are projected onto this space for interactive inspection, visual interpretation, sample selection, and local analysis. We present a prototype that uses a rotationally invariant hyperspherical variational convolutional autoencoder with a power spherical distribution in the latent space, trained on galaxies from the IllustrisTNG simulation. We thereby obtain a natural Hubble tuning fork-like similarity space that can be visualized interactively on the surface of a sphere by exploiting the power of HiPS tilings in Aladin Lite.
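To make the projection onto the sphere concrete, here is a minimal sketch of how a latent vector could be converted to the longitude and latitude that a HiPS viewer displays. It assumes a three-dimensional latent space, so that embeddings lie on an ordinary sphere; the function names are hypothetical and do not come from HiPSter. The tile count follows from HiPS being built on HEALPix, where order $k$ corresponds to $N_{side} = 2^{k}$ and therefore $12 \cdot 4^{k}$ tiles.

```python
import math


def latent_to_lonlat(v):
    """Map a 3D latent vector to (longitude, latitude) in degrees.

    The vector is normalized first, so any non-zero vector is accepted.
    """
    x, y, z = v
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("latent vector must be non-zero")
    x, y, z = x / norm, y / norm, z / norm
    lon = math.degrees(math.atan2(y, x)) % 360.0
    # Clamp guards against rounding slightly outside [-1, 1].
    lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return lon, lat


def hips_tile_count(order):
    """Number of tiles covering the full sphere at a given HiPS order."""
    return 12 * 4 ** order
```

Each increase in order splits every tile into four, which is what gives the tiling its multi-resolution character: a viewer only needs to load the tiles that cover the current field of view at the current zoom level.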
