Enhancing anomaly detection with topology-aware autoencoders

Vishal S. Ngairangbam, Błażej Rozwoda, Kazuki Sakurai, Michael Spannowsky

TL;DR

The paper tackles anomaly detection in collider data by addressing a fundamental limitation: standard autoencoders with Euclidean latent spaces struggle to faithfully represent non-trivial momentum-space manifolds. It introduces topology-aware autoencoders that embed phase-space distributions onto compact manifolds, notably $S^n$, $S^n \otimes S^m$, and $\mathbb{RP}^2$, and provides explicit constructions to realize these topologies in the latent space. Through toy experiments and a realistic hadronic top-quark decay scenario, the authors show that matching latent-space topology to the data manifold preserves global structure and improves anomaly separation, with four-dimensional non-trivial topologies delivering the best performance in many cases. This work establishes a principled framework for incorporating physical priors into unsupervised learning for robust, topology-consistent anomaly detection in high-energy physics data.
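The summary mentions explicit constructions for compact latent spaces but does not spell them out. As an illustration only, the sketch below shows one common way to constrain an encoder's output. Normalising the raw latent vector gives $S^n$. Normalising two blocks separately gives a product of spheres. For $\mathbb{RP}^2$, a unit vector $x$ is mapped to the outer product $x x^\top$, which is identical for $x$ and $-x$. The helper names and this Veronese-type embedding are assumptions, not necessarily the authors' construction. NumPy stands in here for the autodiff framework a trained model would use.

```python
import numpy as np

def to_sphere(z, eps=1e-12):
    """Project raw latent vectors of shape (batch, n+1) onto the unit sphere S^n."""
    norm = np.linalg.norm(z, axis=-1, keepdims=True)
    return z / np.maximum(norm, eps)  # eps guards against division by zero

def to_product_of_spheres(z, n):
    """Split (batch, (n+1)+(m+1)) into two blocks and normalise each: S^n x S^m."""
    a, b = z[..., : n + 1], z[..., n + 1 :]
    return np.concatenate([to_sphere(a), to_sphere(b)], axis=-1)

def to_rp2(z):
    """Map raw vectors of shape (batch, 3) to RP^2 via x -> x x^T for unit x.

    Antipodal points x and -x give the same symmetric matrix, so the output
    depends only on the line through x. The 6 independent (upper-triangular)
    entries of each 3x3 matrix are returned, shape (batch, 6).
    """
    x = to_sphere(z)
    outer = np.einsum("bi,bj->bij", x, x)
    rows, cols = np.triu_indices(3)
    return outer[:, rows, cols]
```

In an autoencoder, one of these maps would be the last operation of the encoder, and the decoder would receive the projected vector.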

Abstract

Anomaly detection in high-energy physics is essential for identifying new physics beyond the Standard Model. Autoencoders provide a signal-agnostic approach but are limited by the topology of their latent space. This work explores topology-aware autoencoders, embedding phase-space distributions onto compact manifolds that reflect energy-momentum conservation. We construct autoencoders with spherical ($S^n$), product ($S^2 \otimes S^2$), and projective ($\mathbb{RP}^2$) latent spaces and compare their anomaly detection performance against conventional Euclidean embeddings. Our results show that autoencoders with topological priors significantly improve anomaly separation by preserving the global structure of the data manifold and reducing spurious reconstruction errors. Applying our approach to simulated hadronic top-quark decays, we show that latent spaces with appropriate topological constraints enhance sensitivity and robustness in detecting anomalous events. This study establishes topology-aware autoencoders as a powerful tool for unsupervised searches for new physics in particle-collision data.
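The abstract judges each latent space by how well reconstruction error separates anomalous events from background. This summary does not specify the loss or the figure of merit. A minimal sketch, assuming a per-event mean squared error as the anomaly score and the area under the ROC curve as the separation metric, could look like this (helper names are hypothetical):

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-event mean squared reconstruction error, used as the anomaly score."""
    return np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2, axis=-1)

def roc_auc(scores_background, scores_signal):
    """Probability that a random signal event scores above a random background event.

    This equals the area under the ROC curve (Mann-Whitney U statistic);
    ties count as one half.
    """
    s = np.asarray(scores_signal, dtype=float)[:, None]
    b = np.asarray(scores_background, dtype=float)[None, :]
    return float(np.mean((s > b) + 0.5 * (s == b)))
```

The pairwise comparison uses O(N·M) memory. For large samples, a rank-based implementation such as `sklearn.metrics.roc_auc_score` is preferable.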

Paper Structure

This paper contains 13 sections, 1 equation, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Successful and unsuccessful global fits of data with different latent manifolds.
  • Figure 2: Input–output pairs for the ten samples with the highest reconstruction error, for different latent spaces. The dataset is $S^2$ embedded in $\mathbb{R}^3$; the 3D data is projected onto three different 2D planes.
  • Figure 3: Loss-versus-distance plots for three latent layers. For each sample $\mathbf{r}$, the distance $|\mathbf{r}-\mathbf{r}_0|$ is measured from the sample $\mathbf{r}_0$ with the highest loss. The topological anomaly is visible as a peak in the plot for the $\mathbb{R}^2$ latent space. Two million data points were used to visualise topological anomalies reliably.
  • Figure 4: Input–output pairs for the ten samples with the highest reconstruction error, for different latent spaces. The dataset is $S^2\otimes S^2$ embedded in $\mathbb{R}^9$; the 9D data is projected orthogonally onto selected pairs of axes.
  • Figure 5: Invariant-mass distributions of the $W^+$ ($m_{jj}$) and top ($m_{bjj}$) decay products after reconstruction and baseline selection (solid lines), compared with the true parton-level distributions (dotted lines).
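Two quantities in the captions above can be computed directly. The sketch below uses hypothetical helper names. It builds the loss-versus-distance pairs of Figure 3, where each distance is $|\mathbf{r}-\mathbf{r}_0|$ from the highest-loss sample $\mathbf{r}_0$. It also computes the invariant mass plotted in Figure 5, $m^2 = (\sum E)^2 - |\sum \mathbf{p}|^2$. The code assumes four-momenta given as $(E, p_x, p_y, p_z)$ in natural units. It does not reproduce the paper's jet reconstruction or baseline selection.

```python
import numpy as np

def loss_vs_distance(r, losses):
    """Return (distance to the highest-loss sample r_0, loss) for every sample."""
    r = np.asarray(r, dtype=float)
    losses = np.asarray(losses, dtype=float)
    r0 = r[np.argmax(losses)]                 # sample with the highest loss
    dist = np.linalg.norm(r - r0, axis=-1)    # Euclidean distance |r - r_0|
    return dist, losses

def invariant_mass(p4):
    """Invariant mass of a system of particles with four-momenta (E, px, py, pz).

    p4 has shape (n_particles, 4); metric signature (+, -, -, -).
    """
    total = np.sum(np.asarray(p4, dtype=float), axis=0)
    m2 = total[0] ** 2 - np.sum(total[1:] ** 2)
    # Round-off can push m^2 slightly below zero for (near-)massless systems.
    return float(np.sqrt(max(m2, 0.0)))
```

For $m_{bjj}$, `p4` would stack the b-jet and the two light jets. For $m_{jj}$, it would contain only the two light jets.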