Deep Orthogonal Hypersphere Compression for Anomaly Detection

Yunhe Zhang, Yan Sun, Jinyu Cai, Jicong Fan

TL;DR

This work tackles the key challenge of anomaly detection in high-dimensional spaces where a single hypersphere boundary is difficult to learn and remains poorly compact due to the soap-bubble phenomenon. It introduces two end-to-end methods: DOHSC, which uses an orthogonal projection layer to enforce near-hyperspherical, compact normal regions, and DO2HSC, which constrains normal data to a bi-hypersphere shell to alleviate incompactness. The authors also extend both methods to graph-level anomaly detection by incorporating mutual information maximization between local and global graph representations via GIN, forming a robust, scalable framework for diverse data modalities. Comprehensive experiments on image, tabular, and graph datasets show that DOHSC and especially DO2HSC achieve state-of-the-art performance, with improved boundary compactness and resilience to high-dimensional effects. The work includes theoretical insights, ablations, and practical considerations, and code is made available for reproducibility.

Abstract

Many well-known and effective anomaly detection methods assume that a reasonable decision boundary has a hypersphere shape, which however is difficult to obtain in practice and is not sufficiently compact, especially when the data are in high-dimensional spaces. In this paper, we first propose a novel deep anomaly detection model that improves the original hypersphere learning through an orthogonal projection layer, which ensures that the training data distribution is consistent with the hypersphere hypothesis, thereby increasing the true positive rate and decreasing the false negative rate. Moreover, we propose a bi-hypersphere compression method to obtain a hyperspherical shell that yields a more compact decision region than a hyperball, which is demonstrated theoretically and numerically. The proposed methods are not confined to common datasets such as image and tabular data, but are also extended to a more challenging but promising scenario, graph-level anomaly detection, which learns graph representation with maximum mutual information between the substructure and global structure features while exploring orthogonal single- or bi-hypersphere anomaly decision boundaries. The numerical and visualization results on benchmark datasets demonstrate the superiority of our methods in comparison to many baselines and state-of-the-art methods.
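
The two ideas in the abstract, an orthogonal projection of the latent features and a shell-shaped decision region, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the eigendecomposition-based whitening, and the particular shell score are assumptions chosen for illustration. The projection maps features `Z` to `Z @ W` so that the projected columns are orthonormal, and the shell score is positive exactly when a point's distance from the center falls outside the interval `[r_min, r_max]`.

```python
import numpy as np


def orthogonal_projection(Z, out_dim, eps=1e-8):
    """Project Z (n x k) onto out_dim directions with orthonormal columns.

    Assumption: whitening through the eigendecomposition of Z^T Z,
    used here only as one simple way to orthogonalize features.
    """
    gram = Z.T @ Z                               # k x k Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)      # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:out_dim]    # indices of the largest eigenvalues
    W = eigvecs[:, top] / np.sqrt(eigvals[top] + eps)
    return Z @ W                                 # (Z W)^T (Z W) is close to I


def ball_score(Z, center, radius):
    """Single-hypersphere score: positive when a point lies outside the ball."""
    dist = np.linalg.norm(Z - center, axis=1)
    return dist - radius


def shell_score(Z, center, r_min, r_max):
    """Bi-hypersphere score (illustrative): positive outside [r_min, r_max]."""
    dist = np.linalg.norm(Z - center, axis=1)
    return np.maximum(dist - r_max, r_min - dist)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Z = rng.normal(size=(500, 16))               # stand-in for learned features
    P = orthogonal_projection(Z, out_dim=4)
    center = P.mean(axis=0)
    dist = np.linalg.norm(P - center, axis=1)
    # Hypothetical choice: radii taken as quantiles of the training distances.
    r_min, r_max = np.quantile(dist, [0.05, 0.95])
    flagged = shell_score(P, center, r_min, r_max) > 0
    print("fraction flagged on training data:", flagged.mean())
```

The shell score differs from the ball score only for points near the center: a ball accepts them as normal, whereas a shell flags them, which is the sense in which the shell is the more compact region.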

Paper Structure

This paper contains 31 sections, 3 theorems, 23 equations, 17 figures, 13 tables, and 3 algorithms.

Key Result

Proposition 1

Suppose $\mathbf{z}_1, \mathbf{z}_2,\cdots, \mathbf{z}_n$ are sampled from $\mathcal{N}(\mathbf{0}, \mathbf{I}_d)$ independently. Then, for any $\mathbf{z}_i$ and all $t \ge 0$, the following inequality holds.

Figures (17)

  • Figure 1: Architecture of the proposed models (right top: DOHSC; right bottom: DO2HSC). Herein, 2-D visualizations show the trends of training data when applying two optimizations and 3-D visualizations illustrate the detection results obtained by them, respectively.
  • Figure 2: Toy example of decision boundaries with and without the orthogonal projection layer. Blue circle: assumed decision boundary; black ellipse: actual decision boundary; purple points: normal data; red points: abnormal data.
  • Figure 3: Soap-bubble phenomenon shown by the histogram of distances from the center of $10^4$ samples drawn from $\mathcal{N}(\mathbf{0}, \mathbf{I}_d)$. In high-dimensional space, almost all data are far from the center.
  • Figure 4: Illustration of inevitable flaws in DOHSC on both the training and testing data of COX2. Left: the $\ell_2$-norm distribution of 4-dimensional distances learned from the real dataset; Right: the pseudo-layout in two-dimensional space sketched by reference to the empirical distribution.
  • Figure 5: Distance histograms on ER_MD.
  • ...and 12 more figures
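
The soap-bubble phenomenon described in the caption of Figure 3 can be reproduced with a short simulation. The sketch below is not taken from the paper: the function name `distance_summary`, the seed, and the chosen dimensions are assumptions. It draws samples from $\mathcal{N}(\mathbf{0}, \mathbf{I}_d)$ and summarizes their distances from the origin, so the mean distance can be compared with $\sqrt{d}$ and the minimum distance shows how empty the region near the center becomes as $d$ grows.

```python
import math
import random
import statistics


def distance_summary(dim, n_samples=10_000, seed=0):
    """Summarize Euclidean norms of n_samples draws from N(0, I_dim)."""
    rng = random.Random(seed)
    norms = [
        math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
        for _ in range(n_samples)
    ]
    return {
        "dim": dim,
        "sqrt_dim": math.sqrt(dim),        # reference value for the mean norm
        "mean": statistics.fmean(norms),
        "std": statistics.pstdev(norms),
        "min": min(norms),                 # closest sample to the center
    }


if __name__ == "__main__":
    # Dimensions are illustrative; larger values take longer in pure Python.
    for dim in (2, 10, 100):
        s = distance_summary(dim)
        print(
            f"d={s['dim']:>3}  sqrt(d)={s['sqrt_dim']:.2f}  "
            f"mean={s['mean']:.2f}  std={s['std']:.2f}  min={s['min']:.2f}"
        )
```

In low dimensions some samples land very close to the origin, while in high dimensions the norms concentrate in a thin band around $\sqrt{d}$, which is the motivation for replacing a hyperball with a hyperspherical shell.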

Theorems & Definitions (8)

  • Proposition 1
  • Proposition 2
  • Example 1
  • Example 2
  • Proposition 3
  • Example 3
  • proof
  • proof