Revisiting Non-separable Binary Classification and its Applications in Anomaly Detection

Matthew Lau, Ismaila Seck, Athanasios P Meliopoulos, Wenke Lee, Eugene Ndiaye

TL;DR

The paper revisits a classic linearly non-separable problem (XOR) through equality separation, a linear rule that classifies points by their distance to a learned hyperplane and can be integrated into neural networks via smooth bump activations. It proves that equality separators have nearly twice the VC dimension of standard halfspaces, which is $n+1$ in $\mathbb{R}^n$ (exactly $2n+1$ for strict separators and between $2n+1$ and $2n+3$ for $\epsilon$-error variants), and introduces closing numbers to quantify a classifier's capacity to form closed decision regions, linking locality and anomaly detection. The authors connect equality separation to existing non-linear classifiers (hyper-ridge/hyper-hill/OVS) and show that, with an appropriate inductive bias, it yields robust detection of both seen and unseen anomalies, supported by toy and real-world experiments across cyber-security, medical, and industrial datasets. The framework provides a principled way to balance learning and robust anomaly detection (AD) via margin-focused containment of normal data, with practical implications for designing neural networks and foundation-model pipelines capable of reliable anomaly detection in high-dimensional spaces.

Abstract

The inability to linearly classify XOR has motivated much of deep learning. We revisit this age-old problem and show that linear classification of XOR is indeed possible. Instead of separating data between halfspaces, we propose a slightly different paradigm, equality separation, that adapts the SVM objective to distinguish data within or outside the margin. Our classifier can then be integrated into neural network pipelines with a smooth approximation. From its properties, we intuit that equality separation is suitable for anomaly detection. To formalize this notion, we introduce closing numbers, a quantitative measure on the capacity for classifiers to form closed decision regions for anomaly detection. Springboarding from this theoretical connection between binary classification and anomaly detection, we test our hypothesis on supervised anomaly detection experiments, showing that equality separation can detect both seen and unseen anomalies.
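
The claim that XOR becomes linearly classifiable can be illustrated with a minimal sketch. It assumes the rule "label a point positive when it lies within `eps` of the hyperplane"; the function name `equality_separator` and the hand-picked weights are illustrative assumptions, not the authors' code or learned parameters.

```python
def equality_separator(x, w, b, eps=0.0):
    """Return 1 if x lies within eps of the hyperplane w.x + b = 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if abs(score) <= eps else 0

# XOR truth table: inputs and labels.
xor_points = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]

# Hand-picked (not learned) hyperplane x1 + x2 - 1 = 0; an illustrative assumption.
w, b = (1.0, 1.0), -1.0

predictions = [equality_separator(x, w, b) for x in xor_points]
print(predictions)  # [0, 1, 1, 0]
```

The two positive XOR inputs lie exactly on the line $x_1 + x_2 = 1$, while the two negative inputs sit at distance on either side of it, so a single hyperplane suffices once the label depends on being on the hyperplane rather than on which side of it a point falls.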

Paper Structure (87 sections, 14 theorems, 25 equations, 11 figures, 16 tables)

Key Result

Theorem 2.3

For the hypothesis class $\mathcal{H}$ of strict equality separators, $\mathrm{VCdim}(\mathcal{H}) = 2n+1$.
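
To put Theorem 2.3 in context, the short comparison below sets it against the standard result that halfspaces in $\mathbb{R}^n$ have VC dimension $n+1$. It assumes that $n$ denotes the input dimension, which this summary does not state explicitly.

```latex
\begin{align*}
\mathrm{VCdim}(\text{halfspaces in } \mathbb{R}^n) &= n + 1, \\
\mathrm{VCdim}(\mathcal{H}) &= 2n + 1 = 2(n + 1) - 1, \\
n = 2: \quad & 3 \text{ (halfspaces) versus } 5 \text{ (strict equality separators)}.
\end{align*}
```

Under that assumption, strict equality separators fall one short of doubling the halfspace VC dimension in every dimension.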

Figures (11)

  • Figure 1: Linear classification by halfspace separators and equality separators for logical functions (Figures \ref{fig:AND}-\ref{fig:XOR}) and with 1 hidden layer with hidden units $h_1,h_2$ (Figure \ref{fig:XOR_NN}). In general, equality separation can classify linearly separable (Figure \ref{fig:gaussian_equality_separation}) and non-separable (Figure \ref{fig:inseparable_equality_separation}) data.
  • Figure 2: Normal data occupies a space with non-zero, finite volume. Classifiers can form closed decision regions to capture this.
  • Figure 3: Sample heatmap predictions by different models. The middle circle is the positive class, while the outer and inner circles are the negative class during training and testing, respectively. Figures \ref{fig:AD_heatmap_HS2b}, \ref{fig:AD_heatmap_ES2r} and \ref{fig:AD_heatmap_RBF2r} are deep models, with hidden layer activations represented by the suffix 'b' for bump and 'r' for RBF. The shallow equality separator (Figure \ref{fig:AD_heatmap_ES_RBF}) has the one-class decision boundary closest to the ground truth, followed by equality separator neural networks with RBF activation (Figure \ref{fig:AD_heatmap_ES2r}).
  • Figure 4: Proof overview flow chart of Lemma \ref{lemma:VC_general_lesser}.
  • Figure 5: Two different bump activations: the Gaussian bump (the one we used) in blue and the hyperbolic tangent bump in orange.
  • ...and 6 more figures
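
The Gaussian bump mentioned in the Figure 5 caption can be sketched as follows. This assumes the common form $\exp\left(-\tfrac{1}{2}\left((z-\mu)/\sigma\right)^2\right)$; the exact parameterization used by the authors is not given in this summary, and the names `gaussian_bump`, `mu`, and `sigma` are illustrative.

```python
import math

def gaussian_bump(z, mu=0.0, sigma=1.0):
    """Smooth bump: equals 1 at z == mu and decays toward 0 as |z - mu| grows."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2)

# Signed scores of the four XOR inputs under a hand-picked hyperplane
# x1 + x2 - 1 = 0 (an illustrative assumption, not learned parameters).
scores = [-1.0, 0.0, 0.0, 1.0]

# A smaller sigma makes the bump narrower, approaching the hard rule "score == 0".
soft = [gaussian_bump(s, sigma=0.5) for s in scores]
hard = [1 if p >= 0.5 else 0 for p in soft]
```

Because the bump is differentiable everywhere, it can stand in for the hard "on the hyperplane" test during gradient-based training, with `sigma` controlling how wide a margin around the hyperplane still counts as positive.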

Theorems & Definitions (23)

  • Definition 2.1
  • Definition 2.2
  • Theorem 2.3
  • Corollary 2.4
  • Definition 3.1
  • Theorem 3.2
  • Theorem B.1
  • Lemma B.2
  • proof
  • Lemma B.3
  • ...and 13 more