Improving Intrusion Detection with Domain-Invariant Representation Learning in Latent Space
Padmaksha Roy, Tyler Cody, Himanshu Singhal, Kevin Choi, Ming Jin
TL;DR
This work tackles zero-day anomaly detection under domain shift by learning a domain-invariant latent representation $Z$ through a multi-task latent-space encoder-decoder framework (MTLS-RED). It combines classification, reconstruction, and a matrix-based mutual information regularization derived from the Principle of Relevant Information to decorrelate spurious domain-specific features, promoting invariance across domains. The model is trained on multiple source and cross-domain datasets with varying correlation structures, and improves average precision, recall, and AUC-ROC on unseen OOD classes. The approach is practically relevant to robust intrusion detection in real-world, heterogeneous environments, and provides a principled mechanism for balancing information preservation against compression in the latent space.
Abstract
Zero-day anomaly detection is critical in industrial applications where novel, unforeseen threats can compromise system integrity and safety. Traditional detection systems often fail to identify these unseen anomalies because they rely on in-distribution data. Domain generalization addresses this gap by leveraging knowledge from multiple known domains to detect out-of-distribution events. In this work, we introduce a multi-task representation learning technique that fuses information across related domains into a unified latent space. By jointly optimizing classification, reconstruction, and mutual information regularization losses, our method learns a minimal (bottleneck), domain-invariant representation that discards spurious correlations. This latent-space decorrelation enhances generalization, enabling the detection of anomalies in unseen domains. Our experimental results demonstrate significant improvements in detecting zero-day (novel) anomalies across diverse anomaly detection datasets.
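
To make the joint objective concrete, the following is a minimal sketch of how a classification loss, a reconstruction loss, and a matrix-based mutual information term might be combined. It assumes the matrix-based Rényi α-order entropy functional computed on Gaussian Gram matrices as the estimator; the function names, the kernel width `sigma`, the order `alpha`, the weights `beta` and `gamma`, and the choice of which two variables the regularizer compares are all illustrative assumptions, not details taken from this abstract.

```python
import numpy as np

def gram(x, sigma=1.0):
    """Gaussian (RBF) Gram matrix for the rows of x (assumed kernel choice)."""
    sq = np.sum(x ** 2, axis=1, keepdims=True)
    d2 = np.maximum(sq + sq.T - 2.0 * x @ x.T, 0.0)  # squared pairwise distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def renyi_entropy(k, alpha=2.0):
    """Matrix-based Renyi alpha-order entropy (in bits) of a Gram matrix."""
    a = k / np.trace(k)                               # normalize to unit trace
    eig = np.clip(np.linalg.eigvalsh(a), 0.0, None)   # clip tiny negative eigenvalues
    return np.log2(np.sum(eig ** alpha)) / (1.0 - alpha)

def mutual_information(x, y, alpha=2.0, sigma=1.0):
    """I(x; y) = S(x) + S(y) - S(x, y), with the joint from the Hadamard product."""
    kx, ky = gram(x, sigma), gram(y, sigma)
    joint = renyi_entropy(kx * ky, alpha)
    return renyi_entropy(kx, alpha) + renyi_entropy(ky, alpha) - joint

def total_loss(cls_loss, rec_loss, mi, beta=1.0, gamma=0.1):
    """Weighted sum of the three terms; beta and gamma are hypothetical weights."""
    return cls_loss + beta * rec_loss + gamma * mi
```

In this sketch, `mutual_information` would be evaluated on a mini-batch, for example between the inputs and the latent codes $Z$, and the resulting scalar passed to `total_loss` together with the other two terms. An actual training loop would need a differentiable implementation in an autodiff framework; NumPy is used here only to show the arithmetic.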
