Multi-Aspect Mining and Anomaly Detection for Heterogeneous Tensor Streams
Soshi Kakio, Yasuko Matsubara, Ren Fujiwara, Yasushi Sakurai
TL;DR
HeteroComp addresses the challenge of online analysis of heterogeneous tensor streams containing both categorical and continuous attributes by introducing a components-based model that jointly captures latent groups and their temporal dynamics using Gaussian process priors. It avoids discretizing continuous attributes or timestamps, instead modeling their distributions nonparametrically via a logistic Gaussian process and GP-driven dynamics, enabling continuous density estimation and interpretable summaries. The framework supports incremental updates through collapsed Gibbs sampling and efficient GP approximations, and detects group anomalies with a chi-squared goodness-of-fit score across components and attributes. Empirical results on real datasets show superior group-anomaly detection accuracy and linear-time scalability, illustrating the method’s practical usefulness for cybersecurity, ecommerce analytics, and other multi-aspect streaming domains.
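To make the scoring idea concrete, here is a minimal sketch of a chi-squared goodness-of-fit score for a single categorical attribute of a single component. It uses the standard Pearson statistic and is not necessarily the exact score defined in the paper. The function name `chi2_gof_score`, the example probabilities, and the window counts are all hypothetical and chosen only for illustration.

```python
import numpy as np

def chi2_gof_score(observed, expected_probs, eps=1e-12):
    """Pearson chi-squared statistic between observed counts in a window
    and the counts expected under a model distribution.

    Hypothetical helper for illustration; not the paper's exact score.
    """
    observed = np.asarray(observed, dtype=float)
    probs = np.asarray(expected_probs, dtype=float)
    probs = probs / probs.sum()          # normalize defensively
    expected = observed.sum() * probs    # expected counts for this window
    return float(np.sum((observed - expected) ** 2 / (expected + eps)))

# Assumed model distribution over three categories (e.g., three IP groups).
expected_probs = [0.5, 0.3, 0.2]

# A window that matches the model, and one dominated by a single category.
normal_window = [50, 30, 20]
burst_window = [5, 5, 90]

normal_score = chi2_gof_score(normal_window, expected_probs)
burst_score = chi2_gof_score(burst_window, expected_probs)
```

Under these assumptions, a window whose counts follow the model yields a score near zero, while a window concentrated on one category yields a large score. A full detector would presumably combine such scores across components and attributes, as the summary above describes.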
Abstract
Analysis and anomaly detection in event tensor streams consisting of timestamps and multiple attributes, such as communication logs (time, IP address, packet length), are essential tasks in data mining. While existing tensor decomposition and anomaly detection methods provide useful insights, they face two limitations. (i) They cannot handle heterogeneous tensor streams, which comprise both categorical attributes (e.g., IP address) and continuous attributes (e.g., packet length). They typically require either discretizing continuous attributes or treating categorical attributes as continuous, both of which distort the underlying statistical properties of the data. Furthermore, incorrect assumptions about the distribution family of continuous attributes often degrade the model's performance. (ii) They discretize timestamps, failing to track the temporal dynamics of streams (e.g., trends, abnormal events), which makes them ineffective for detecting anomalies at the group level, referred to as 'group anomalies' (e.g., DoS attacks). To address these challenges, we propose HeteroComp, a method for continuously summarizing heterogeneous tensor streams into 'components' representing latent groups in each attribute and their temporal dynamics, and for detecting group anomalies. Our method employs Gaussian process priors to model the unknown distributions of continuous attributes and the temporal dynamics, estimating probability densities directly from the data. The extracted components give a concise but effective summarization, enabling accurate group anomaly detection. Extensive experiments on real datasets demonstrate that HeteroComp outperforms state-of-the-art algorithms in group anomaly detection accuracy, and that its computational time does not depend on the data stream length.
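As a rough illustration of how a Gaussian process prior can define a density over a continuous attribute without fixing a distribution family, the sketch below draws a latent function from a GP and maps it to a normalized density, in the spirit of a logistic Gaussian process. It is a minimal sketch on a fixed grid, not the paper's inference procedure. The RBF kernel, the grid range, the length scale, and the function names `rbf_kernel` and `logistic_gp_density` are assumptions made for this example.

```python
import numpy as np

def rbf_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance matrix over a 1-D array of inputs."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def logistic_gp_density(f, grid):
    """Map latent function values f on a uniform grid to a density:
    p(x) proportional to exp(f(x)), normalized with a Riemann sum."""
    w = np.exp(f - f.max())              # subtract max for numerical stability
    dx = grid[1] - grid[0]
    return w / (w.sum() * dx)

rng = np.random.default_rng(0)

# Hypothetical support of a continuous attribute (e.g., scaled packet length).
grid = np.linspace(0.0, 10.0, 200)

# GP prior draw via a Cholesky factor; the jitter keeps the matrix positive definite.
K = rbf_kernel(grid, lengthscale=1.5) + 1e-6 * np.eye(grid.size)
L = np.linalg.cholesky(K)
f = L @ rng.standard_normal(grid.size)

density = logistic_gp_density(f, grid)
total_mass = density.sum() * (grid[1] - grid[0])
```

Each draw of `f` gives a different smooth, strictly positive density that integrates to one on the grid, which is why no parametric family needs to be chosen in advance. Fitting such a model to observed data requires posterior inference over `f`, which this sketch does not attempt.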
