Trivial Graph Features and Classical Learning are Enough to Detect Random Anomalies

Matthieu Latapy, Stephany Rajeh

TL;DR

It is shown here that trivial graph features and classical learning techniques are sufficient to detect randomly injected anomalies extremely well. This basic approach has very low computational costs and leads to easily interpretable results.

Abstract

Detecting anomalies in link streams that represent various kinds of interactions is an important research topic with crucial applications. Because of the lack of ground truth data, proposed methods are mostly evaluated through their ability to detect randomly injected links. In contrast with most proposed methods, which rely on complex approaches raising computational and/or interpretability issues, we show here that trivial graph features and classical learning techniques are sufficient to detect such anomalies extremely well. This basic approach has very low computational costs and leads to easily interpretable results. It also has many other desirable properties that we study through an extensive set of experiments. We conclude that detection methods should now target more complex kinds of anomalies.

Paper Structure

This paper contains 16 sections, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: Examples of $G$-type and $H$-type history graphs. Top: a link stream between 4 nodes $a$, $b$, $c$, and $d$, from time $0$ to $10$. We consider the latest link $(10,b,c)$, meaning that an interaction occurred between $b$ and $c$ at time $10$. We display two $G$-type ($G_3$ and $G_8$, bottom-left) and two $H$-type ($H_3$ and $H_8$, bottom-right) history graphs for this link. The integers on the links of these graphs indicate their number of occurrences within the considered history. For instance, $H_3$ is the graph obtained from the last $3$ interactions, which involve $c$ and $d$ twice and $a$ and $b$ once.
  • Figure 2: The impact of size $s$ and duration $d$ (horizontal axis) of the $H$-type (left) and $G$-type (right) history graphs on AUC scores. We consider here $5\%$ anomaly injection and learning with $r=0.7$.
  • Figure 3: AUC scores obtained with various history resolutions and combinations. For each dataset, we display the best score obtained with, from left to right: an $H$-type history graph, a $G$-type history graph, the combination of all $H$-type history graphs, the combination of all $G$-type history graphs, and the combination of all these history graphs. We consider $5\%$ anomaly injection and learning with $r=0.7$.
  • Figure 4: AUC scores for TGF with sliding windows containing $50\%$ of all links, using $H$-type history graphs of size $1000$, with $5\%$ anomaly injection and learning rate $r=0.7$ in each window. The inset shows results for sliding windows containing only $1\%$ of all links in the largest datasets.
  • Figure 5: The impact of size $s$ and duration $d$ (horizontal axis) of the $H$-type (top) and $G$-type (bottom) history graphs on AUC scores in the Digg dataset. We also show the impact of using different machine learning algorithms on $H$-type history graphs. We consider here $5\%$ anomaly injection and learning with $r=0.7$.
  • ...and 3 more figures
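
The two kinds of history graphs described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `h_history` and `g_history`, the toy stream (which is made up and is not the stream drawn in Figure 1), the exclusion of the considered link from its own history, and the inclusive boundary of the duration window are all hypothetical choices.

```python
from collections import Counter

# Hypothetical toy link stream: (time, node, node) triplets sorted by time.
toy_stream = [
    (1, "a", "b"), (2, "a", "c"), (4, "b", "d"),
    (6, "c", "d"), (7, "a", "b"), (9, "c", "d"),
    (10, "b", "c"),  # the link under consideration
]

def _edge(u, v):
    """Undirected edge key: the two endpoints in sorted order."""
    return tuple(sorted((u, v)))

def h_history(stream, index, s):
    """H-type history: weighted graph of the s links preceding stream[index].

    Assumption: the considered link itself is not part of its history.
    """
    start = max(0, index - s)
    return Counter(_edge(u, v) for _, u, v in stream[start:index])

def g_history(stream, index, d):
    """G-type history: weighted graph of earlier links within duration d.

    Assumption: the window is [t - d, t), where t is the time of stream[index].
    """
    t_now = stream[index][0]
    return Counter(
        _edge(u, v) for t, u, v in stream[:index] if t >= t_now - d
    )

last = len(toy_stream) - 1
h3 = h_history(toy_stream, last, 3)  # weights = occurrences among the last 3 links
g3 = g_history(toy_stream, last, 3)  # weights = occurrences during the last 3 time units
```

Edge weights count occurrences within the considered history, matching the integers shown on the links in Figure 1. Using a `Counter` keyed by sorted node pairs keeps the sketch short; a real pipeline would more likely maintain the graphs incrementally as the stream advances.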