Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement

Lucas Potin; Rosa Figueiredo; Vincent Labatut; Christine Largeron

TL;DR

This work proposes PANG (Pattern-Based Anomaly Detection in Graphs), a general supervised framework that relies on pattern extraction to detect anomalous graphs in a collection of attributed graphs. It can also identify induced subgraphs, a type of pattern widely overlooked in the literature.

Abstract

In the context of public procurement, several indicators called red flags are used to estimate fraud risk. They are computed from certain contract attributes and therefore depend on the contract and award notices being filled in properly. However, these attributes are very often missing in practice, which prevents the red flags from being computed. Traditional fraud detection approaches focus on tabular data only, considering each contract separately, and are therefore very sensitive to this issue. In this work, we adopt a graph-based method that leverages relations between contracts to compensate for the missing attributes. We propose PANG (Pattern-Based Anomaly Detection in Graphs), a general supervised framework relying on pattern extraction to detect anomalous graphs in a collection of attributed graphs. Notably, it is able to identify induced subgraphs, a type of pattern widely overlooked in the literature. When benchmarked on standard datasets, its predictive performance is on par with state-of-the-art methods, with the additional advantage of being explainable. These experiments also reveal that induced patterns are more discriminative on certain datasets. When PANG is applied to public procurement data, its predictive performance is superior to that of other methods, and it identifies subgraph patterns that are characteristic of fraud-prone situations, thereby making it possible to better understand fraudulent behavior.
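
The abstract distinguishes induced subgraphs from other patterns. As a rough illustration only, the sketch below contrasts the two notions on a toy graph, using the standard graph-theoretic definition: an occurrence is induced when the graph has no edge between the matched vertices other than those of the pattern. The function name `occurs`, the brute-force search, and the restriction to undirected graphs without attributes are all assumptions made for this example; this is not the paper's mining algorithm, which works on attributed graphs.

```python
from itertools import permutations


def occurs(pattern_nodes, pattern_edges, graph_nodes, graph_edges, induced):
    """Brute-force check that a pattern occurs in an undirected graph.

    Hypothetical helper for illustration: attributes are ignored and every
    injective vertex mapping is tried, so it only suits tiny graphs.
    """
    g_edges = {frozenset(e) for e in graph_edges}
    for image in permutations(graph_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, image))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in pattern_edges}
        # General pattern: every pattern edge must exist in the graph.
        if not mapped <= g_edges:
            continue
        if not induced:
            return True
        # Induced pattern: the graph may contain no extra edge
        # between the matched vertices.
        among = {e for e in g_edges if e <= set(image)}
        if among == mapped:
            return True
    return False


# Pattern: a path on three vertices, x - y - z.
path_nodes = ["x", "y", "z"]
path_edges = [("x", "y"), ("y", "z")]

# Graph: a triangle a - b - c - a.
tri_nodes = ["a", "b", "c"]
tri_edges = [("a", "b"), ("b", "c"), ("c", "a")]

general_hit = occurs(path_nodes, path_edges, tri_nodes, tri_edges, induced=False)
induced_hit = occurs(path_nodes, path_edges, tri_nodes, tri_edges, induced=True)
```

In this toy case the path is found as a general pattern but not as an induced one, because any three vertices of the triangle are joined by a third edge that the path lacks.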

Paper Structure (26 sections, 6 figures, 7 tables)

This paper contains 26 sections, 6 figures, 7 tables.

Introduction
Related Work
Problem Formulation
PANG Framework
Description of the Framework
Step #1: Pattern Identification
Step #2: Discriminative Pattern Selection
Step #3: Vector-Based Representation
Step #4: Classifier Training
Assessment on Benchmarks
Experimental Protocol
Experimental Results
Public Procurement Use Case
Extraction of the Graph Dataset
Raw Data
...and 11 more sections

Figures (6)

Figure 1: A collection $\mathcal{G}$ of graphs including the subsets of anomalous ($\mathcal{G}_A$) and normal ($\mathcal{G}_N$) graphs.
Figure 2: Three examples of general patterns present in graph $G_1$ of Figure 1.
Figure 3: Processing steps of the proposed PANG framework.
Figure 4: Binary ($\mathbf{h}_j^b$) and integer ($\mathbf{h}_j^z$) vector-based representations of the graphs of Figure 1, using the patterns of Figure 2 as $\mathcal{P}_s$.
Figure 5: (a) Distribution of the patterns as a function of their discrimination scores. (b) Examples of discriminative patterns.
...and 1 more figure

Theorems & Definitions (7)

Definition: Attributed Graph
Definition: General Pattern
Definition: Induced Subgraph
Definition: Graph Frequency
Definition: Subgraph Frequency
Definition: Closed Pattern
Definition: Discrimination Score
