Pattern Mining for Anomaly Detection in Graphs: Application to Fraud in Public Procurement

Lucas Potin, Rosa Figueiredo, Vincent Labatut, Christine Largeron

TL;DR

This work proposes PANG (Pattern-Based Anomaly Detection in Graphs), a general supervised framework that relies on pattern extraction to detect anomalous graphs in a collection of attributed graphs. It is able to identify induced subgraphs, a type of pattern widely overlooked in the literature.

Abstract

In the context of public procurement, several indicators called red flags are used to estimate fraud risk. They are computed from certain contract attributes and therefore depend on the contract and award notices being filled in properly. However, these attributes are very often missing in practice, which prevents the computation of red flags. Traditional fraud detection approaches focus on tabular data only, considering each contract separately, and are therefore very sensitive to this issue. In this work, we adopt a graph-based method that leverages relations between contracts to compensate for the missing attributes. We propose PANG (Pattern-Based Anomaly Detection in Graphs), a general supervised framework relying on pattern extraction to detect anomalous graphs in a collection of attributed graphs. Notably, it is able to identify induced subgraphs, a type of pattern widely overlooked in the literature. When benchmarked on standard datasets, its predictive performance is on par with state-of-the-art methods, with the additional advantage of being explainable. These experiments also reveal that induced patterns are more discriminative on certain datasets. When PANG is applied to public procurement data, its predictions are superior to those of other methods, and it identifies subgraph patterns that are characteristic of fraud-prone situations, thereby making it possible to better understand fraudulent behavior.

Paper Structure (26 sections, 6 figures, 7 tables)

Figures (6)

  • Figure 1: A collection $\mathcal{G}$ of graphs including the subsets of anomalous ($\mathcal{G}_A$) and normal ($\mathcal{G}_N$) graphs.
  • Figure 2: Three examples of general patterns present in graph $G_1$ of Figure 1.
  • Figure 3: Processing steps of the proposed PANG framework.
  • Figure 4: Binary ($\mathbf{h}_j^b$) and integer ($\mathbf{h}_j^z$) vector-based representations of the graphs of Figure 1, using the patterns of Figure 2 as $\mathcal{P}_s$.
  • Figure 5: (a) Distribution of the patterns as a function of their discrimination scores. (b) Examples of discriminative patterns.
  • ...and 1 more figure
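
The binary and integer representations named in the Figure 4 caption can be illustrated with a minimal sketch. It assumes that the binary vector records whether each pattern occurs in a graph and that the integer vector records how many times it occurs; this is an interpretation of the caption, not the paper's implementation. To stay self-contained, the hypothetical patterns here are single labeled edges and graphs are plain edge lists, which sidesteps the subgraph matching a real pattern miner would perform. All data and names are illustrative.

```python
from collections import Counter

# Hypothetical toy data: each graph is a list of undirected edges between
# labeled vertices, written as (label_u, label_v). Not taken from the paper.
graphs = {
    "G1": [("A", "B"), ("B", "A"), ("B", "C")],
    "G2": [("A", "C"), ("C", "C")],
}

# Hypothetical pattern set P_s: single labeled edges.
patterns = [("A", "B"), ("B", "C"), ("C", "C")]


def edge_key(edge):
    """Normalize an undirected labeled edge so (A, B) and (B, A) coincide."""
    return tuple(sorted(edge))


def integer_vector(edges, patterns):
    """Assumed h^z: number of occurrences of each pattern in the graph."""
    counts = Counter(edge_key(e) for e in edges)
    return [counts[edge_key(p)] for p in patterns]


def binary_vector(edges, patterns):
    """Assumed h^b: 1 if the pattern occurs at least once, else 0."""
    return [int(c > 0) for c in integer_vector(edges, patterns)]


h_z = {name: integer_vector(edges, patterns) for name, edges in graphs.items()}
h_b = {name: binary_vector(edges, patterns) for name, edges in graphs.items()}
```

Stacking such vectors row by row would give a feature matrix, one row per graph and one column per pattern, of the kind a standard supervised classifier can consume.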

Theorems & Definitions (7)

  • Definition: Attributed Graph
  • Definition: General Pattern
  • Definition: Induced Subgraph
  • Definition: Graph Frequency
  • Definition: Subgraph Frequency
  • Definition: Closed Pattern
  • Definition: Discrimination Score
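
The distinction between a general pattern and an induced subgraph can be made concrete with a small check. The sketch below uses the standard graph-theoretic definitions rather than the paper's exact formalism: a vertex subset with any subset of the host edges among those vertices is a general subgraph, while an induced subgraph must keep every host edge among those vertices. Vertex attributes and isomorphism search are omitted (vertices are matched by identity), and all function and variable names are hypothetical.

```python
def edges_among(host_edges, vertices):
    """Return the host edges whose endpoints both lie in `vertices`."""
    vertex_set = set(vertices)
    return {frozenset(e) for e in host_edges if set(e) <= vertex_set}


def is_general_subgraph(host_edges, vertices, pattern_edges):
    """True if every pattern edge is a host edge among `vertices`."""
    pattern = {frozenset(e) for e in pattern_edges}
    return pattern <= edges_among(host_edges, vertices)


def is_induced_subgraph(host_edges, vertices, pattern_edges):
    """True if the pattern edges are exactly the host edges among `vertices`."""
    pattern = {frozenset(e) for e in pattern_edges}
    return pattern == edges_among(host_edges, vertices)


# Hypothetical host graph: a triangle 1-2-3 plus a pendant vertex 4.
host = [(1, 2), (2, 3), (1, 3), (3, 4)]
path = [(1, 2), (2, 3)]
triangle = [(1, 2), (2, 3), (1, 3)]

# The path omits host edge (1, 3), so it is a general but not an induced subgraph.
path_is_general = is_general_subgraph(host, {1, 2, 3}, path)
path_is_induced = is_induced_subgraph(host, {1, 2, 3}, path)
triangle_is_induced = is_induced_subgraph(host, {1, 2, 3}, triangle)
```

Because the induced check also encodes which edges are absent, an induced pattern carries information that a general pattern on the same vertices does not.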