Applied Machine Learning to Anomaly Detection in Enterprise Purchase Processes
A. Herreros-Martínez, R. Magdalena-Benedicto, J. Vila-Francés, A. J. Serrano-López, S. Pérez-Díaz
TL;DR
The paper tackles anomaly detection in enterprise purchase processes with unlabeled data by combining univariate (z-score, DBSCAN) and multivariate (k-Means with categorical encodings, Isolation Forest) techniques. It proposes an ensemble prioritisation to rank anomalous transactions and integrates explainability methods (SHAP/LIME) to aid auditors. Results show that the univariate methods yield manageable candidate sets, while k-Means clustering often exhibits weak cluster structure (as measured by SSE and Silhouette) under the tested configurations, with Isolation Forest providing complementary signals. The approach is implemented in KNIME as a reproducible workflow, offering practical value for automated auditing and pointing to future work on richer encodings and additional clustering methods for stronger anomaly characterization.
Abstract
As processes become increasingly digitalised, organisations face the challenge of detecting anomalies that may reveal suspicious activity in an ever-growing volume of data. To this end, audit engagements are carried out regularly, and internal auditors and purchase specialists are constantly looking for new methods to automate these processes. This work proposes a methodology to prioritise the investigation of the cases detected in two large purchase datasets drawn from real data. The goal is to contribute to the effectiveness of companies' control efforts and to improve performance in carrying out such tasks. A comprehensive Exploratory Data Analysis is carried out before applying unsupervised Machine Learning techniques to detect anomalies. A univariate approach is applied through the z-score index and the DBSCAN algorithm, while a multivariate analysis is implemented with the k-Means and Isolation Forest algorithms together with the Silhouette index; each method yields its own proposal of candidate transactions to be reviewed. An ensemble prioritisation of the candidates is provided, together with a proposal of explainability methods (LIME, Shapley, SHAP) to help company specialists interpret the results.
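To make the idea of per-method candidate proposals and an ensemble prioritisation concrete, the following is a minimal Python sketch rather than the authors' KNIME workflow. It flags univariate outliers with a z-score and then ranks transactions by how many detectors flagged them. The threshold of 3, the simple vote count, the function names, the purchase amounts, and the two extra candidate sets (stand-ins for detectors such as DBSCAN or Isolation Forest) are all illustrative assumptions; the abstract does not specify how the ensemble combines candidates.

```python
from collections import Counter
from statistics import mean, pstdev


def zscore_flags(amounts, threshold=3.0):
    """Return the indices whose absolute z-score exceeds the threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return set()  # all values identical: nothing stands out
    return {i for i, x in enumerate(amounts) if abs(x - mu) / sigma > threshold}


def prioritise(*candidate_sets):
    """Rank transaction indices by the number of detectors that flagged them.

    Assumption: a plain vote count, used here only for illustration.
    """
    votes = Counter()
    for candidates in candidate_sets:
        votes.update(candidates)
    # Most votes first; ties broken by index so the order is stable.
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical purchase amounts; the value at index 10 is far from the rest.
amounts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101,
           5000, 99, 100, 102, 98, 101, 100, 99, 100, 101]

univariate = zscore_flags(amounts)

# Hypothetical candidate sets standing in for two other detectors.
other_a = {10, 3}
other_b = {10, 7}

ranking = prioritise(univariate, other_a, other_b)
for index, n_votes in ranking:
    print(f"transaction {index}: flagged by {n_votes} detector(s)")
```

Transactions flagged by several independent detectors rise to the top of the list, which is one simple way to give auditors a review order; weighted or score-based combinations would be equally plausible alternatives.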
