FedAD-Bench: A Unified Benchmark for Federated Unsupervised Anomaly Detection in Tabular Data

Ahmed Anwar, Brian Moser, Dayananda Herurkar, Federico Raue, Vinit Hegiste, Tatjana Legler, Andreas Dengel

TL;DR

This paper introduces FedAD-Bench, a unified benchmark for evaluating unsupervised anomaly detection algorithms in federated learning (FL). It also presents insights into FL's regularization effects, revealing scenarios in which FL outperforms centralized training because it mitigates overfitting.

Abstract

The emergence of federated learning (FL) presents a promising approach to leverage decentralized data while preserving privacy. Furthermore, the combination of FL and anomaly detection is particularly compelling because it allows for detecting rare and critical anomalies (usually also rare in locally gathered data) in sensitive data from multiple sources, such as cybersecurity and healthcare. However, benchmarking the performance of anomaly detection methods in FL environments remains an underexplored area. This paper introduces FedAD-Bench, a unified benchmark for evaluating unsupervised anomaly detection algorithms within the context of FL. We systematically analyze and compare the performance of recent deep learning anomaly detection models under federated settings, which were typically assessed solely in centralized settings. FedAD-Bench encompasses diverse datasets and metrics to provide a holistic evaluation. Through extensive experiments, we identify key challenges such as model aggregation inefficiencies and metric unreliability. We present insights into FL's regularization effects, revealing scenarios in which it outperforms centralized approaches due to its inherent ability to mitigate overfitting. Our work aims to establish a standardized benchmark to guide future research and development in federated anomaly detection, promoting reproducibility and fair comparison across studies.

Paper Structure (18 sections, 1 equation, 1 figure, 4 tables)

Figures (1)

  • Figure 1: Unsupervised anomaly detection setup in both centralized and FL scenarios. Training is done on 50% of the normal data while the test set contains all anomalies as well as the other half of the normal data. Each client trains their local model on their own set of inliers. Afterward, the trained models are aggregated at the server using FedAvg and evaluated on the test set. The inference is based on the reconstruction error, where samples with high error values are considered anomalies.
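The setup in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the benchmark's implementation: the function names (`split_train_test`, `fed_avg`, `reconstruction_error`) are hypothetical, the data is synthetic, and a toy "model" that reconstructs every sample as a learned mean vector stands in for the deep models evaluated in the paper. Only the data split, the sample-weighted FedAvg aggregation, and the reconstruction-error scoring follow the caption.

```python
import numpy as np

def split_train_test(X, y, rng):
    """Train on 50% of the normal samples; test on the other half plus all anomalies."""
    normal_idx = rng.permutation(np.flatnonzero(y == 0))
    half = len(normal_idx) // 2
    train_idx = normal_idx[:half]
    test_idx = np.concatenate([normal_idx[half:], np.flatnonzero(y == 1)])
    return X[train_idx], X[test_idx], y[test_idx]

def fed_avg(client_weights, client_sizes):
    """FedAvg: average each parameter across clients, weighted by local sample count."""
    total = float(sum(client_sizes))
    n_params = len(client_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(n_params)
    ]

def reconstruction_error(X, reconstruct):
    """Per-sample squared reconstruction error; higher means more anomalous."""
    return np.sum((X - reconstruct(X)) ** 2, axis=1)

# Synthetic tabular data (assumption): 200 inliers (label 0), 10 anomalies (label 1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(6, 1, (10, 4))])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(10, dtype=int)])
X_train, X_test, y_test = split_train_test(X, y, rng)

# Each client holds its own set of inliers and "trains" locally.
# Toy stand-in for an autoencoder: the only parameter is the local mean vector.
clients = np.array_split(X_train, 3)
local_weights = [[c.mean(axis=0)] for c in clients]

# The server aggregates the local models with FedAvg.
global_weights = fed_avg(local_weights, [len(c) for c in clients])

# Inference: score test samples by reconstruction error under the global model.
scores = reconstruction_error(
    X_test, lambda Z: np.broadcast_to(global_weights[0], Z.shape)
)
```

With a real model, `local_weights` would hold each client's trained parameter arrays (for example, the layers of an autoencoder), and `fed_avg` would be applied to them unchanged after each communication round. Turning `scores` into anomaly labels requires a threshold, which this sketch leaves open.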