Federated Learning for Misbehaviour Detection with Variational Autoencoders and Gaussian Mixture Models

Enrique Mármol Campos; Aurora González Vidal; José Luis Hernández Ramos; Antonio Skarmeta

TL;DR

This work tackles misbehavior detection in vehicular networks under privacy constraints by proposing an unsupervised Federated Learning (FL) pipeline that combines Gaussian Mixture Models (GMM) for probabilistic clustering with Variational Autoencoders (VAE) for reconstruction-based anomaly detection. Restricted Boltzmann Machine (RBM) pretraining is used to improve VAE convergence in non-IID FL settings, and Fed+ aggregation mitigates the effect of non-identical updates across clients. Evaluated on VeReMi and its extensions, the approach achieves high recall and competitive overall accuracy, approaching supervised baselines without requiring labeled data. The cloud-based aggregator enables cross-vehicle defense and scalable, privacy-preserving learning across a growing fleet of vehicles, with potential extensions to multi-class misbehavior detection and dynamic training strategies.

Abstract

Federated Learning (FL) has become an attractive approach to collaboratively train Machine Learning (ML) models while preserving the privacy of the data sources. However, most existing FL approaches are based on supervised techniques, which can require resource-intensive work and human intervention to obtain labelled datasets. Furthermore, in the scope of cyberattack detection, such techniques are not able to identify previously unknown threats. To address this, this work proposes a novel unsupervised FL approach for identifying potential misbehavior in vehicular environments. We leverage the computing capabilities of public cloud services for model aggregation, and also as a central repository of misbehavior events, enabling cross-vehicle learning and collective defense strategies. Our solution integrates Gaussian Mixture Models (GMM) and Variational Autoencoders (VAE) on the VeReMi dataset in a federated environment, where each vehicle trains only on its own data. Furthermore, we use Restricted Boltzmann Machines (RBM) for pre-training and Fed+ (Fedplus) as the aggregation function to improve the model's convergence. Our approach provides better performance (more than 80 percent) than recent proposals, which are usually based on supervised techniques and artificial divisions of the VeReMi dataset.
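As a rough illustration of what reconstruction-based detection with a threshold can look like, the following minimal Python sketch scores samples by mean squared reconstruction error and flags those above a threshold `th`. It is not the authors' implementation: the function names, the use of mean squared error, and the quantile rule for choosing `th` are all assumptions made for illustration, and the reconstructions are passed in directly rather than produced by a trained VAE.

```python
from statistics import mean


def reconstruction_error(x, x_hat):
    """Mean squared error between a sample and its reconstruction (assumed metric)."""
    return mean((a - b) ** 2 for a, b in zip(x, x_hat))


def fit_threshold(benign_errors, q=0.95):
    """Choose th as an empirical quantile of training errors.

    The quantile rule is an illustrative assumption, not the paper's procedure.
    """
    ordered = sorted(benign_errors)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]


def flag_misbehavior(errors, th):
    """Return True for every sample whose error exceeds the threshold."""
    return [e > th for e in errors]
```

In a federated deployment each client would compute these errors locally with its own copy of the trained model, so raw samples never need to leave the vehicle.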

Paper Structure

This paper contains 17 sections, 8 equations, 11 figures, 3 tables, 2 algorithms.

Introduction
Preliminaries
Federated learning (FL)
Gaussian mixture models
Variational autoencoder (VAE)
Related Work
Proposed misbehavior detection system
Dataset and preprocessing
System description
Overview
Model training
Local misbehavior detection
Evaluation
Client division
Particular case analysis: Clients with 298 components
...and 2 more sections

Figures (11)

Figure 1: Pictorial description of an FL setting
Figure 2: Pictorial description of an autoencoder (AE). It consists of a symmetric fully connected NN in which the encoder compresses the input into the latent space and the decoder reconstructs it.
Figure 3: Pictorial description of a VAE. Similar to the AE, but the latent layer is replaced by three layers so that the encoding follows a normal distribution.
Figure 4: Pictorial description of our misbehavior detection system. There are three phases. The first (1) is the initialization, where the clients train the GMM and then create the histograms; RBM is also used in this phase to create the VAE's initial weights. Then, in (2), clients form a federated environment to train their VAE models. Finally, in (3), local misbehavior detection is carried out, where each client classifies the samples as benign or malicious using the trained model and the threshold $th$.
Figure 5: Number of clients per cluster depending on the number of components
...and 6 more figures
