FLARE: Adaptive Multi-Dimensional Reputation for Robust Client Reliability in Federated Learning

Abolfazl Younesi, Leon Kiss, Zahra Najafabadi Samani, Juan Aznar Poveda, Thomas Fahringer

TL;DR

FLARE addresses the fragility of static, binary trust mechanisms in federated learning by introducing a dynamic, multi-dimensional reputation framework that continuously evaluates client reliability across performance, statistical, and temporal dimensions. It employs an adaptive threshold to adjust security rigor to the model's convergence state and recent attack intensity, and uses reputation-weighted aggregation with soft exclusion to balance robustness and participation, all while preserving privacy via Local Differential Privacy. A Statistical Mimicry attack benchmark (SM) tests the framework's resilience, and extensive experiments on MNIST, CIFAR-10, and SVHN with 100 clients demonstrate that FLARE maintains higher accuracy and faster convergence than state-of-the-art defenses under a range of attacks, including adaptive and evasive strategies. The results indicate that FLARE achieves strong malicious-client detection with low overhead and remains effective across varying data heterogeneity and attack intensities, making it practical for real-world deployments.

Abstract

Federated learning (FL) enables collaborative model training while preserving data privacy. However, it remains vulnerable to malicious clients who compromise model integrity through Byzantine attacks, data poisoning, or adaptive adversarial behaviors. Existing defense mechanisms rely on static thresholds and binary classification, failing to adapt to evolving client behaviors in real-world deployments. We propose FLARE, an adaptive reputation-based framework that transforms client reliability assessment from binary decisions to a continuous, multi-dimensional trust evaluation. FLARE integrates: (i) a multi-dimensional reputation score capturing performance consistency, statistical anomaly indicators, and temporal behavior, (ii) a self-calibrating adaptive threshold mechanism that adjusts security strictness based on model convergence and recent attack intensity, (iii) reputation-weighted aggregation with soft exclusion to proportionally limit suspicious contributions rather than eliminating clients outright, and (iv) a Local Differential Privacy (LDP) mechanism enabling reputation scoring on privatized client updates. We further introduce a highly evasive Statistical Mimicry (SM) attack, a benchmark adversary that blends honest gradients with synthetic perturbations and persistent drift to remain undetected by traditional filters. Extensive experiments with 100 clients on MNIST, CIFAR-10, and SVHN demonstrate that FLARE maintains high model accuracy and converges faster than state-of-the-art Byzantine-robust methods under diverse attack types, including label flipping, gradient scaling, adaptive attacks, ALIE, and SM. FLARE improves robustness by up to 16% and preserves model convergence within 30% of the non-attacked baseline, while achieving strong malicious-client detection performance with minimal computational overhead. https://github.com/Anonymous0-0paper/FLARE
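The abstract describes the SM benchmark adversary only at a high level: honest gradients blended with synthetic perturbations and a persistent drift. The following is a minimal sketch of that idea, not the authors' implementation. The function name, the parameters `blend`, `noise_std`, and `drift_rate`, and the linear-in-round drift schedule are all assumptions made for illustration.

```python
import random


def statistical_mimicry_update(honest_grad, drift_dir, round_idx,
                               blend=0.8, noise_std=0.01, drift_rate=0.001,
                               rng=None):
    """Hypothetical SM-style update (illustrative assumption, not the paper's code).

    Each coordinate is mostly the honest gradient, plus a small synthetic
    perturbation and a drift term that grows slowly with the round index.
    """
    rng = rng or random.Random(0)
    update = []
    for g, d in zip(honest_grad, drift_dir):
        noise = rng.gauss(0.0, noise_std)      # synthetic perturbation
        drift = drift_rate * round_idx * d     # persistent drift (assumed linear)
        update.append(blend * g + (1.0 - blend) * noise + drift)
    return update


if __name__ == "__main__":
    honest = [1.0, -2.0, 0.5]
    direction = [1.0, 1.0, 1.0]
    for t in (1, 10, 100):
        print(t, statistical_mimicry_update(honest, direction, t))
```

Because each per-round deviation is small, a filter that inspects a single round in isolation may see nothing unusual. The accumulated drift only becomes visible over many rounds, which is the motivation the abstract gives for tracking temporal behavior.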

Paper Structure

This paper contains 17 sections, 11 equations, 8 figures, 10 tables, 2 algorithms.

Figures (8)

  • Figure 1: Comparison of client reliability assessment approaches in federated learning. Left: Previous static methods make binary inclusion/exclusion decisions that remain fixed throughout training, leading to false positives for honest clients with temporary issues or unique data distributions, while failing to detect adaptive attackers. Right: Our FLARE framework uses dynamic reputation scoring with continuous weight adjustments, enabling nuanced trust assessments that adapt to evolving client behavior over time.
  • Figure 2: Our proposed 5-step framework for reputation-aware aggregation: (1) Compute per-client performance scores, (2) Dynamically adjust mixing coefficients $w_j^t$ based on convergence progress and detected attack patterns, (3) Compute a weighted reputation score for each client using $w_j^t$, (4) Classify clients into trusted (fully included), suspicious (partially included), or untrusted (excluded), (5) Perform aggregation via weighted FedAvg using reputation-based client inclusion.
  • Figure 3: Reputation dynamics across training rounds for representative clients. The curves illustrate per-round scores (performance consistency $r_1$, statistical anomaly $r_2$, temporal behavior $r_3$), the combined reputation $R_t$, the adaptive threshold $\tau_t$, and the resulting soft-exclusion weight $w_t$. Benign clients maintain high $R_t$ and stable $w_t$, while noisy-but-benign clients experience dips and recover. In contrast, malicious and adaptive clients exhibit sharp drops, followed by a decay in reputation, which prevents rapid trust recovery. This explains how evidence over time is converted into aggregation weights and admission decisions.
  • Figure 4: Client role distribution for 100 clients in a scenario where all 6 attack types may occur. We expect around 80 benign clients (pink box) and 20 malicious clients (green box). Each malicious client is assigned one of the 6 attack behaviors, so we expect $\approx 3$ malicious clients per attack pattern (orange box).
  • Figure 5: Comparison of detection performance (Precision, Recall, and F1-Score) of eight FL methods across six attack scenarios on MNIST: Adaptive, Byzantine Gradient, Gradient Scaling, Label Flip, ALIE, and SM Attack. Each subplot reports the average detection quality along with confidence intervals. Higher values indicate better detection effectiveness.
  • ...and 3 more figures
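
The five steps in the Figure 2 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm. The per-dimension scores of step 1 are taken as given inputs in [0, 1]. The rule in `mixing_coefficients`, the `margin` band that separates suspicious from untrusted clients, and all function names are hypothetical. The paper's actual update rules are not given in this excerpt.

```python
def mixing_coefficients(convergence, attack_intensity):
    """Step 2 (assumed rule): weights over (performance, statistical, temporal)."""
    raw = [1.0 + convergence, 1.0 + attack_intensity, 1.0]
    total = sum(raw)
    return [x / total for x in raw]


def reputation(scores, coeffs):
    """Step 3: weighted sum of the per-dimension scores for one client."""
    return sum(w * s for w, s in zip(coeffs, scores))


def inclusion_weight(rep, tau, margin=0.2):
    """Step 4 (assumed bands): trusted -> 1, suspicious -> partial, untrusted -> 0."""
    if rep >= tau:
        return 1.0                                  # trusted: fully included
    if rep >= tau - margin:
        return (rep - (tau - margin)) / margin      # suspicious: soft exclusion
    return 0.0                                      # untrusted: excluded


def aggregate(updates, reps, tau):
    """Step 5: FedAvg weighted by reputation-based inclusion."""
    weights = [inclusion_weight(r, tau) * r for r in reps]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no client passed the reputation filter")
    dim = len(updates[0])
    return [sum(w * u[i] for w, u in zip(weights, updates)) / total
            for i in range(dim)]


if __name__ == "__main__":
    coeffs = mixing_coefficients(convergence=0.5, attack_intensity=0.2)
    client_scores = [[0.9, 0.9, 0.9], [0.5, 0.4, 0.6], [0.1, 0.1, 0.1]]
    reps = [reputation(s, coeffs) for s in client_scores]
    updates = [[1.0, 1.0], [2.0, 2.0], [100.0, 100.0]]
    print(aggregate(updates, reps, tau=0.6))
```

In this sketch the third client falls below the suspicious band and contributes nothing, while the second is down-weighted rather than dropped. That mirrors the trusted, suspicious, and untrusted split described in the caption.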