On the Fragility of Contribution Score Computation in Federated Learning

Balazs Pejo, Marcell Frank, Krisztian Varga, Peter Veliczky, Gergely Biczok

TL;DR

This work investigates the fragility of contribution evaluation (CE) in federated learning, showing that CE scores are highly sensitive to the chosen aggregation method and can be manipulated by adversarial updates. It introduces two attacks, Self Improvement and Targeted Decrease, and evaluates them against multiple CE schemes (SV, GTG, LOO, ADP) within the Flower framework across IID and non-IID settings. The study reveals that architectural choices can distort CE distributions and that principled marginal-difference CE schemes are vulnerable to manipulation, raising concerns about fairness and incentive compatibility in FL collaborations. The findings advocate for more robust, attack-aware CE frameworks and broader evaluations to ensure fair participation while maintaining model quality in real-world FL deployments.

Abstract

This paper investigates the fragility of contribution evaluation in federated learning, a critical mechanism for ensuring fairness and incentivizing participation. We argue that contribution scores are susceptible to significant distortions from two fundamental perspectives: architectural sensitivity and intentional manipulation. First, we explore how different model aggregation methods impact these scores. While most research assumes a basic averaging approach, we demonstrate that advanced techniques, including those designed to handle unreliable or diverse clients, can unintentionally yet significantly alter the final scores. Second, we explore vulnerabilities posed by poisoning attacks, where malicious participants strategically manipulate their model updates to inflate their own contribution scores or reduce the importance of other participants. Through extensive experiments across diverse datasets and model architectures, implemented within the Flower framework, we rigorously show that both the choice of aggregation method and the presence of attackers are potent vectors for distorting contribution scores, highlighting a critical need for more robust evaluation schemes.
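To make the architectural-sensitivity claim concrete, the sketch below computes leave-one-out (LOO) contribution scores on a toy problem under two aggregation rules. It is an illustration under stated assumptions, not the authors' implementation and not Flower code: the function names, the two-dimensional "model updates", the `TARGET` vector, and the distance-based utility are all hypothetical, and the coordinate-wise median is used here only as a familiar example of a robust aggregator.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Aggregator = Callable[[Sequence[Vector]], Vector]


def mean_aggregate(updates: Sequence[Vector]) -> Vector:
    """Unweighted coordinate-wise mean (plain averaging of client updates)."""
    return [sum(col) / len(updates) for col in zip(*updates)]


def median_aggregate(updates: Sequence[Vector]) -> Vector:
    """Coordinate-wise median, a simple robust alternative to the mean."""
    return [median(col) for col in zip(*updates)]


# Hypothetical "ideal" model used only to define a toy utility.
TARGET: Vector = [1.0, 1.0]


def toy_utility(model: Vector) -> float:
    """Negative squared distance to TARGET; higher means a better model."""
    return -sum((m - t) ** 2 for m, t in zip(model, TARGET))


def loo_scores(
    updates: Dict[str, Vector],
    aggregate: Aggregator,
    utility: Callable[[Vector], float] = toy_utility,
) -> Dict[str, float]:
    """LOO score of a client: utility with all clients minus utility without it."""
    full = utility(aggregate(list(updates.values())))
    scores = {}
    for cid in updates:
        rest = [u for other, u in updates.items() if other != cid]
        scores[cid] = full - utility(aggregate(rest))
    return scores


# Two clients agree with the target; the third submits an outlying update.
updates = {"a": [1.0, 1.0], "b": [1.0, 1.0], "c": [4.0, 4.0]}

mean_scores = loo_scores(updates, mean_aggregate)
median_scores = loo_scores(updates, median_aggregate)

print("mean  :", mean_scores)    # a and b score 2.5, c scores -2.0
print("median:", median_scores)  # a and b score 4.5, c scores 0.0
```

With the same three updates, the outlying client is penalized under averaging but scores exactly zero under the median, and the honest clients' scores nearly double. The contribution ranking happens to be preserved in this toy case, but the score distribution is not, which is the kind of aggregation-dependent distortion the abstract describes.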

Paper Structure

This paper contains 43 sections, 10 equations, 6 figures, 6 tables, 1 algorithm.

Figures (6)

  • Figure 1: Our FL scenarios: Architectural Sensitivity (marked with a 5-star) uses all aggregation techniques, with GTG and ADP for contribution evaluation, while Intentional Manipulation (marked with a 6-star) relies on FedAvg and considers GTG and LOO.
  • Figure 2: Clients' LOO and GTG scores in the third training round when ADULT is split across 5 clients (IID or non-IID) and the 1st client either self-improves its score or does not. Blue is the no-attack baseline; red is the attack.
  • Figure 3: Differences in clients' contribution scores (LOO in blue, GTG in red) in the third training round when ADULT is split across 5 clients (IID or non-IID) and the 1st client self-improves its score.
  • Figure 4: Clients' contribution scores (LOO or GTG) in the third training round when the dataset (ADULT or FMNIST) is split across 5 clients (IID or non-IID) and the 1st client either decreases the 2nd client's score or does not. Blue is the no-attack baseline; red is the attack.
  • Figure 5: Differences in clients' contribution scores (LOO in blue, GTG in red) in the third training round when the dataset (ADULT or FMNIST) is split across 5 clients (IID or non-IID) and the 1st client decreases the 2nd client's score.
  • ...and 1 more figure